Violence Recognition Based on Auditory-Visual Fusion of Autoencoder Mapping

In violence recognition, accuracy suffers from time-axis misalignment and semantic deviation between the visual and auditory streams of multimedia data. This paper therefore proposes an auditory-visual information fusion method based on autoencoder mapping. First, a feature extraction model based on a CNN-LSTM framework is established, and whole multimedia segments are used as input to resolve the time-axis misalignment between visual and auditory information. Then, a shared semantic subspace is constructed with an autoencoder mapping model and optimized by semantic correspondence, which resolves the audiovisual semantic deviation and fuses the visual and auditory information at the level of segment features. Finally, the whole network is used to identify violence. Experimental results show that the method exploits the complementarity between modalities: compared with single-modality information, the multimodal method achieves better results.
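The abstract outlines a three-stage pipeline: CNN-LSTM encoders turn whole multimedia segments into per-modality segment features, an autoencoder maps both modalities into a shared semantic subspace optimized for semantic correspondence, and the fused segment representation feeds a violence classifier. The record does not give the authors' architecture or training details, so the sketch below is only a minimal, hypothetical PyTorch reading of that description; the module names (SegmentEncoder, AudioVisualFusionNet), layer sizes, and the reconstruction-plus-correspondence loss are assumptions, not the published implementation.

```python
# Hypothetical sketch of the pipeline described in the abstract:
# per-modality CNN-LSTM segment encoders -> autoencoder mapping into a
# shared semantic subspace -> fused segment features -> violence classifier.
# All layer sizes, names, and loss terms are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SegmentEncoder(nn.Module):
    """CNN over per-frame features, then an LSTM over the frames of one segment."""

    def __init__(self, in_channels: int, hidden: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, in_channels) -> one segment-level feature (batch, hidden)
        h = self.cnn(x.transpose(1, 2)).transpose(1, 2)
        _, (h_n, _) = self.lstm(h)
        return h_n[-1]


class AudioVisualFusionNet(nn.Module):
    """Maps both modalities into a shared subspace via an autoencoder and classifies."""

    def __init__(self, visual_dim: int, audio_dim: int, shared_dim: int = 64):
        super().__init__()
        self.visual_enc = SegmentEncoder(visual_dim)
        self.audio_enc = SegmentEncoder(audio_dim)
        # Autoencoder mapping: project each modality into the shared semantic
        # subspace and decode it back so a reconstruction loss can be applied.
        self.to_shared_v = nn.Linear(128, shared_dim)
        self.to_shared_a = nn.Linear(128, shared_dim)
        self.from_shared_v = nn.Linear(shared_dim, 128)
        self.from_shared_a = nn.Linear(shared_dim, 128)
        self.classifier = nn.Linear(2 * shared_dim, 2)  # violent / non-violent

    def forward(self, visual: torch.Tensor, audio: torch.Tensor):
        fv, fa = self.visual_enc(visual), self.audio_enc(audio)
        zv, za = self.to_shared_v(fv), self.to_shared_a(fa)
        logits = self.classifier(torch.cat([zv, za], dim=1))
        # Reconstruction keeps the shared subspace faithful to each modality;
        # the correspondence term pulls matching audio/visual segments together.
        recon = (F.mse_loss(self.from_shared_v(zv), fv)
                 + F.mse_loss(self.from_shared_a(za), fa))
        correspondence = F.mse_loss(zv, za)
        return logits, recon + correspondence


if __name__ == "__main__":
    # Toy batch: 4 segments, 30 visual frames (512-d) and 100 audio frames (40-d).
    model = AudioVisualFusionNet(visual_dim=512, audio_dim=40)
    logits, aux_loss = model(torch.randn(4, 30, 512), torch.randn(4, 100, 40))
    loss = F.cross_entropy(logits, torch.randint(0, 2, (4,))) + 0.1 * aux_loss
    loss.backward()
    print(logits.shape, float(loss))
```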

Bibliographic Details
Main Authors: Jiu Lou, Decheng Zuo, Zhan Zhang, Hongwei Liu
Format: article
Language: EN
Published: MDPI AG, 2021
Subjects: violence recognition, auditory-visual fusion, autoencoder mapping, shared semantic subspaces, CNN-LSTM
Online Access: https://doaj.org/article/4ba6c819a8f547abb7858b9008326632
Record ID: oai:doaj.org-article:4ba6c819a8f547abb7858b9008326632
DOI: 10.3390/electronics10212654
ISSN: 2079-9292
Journal: Electronics, Vol 10, Iss 21, p 2654 (2021), MDPI AG (LCC: TK7800-8360)
Published online: 2021-10-01
Full text: https://www.mdpi.com/2079-9292/10/21/2654
Journal table of contents: https://doaj.org/toc/2079-9292