Violence Recognition Based on Auditory-Visual Fusion of Autoencoder Mapping

In violence recognition, accuracy suffers from time-axis misalignment and semantic deviation between the visual and auditory streams of multimedia data. This paper therefore proposes an auditory-visual information fusion method based on autoencoder mapping. First, a feature extraction model based on a CNN-LSTM framework is established, and whole multimedia segments are used as input to resolve the time-axis misalignment between visual and auditory information. Then, a shared semantic subspace is constructed with an autoencoder mapping model and optimized by semantic correspondence, which resolves the audiovisual semantic deviation and fuses the visual and auditory information at the level of segment features. Finally, the whole network is used to identify violence. Experimental results show that the method exploits the complementarity between modalities: compared with single-modality information, the multimodal method achieves better results.
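The abstract outlines a three-stage pipeline: CNN-LSTM encoders turn whole multimedia segments into per-modality segment features, an autoencoder maps both modalities into a shared semantic subspace optimized for semantic correspondence, and the fused segment representation feeds a violence classifier. The record does not give the authors' architecture or training details, so the sketch below is only a minimal, hypothetical PyTorch reading of that description; the module names (SegmentEncoder, AudioVisualFusionNet), layer sizes, and the reconstruction-plus-correspondence loss are assumptions, not the published implementation.

```python
# Hypothetical sketch of the pipeline described in the abstract:
# per-modality CNN-LSTM segment encoders -> autoencoder mapping into a
# shared semantic subspace -> fused segment features -> violence classifier.
# All layer sizes, names, and loss terms are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SegmentEncoder(nn.Module):
    """CNN over per-frame features, then an LSTM over the frames of one segment."""

    def __init__(self, in_channels: int, hidden: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, in_channels) -> one segment-level feature (batch, hidden)
        h = self.cnn(x.transpose(1, 2)).transpose(1, 2)
        _, (h_n, _) = self.lstm(h)
        return h_n[-1]


class AudioVisualFusionNet(nn.Module):
    """Maps both modalities into a shared subspace via an autoencoder and classifies."""

    def __init__(self, visual_dim: int, audio_dim: int, shared_dim: int = 64):
        super().__init__()
        self.visual_enc = SegmentEncoder(visual_dim)
        self.audio_enc = SegmentEncoder(audio_dim)
        # Autoencoder mapping: project each modality into the shared semantic
        # subspace and decode it back so a reconstruction loss can be applied.
        self.to_shared_v = nn.Linear(128, shared_dim)
        self.to_shared_a = nn.Linear(128, shared_dim)
        self.from_shared_v = nn.Linear(shared_dim, 128)
        self.from_shared_a = nn.Linear(shared_dim, 128)
        self.classifier = nn.Linear(2 * shared_dim, 2)  # violent / non-violent

    def forward(self, visual: torch.Tensor, audio: torch.Tensor):
        fv, fa = self.visual_enc(visual), self.audio_enc(audio)
        zv, za = self.to_shared_v(fv), self.to_shared_a(fa)
        logits = self.classifier(torch.cat([zv, za], dim=1))
        # Reconstruction keeps the shared subspace faithful to each modality;
        # the correspondence term pulls matching audio/visual segments together.
        recon = (F.mse_loss(self.from_shared_v(zv), fv)
                 + F.mse_loss(self.from_shared_a(za), fa))
        correspondence = F.mse_loss(zv, za)
        return logits, recon + correspondence


if __name__ == "__main__":
    # Toy batch: 4 segments, 30 visual frames (512-d) and 100 audio frames (40-d).
    model = AudioVisualFusionNet(visual_dim=512, audio_dim=40)
    logits, aux_loss = model(torch.randn(4, 30, 512), torch.randn(4, 100, 40))
    loss = F.cross_entropy(logits, torch.randint(0, 2, (4,))) + 0.1 * aux_loss
    loss.backward()
    print(logits.shape, float(loss))
```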

Bibliographic Details
Main Authors: Jiu Lou, Decheng Zuo, Zhan Zhang, Hongwei Liu
Format: article
Language: EN
Published: MDPI AG, 2021
Subjects: violence recognition, auditory-visual fusion, autoencoder mapping, shared semantic subspaces, CNN-LSTM
Online Access: https://doaj.org/article/4ba6c819a8f547abb7858b9008326632
Record ID: oai:doaj.org-article:4ba6c819a8f547abb7858b9008326632
DOI: 10.3390/electronics10212654
ISSN: 2079-9292
Journal: Electronics, Vol 10, Iss 21, p 2654 (2021), MDPI AG (LCC: TK7800-8360)
Published online: 2021-10-01
Full text: https://www.mdpi.com/2079-9292/10/21/2654
Journal table of contents: https://doaj.org/toc/2079-9292