Environmental sound classification using temporal-frequency attention based convolutional neural network

Bibliographic details
Main authors: Wenjie Mu, Bo Yin, Xianqing Huang, Jiali Xu, Zehua Du
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects: R (Medicine), Q (Science)
Online access: https://doaj.org/article/9ccc492d96474e4386fdec45cd0445a2
Journal: Scientific Reports, Vol 11, Iss 1, Pp 1-14 (2021)
ISSN: 2045-2322
DOI: 10.1038/s41598-021-01045-4 (https://doi.org/10.1038/s41598-021-01045-4)
Source: DOAJ

Abstract: Environmental sound classification is an important problem in audio recognition. Compared with structured sounds such as speech and music, environmental sounds have a more complicated time–frequency structure. To learn time and frequency features from the Log-Mel spectrogram more effectively, this paper proposes a temporal-frequency attention based convolutional neural network model (TFCNN). First, to motivate the proposed method, an experiment is designed to verify the effect of a specific frequency band in the spectrogram on model classification. Second, two new attention mechanisms are proposed: a temporal attention mechanism and a frequency attention mechanism. These mechanisms focus on key frequency bands and semantically related time frames in the spectrogram, reducing the influence of background noise and irrelevant frequency bands. Then, the two mechanisms are combined so that their feature information is complementary, capturing the critical time–frequency features more accurately. In this way, the representation ability of the network model can be greatly improved. Finally, experiments on two public data sets, UrbanSound8K and ESC-50, demonstrate the effectiveness of the proposed method.
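The abstract does not say how the temporal and frequency attention weights are computed or how the two mechanisms are combined. As a rough, non-authoritative illustration of the general idea (reweighting a Log-Mel spectrogram along its frequency axis and its time axis), the Python sketch below scores each frequency band and each time frame by its mean value and turns the scores into weights with a softmax. The function names (`frequency_attention`, `temporal_attention`, `apply_tf_attention`), the mean-based scoring, and the averaging of the two branches are all assumptions made for illustration; this is not the TFCNN design from the paper. In a trained CNN the scores would come from learned layers and would typically be applied to intermediate feature maps; here they are applied directly to a toy input for simplicity.

```python
import math
from typing import List

# A spectrogram as a list of rows: rows are frequency (Mel) bins, columns are time frames.
Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a 1-D list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def frequency_attention(spec: Matrix) -> List[float]:
    """One weight per frequency bin, scored by the bin's mean value over time.

    Assumption: a learned scoring layer is replaced by the plain mean.
    """
    band_means = [sum(row) / len(row) for row in spec]
    return softmax(band_means)


def temporal_attention(spec: Matrix) -> List[float]:
    """One weight per time frame, scored by the frame's mean value over frequency."""
    n_freq, n_time = len(spec), len(spec[0])
    frame_means = [sum(spec[f][t] for f in range(n_freq)) / n_freq
                   for t in range(n_time)]
    return softmax(frame_means)


def apply_tf_attention(spec: Matrix) -> Matrix:
    """Reweight the spectrogram along both axes and average the two branches.

    Assumption: each weight vector is rescaled by its length so that uniform
    attention leaves the input unchanged, and the frequency-weighted and
    time-weighted maps are averaged. The paper's actual combination may differ.
    """
    n_freq, n_time = len(spec), len(spec[0])
    a = frequency_attention(spec)  # one weight per frequency bin
    b = temporal_attention(spec)   # one weight per time frame
    return [[0.5 * spec[f][t] * (n_freq * a[f] + n_time * b[t])
             for t in range(n_time)]
            for f in range(n_freq)]


if __name__ == "__main__":
    # Toy 2-band x 3-frame input: band 0 is silent, band 1 carries energy.
    toy = [[0.0, 0.0, 0.0],
           [1.0, 1.0, 1.0]]
    out = apply_tf_attention(toy)
    print(out[0])                         # [0.0, 0.0, 0.0]
    print([round(v, 3) for v in out[1]])  # [1.231, 1.231, 1.231]
```

In the toy run, the energetic band receives the larger frequency weight and is amplified, while the frame weights stay uniform because every frame has the same mean. Scoring bands and frames by their mean is only a stand-in; a trainable version would compute the scores with learned layers and optimize them together with the rest of the network.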