A convolutional recurrent neural network with attention framework for speech separation in monaural recordings

Abstract Most speech separation studies on monaural recordings use only a single type of network, and the separation performance is typically unsatisfactory, which makes high-quality speech separation difficult. In this study, we propose a convolutional recurrent neural network with an attention (CRNN-A...

Bibliographic Details
Main authors: Chao Sun, Min Zhang, Ruijuan Wu, Junhong Lu, Guo Xian, Qin Yu, Xiaofeng Gong, Ruisen Luo
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online access: https://doaj.org/article/d0e7c3a33d6b4689a893b61ff127b46f
id oai:doaj.org-article:d0e7c3a33d6b4689a893b61ff127b46f
record_format dspace
spelling oai:doaj.org-article:d0e7c3a33d6b4689a893b61ff127b46f
2021-12-02T14:01:35Z
A convolutional recurrent neural network with attention framework for speech separation in monaural recordings
10.1038/s41598-020-80713-3
2045-2322
https://doaj.org/article/d0e7c3a33d6b4689a893b61ff127b46f
2021-01-01T00:00:00Z
https://doi.org/10.1038/s41598-020-80713-3
https://doaj.org/toc/2045-2322
Chao Sun
Min Zhang
Ruijuan Wu
Junhong Lu
Guo Xian
Qin Yu
Xiaofeng Gong
Ruisen Luo
Nature Portfolio
article
Medicine
R
Science
Q
EN
Scientific Reports, Vol 11, Iss 1, Pp 1-14 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract Most speech separation studies on monaural recordings use only a single type of network, and the separation performance is typically unsatisfactory, which makes high-quality speech separation difficult. In this study, we propose a convolutional recurrent neural network with attention (CRNN-A) framework for speech separation, fusing the advantages of the two networks. The proposed framework uses a convolutional neural network (CNN) as the front-end of a recurrent neural network (RNN), alleviating the problem that an RNN alone cannot effectively learn the necessary features. The framework exploits the translation invariance of the CNN to extract information without modifying the original signals. Within the added CNN, two different convolution kernels are designed to capture information in both the time and frequency domains of the input spectrogram. After the time-domain and frequency-domain feature maps are concatenated, the speech features are further extracted by consecutive convolutional layers. Finally, the feature map learned by the front-end CNN is combined with the original spectrogram and sent to the back-end RNN. An attention mechanism is also incorporated, focusing on the relationships among different feature maps. The effectiveness of the proposed method is evaluated on the standard MIR-1K dataset, and the results show that it outperforms the baseline RNN and other popular speech separation methods in terms of GNSDR (global normalised source-to-distortion ratio), GSIR (global source-to-interference ratio), and GSAR (global source-to-artifacts ratio). In summary, the proposed CRNN-A framework effectively combines the advantages of CNN and RNN and further optimises separation performance via the attention mechanism. The proposed framework can shed new light on speech separation, speech enhancement, and other related fields.
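The abstract names GNSDR, GSIR, and GSAR but does not say how per-clip scores are aggregated. The sketch below assumes the convention commonly used in MIR-1K evaluations, which this record does not itself confirm: NSDR is the SDR of the separated signal minus the SDR of the unprocessed mixture, and each "global" score is the mean over test clips weighted by clip length. The function names and the per-clip numbers are hypothetical and serve only to illustrate the arithmetic; they are not results from the article.

```python
from typing import Sequence


def normalised_sdr(sdr_estimate: float, sdr_mixture: float) -> float:
    """NSDR: SDR improvement of the separated signal over the raw mixture (dB)."""
    return sdr_estimate - sdr_mixture


def global_weighted_mean(values: Sequence[float], lengths: Sequence[float]) -> float:
    """Mean of per-clip scores, weighted by clip length."""
    if len(values) != len(lengths) or not values:
        raise ValueError("values and lengths must be non-empty and equally long")
    total = sum(lengths)
    if total <= 0:
        raise ValueError("total clip length must be positive")
    return sum(v * w for v, w in zip(values, lengths)) / total


# Hypothetical per-clip scores in dB; lengths in seconds (illustration only).
clips = [
    {"length": 4.0, "sdr_est": 8.0, "sdr_mix": 2.0, "sir": 12.0, "sar": 9.0},
    {"length": 12.0, "sdr_est": 6.0, "sdr_mix": 1.0, "sir": 10.0, "sar": 8.0},
]

lengths = [c["length"] for c in clips]
nsdr = [normalised_sdr(c["sdr_est"], c["sdr_mix"]) for c in clips]

gnsdr = global_weighted_mean(nsdr, lengths)
gsir = global_weighted_mean([c["sir"] for c in clips], lengths)
gsar = global_weighted_mean([c["sar"] for c in clips], lengths)

print(f"GNSDR={gnsdr:.2f} dB, GSIR={gsir:.2f} dB, GSAR={gsar:.2f} dB")
```

Weighting by length means the longer clip dominates each global score, so a method that does well only on short clips would not be rewarded as much as under a plain average.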
format article
author Chao Sun
Min Zhang
Ruijuan Wu
Junhong Lu
Guo Xian
Qin Yu
Xiaofeng Gong
Ruisen Luo
author_sort Chao Sun
title A convolutional recurrent neural network with attention framework for speech separation in monaural recordings
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/d0e7c3a33d6b4689a893b61ff127b46f
_version_ 1718392171266048000