A Novel Heterogeneous Parallel Convolution Bi-LSTM for Speech Emotion Recognition
Speech emotion recognition is a substantial component of natural language processing (NLP). It has strict requirements for the effectiveness of feature extraction and that of the acoustic model. With that in mind, a Heterogeneous Parallel Convolution Bi-LSTM model is proposed to address the challeng...
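Main authors: | Huiyun Zhang, Heming Huang, Henry Han |
---|---|
Format: | article |
Language: | EN |
Published: | MDPI AG, 2021 |
Subjects: | speech emotion recognition; feature extraction; heterogeneous parallel network; spectral features; prosodic features; multi-feature fusion |
Online access: | https://doaj.org/article/d123ba2b76394f2eb10a3158886effba |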
id |
oai:doaj.org-article:d123ba2b76394f2eb10a3158886effba |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:d123ba2b76394f2eb10a3158886effba; 2021-11-11T15:00:09Z; A Novel Heterogeneous Parallel Convolution Bi-LSTM for Speech Emotion Recognition; 10.3390/app11219897; 2076-3417; https://doaj.org/article/d123ba2b76394f2eb10a3158886effba; 2021-10-01T00:00:00Z; https://www.mdpi.com/2076-3417/11/21/9897; https://doaj.org/toc/2076-3417; Speech emotion recognition is a substantial component of natural language processing (NLP). It has strict requirements for the effectiveness of feature extraction and that of the acoustic model. With that in mind, a Heterogeneous Parallel Convolution Bi-LSTM model is proposed to address the challenges. It consists of two heterogeneous branches: the left one contains two dense layers and a Bi-LSTM layer, while the right one contains a dense layer, a convolution layer, and a Bi-LSTM layer. It can exploit the spatiotemporal information more effectively, and achieves 84.65%, 79.67%, and 56.50% unweighted average recalls on the benchmark databases EMODB, CASIA, and SAVEE, respectively. Compared with the previous research results, the proposed model achieves better performance stably.; Huiyun Zhang; Heming Huang; Henry Han; MDPI AG; article; speech emotion recognition; feature extraction; heterogeneous parallel network; spectral features; prosodic features; multi-feature fusion; Technology (T); Engineering (General). Civil engineering (General) (TA1-2040); Biology (General) (QH301-705.5); Physics (QC1-999); Chemistry (QD1-999); EN; Applied Sciences, Vol 11, Iss 9897, p 9897 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
speech emotion recognition; feature extraction; heterogeneous parallel network; spectral features; prosodic features; multi-feature fusion; Technology (T); Engineering (General). Civil engineering (General) (TA1-2040); Biology (General) (QH301-705.5); Physics (QC1-999); Chemistry (QD1-999) |
spellingShingle |
speech emotion recognition; feature extraction; heterogeneous parallel network; spectral features; prosodic features; multi-feature fusion; Technology (T); Engineering (General). Civil engineering (General) (TA1-2040); Biology (General) (QH301-705.5); Physics (QC1-999); Chemistry (QD1-999); Huiyun Zhang; Heming Huang; Henry Han; A Novel Heterogeneous Parallel Convolution Bi-LSTM for Speech Emotion Recognition |
description |
Speech emotion recognition is a substantial component of natural language processing (NLP). It places strict demands on the effectiveness of both feature extraction and the acoustic model. With that in mind, a Heterogeneous Parallel Convolution Bi-LSTM model is proposed to address these challenges. It consists of two heterogeneous branches: the left one contains two dense layers and a Bi-LSTM layer, while the right one contains a dense layer, a convolution layer, and a Bi-LSTM layer. It exploits spatiotemporal information more effectively and achieves unweighted average recalls of 84.65%, 79.67%, and 56.50% on the benchmark databases EMODB, CASIA, and SAVEE, respectively. Compared with previously reported results, the proposed model consistently achieves better performance. |
format |
article |
author |
Huiyun Zhang; Heming Huang; Henry Han |
author_facet |
Huiyun Zhang; Heming Huang; Henry Han |
author_sort |
Huiyun Zhang |
title |
A Novel Heterogeneous Parallel Convolution Bi-LSTM for Speech Emotion Recognition |
title_short |
A Novel Heterogeneous Parallel Convolution Bi-LSTM for Speech Emotion Recognition |
title_full |
A Novel Heterogeneous Parallel Convolution Bi-LSTM for Speech Emotion Recognition |
title_fullStr |
A Novel Heterogeneous Parallel Convolution Bi-LSTM for Speech Emotion Recognition |
title_full_unstemmed |
A Novel Heterogeneous Parallel Convolution Bi-LSTM for Speech Emotion Recognition |
title_sort |
novel heterogeneous parallel convolution bi-lstm for speech emotion recognition |
publisher |
MDPI AG |
publishDate |
2021 |
url |
https://doaj.org/article/d123ba2b76394f2eb10a3158886effba |
work_keys_str_mv |
AT huiyunzhang anovelheterogeneousparallelconvolutionbilstmforspeechemotionrecognition AT heminghuang anovelheterogeneousparallelconvolutionbilstmforspeechemotionrecognition AT henryhan anovelheterogeneousparallelconvolutionbilstmforspeechemotionrecognition AT huiyunzhang novelheterogeneousparallelconvolutionbilstmforspeechemotionrecognition AT heminghuang novelheterogeneousparallelconvolutionbilstmforspeechemotionrecognition AT henryhan novelheterogeneousparallelconvolutionbilstmforspeechemotionrecognition |
_version_ |
1718437929948282880 |
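
The description field reports results as unweighted average recall (UAR). As a reading aid, the following is a minimal sketch of how that metric is commonly computed: recall is calculated separately for each emotion class and the per-class values are averaged with equal weight, so a small class counts as much as a large one. The function name `unweighted_average_recall` and the toy labels are assumptions made for illustration only; they are not taken from the article and do not reproduce its results.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall for class c = correct[c] / total[c]; average over observed classes.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (not data from the article): 4 "angry", 2 "sad".
y_true = ["angry", "angry", "angry", "angry", "sad", "sad"]
y_pred = ["angry", "angry", "angry", "sad", "sad", "angry"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case the per-class recalls are 3/4 and 1/2, so the UAR is 0.625, while plain accuracy is 4/6. The gap between the two shows why UAR is often preferred when emotion classes are imbalanced.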