A Novel Heterogeneous Parallel Convolution Bi-LSTM for Speech Emotion Recognition

Speech emotion recognition is a substantial component of natural language processing (NLP). It has strict requirements for the effectiveness of feature extraction and that of the acoustic model. With that in mind, a Heterogeneous Parallel Convolution Bi-LSTM model is proposed to address the challeng...

Bibliographic Details
Main authors: Huiyun Zhang, Heming Huang, Henry Han
Format: article
Language: EN
Published: MDPI AG 2021
Subjects:
T
Online access: https://doaj.org/article/d123ba2b76394f2eb10a3158886effba
id oai:doaj.org-article:d123ba2b76394f2eb10a3158886effba
record_format dspace
spelling oai:doaj.org-article:d123ba2b76394f2eb10a3158886effba
2021-11-11T15:00:09Z
10.3390/app11219897
2076-3417
https://doaj.org/article/d123ba2b76394f2eb10a3158886effba
2021-10-01T00:00:00Z
https://www.mdpi.com/2076-3417/11/21/9897
https://doaj.org/toc/2076-3417
Applied Sciences, Vol 11, Iss 9897, p 9897 (2021)
institution DOAJ
collection DOAJ
language EN
topic speech emotion recognition
feature extraction
heterogeneous parallel network
spectral features
prosodic features
multi-feature fusion
Technology
T
Engineering (General). Civil engineering (General)
TA1-2040
Biology (General)
QH301-705.5
Physics
QC1-999
Chemistry
QD1-999
description Speech emotion recognition is a substantial component of natural language processing (NLP). It has strict requirements for the effectiveness of feature extraction and that of the acoustic model. With that in mind, a Heterogeneous Parallel Convolution Bi-LSTM model is proposed to address the challenges. It consists of two heterogeneous branches: the left one contains two dense layers and a Bi-LSTM layer, while the right one contains a dense layer, a convolution layer, and a Bi-LSTM layer. It can exploit the spatiotemporal information more effectively, and achieves 84.65%, 79.67%, and 56.50% unweighted average recalls on the benchmark databases EMODB, CASIA, and SAVEE, respectively. Compared with the previous research results, the proposed model achieves better performance stably.
format article
author Huiyun Zhang
Heming Huang
Henry Han
title A Novel Heterogeneous Parallel Convolution Bi-LSTM for Speech Emotion Recognition
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/d123ba2b76394f2eb10a3158886effba
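
The description in this record reports results as unweighted average recall (UAR). As a reading aid, the following minimal Python sketch shows how UAR is commonly computed: recall is calculated per emotion class and then averaged, so each class counts equally regardless of how many utterances it has. The function name and the toy labels are illustrative assumptions; they are not taken from the article and do not reproduce its data or results.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Average the per-class recalls, giving every class equal weight."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    total = defaultdict(int)    # utterances per true class
    correct = defaultdict(int)  # correctly recognized utterances per true class
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels, chosen only to illustrate the arithmetic.
y_true = ["anger", "anger", "anger", "anger", "joy", "joy", "sad"]
y_pred = ["anger", "anger", "anger", "joy", "joy", "sad", "sad"]

# Per-class recalls: anger 3/4, joy 1/2, sad 1/1.
uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.4f}, accuracy = {accuracy:.4f}")
```

On imbalanced data the two measures differ: here plain accuracy is 5/7, while UAR is the mean of 0.75, 0.5, and 1.0.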