A Novel Heterogeneous Parallel Convolution Bi-LSTM for Speech Emotion Recognition

Bibliographic Details
Main Authors:	Huiyun Zhang, Heming Huang, Henry Han
Format:	article
Language:	EN
Published:	MDPI AG 2021
Subjects:	speech emotion recognition; feature extraction; heterogeneous parallel network; spectral features; prosodic features; multi-feature fusion; Technology; T; Engineering (General). Civil engineering (General); TA1-2040; Biology (General); QH301-705.5; Physics; QC1-999; Chemistry; QD1-999
Online Access:	https://doaj.org/article/d123ba2b76394f2eb10a3158886effba

Description
Summary:	Speech emotion recognition is a substantial component of natural language processing (NLP). It places strict demands on the effectiveness of both feature extraction and the acoustic model. With that in mind, a Heterogeneous Parallel Convolution Bi-LSTM model is proposed to address these challenges. It consists of two heterogeneous branches: the left branch contains two dense layers and a Bi-LSTM layer, while the right branch contains a dense layer, a convolution layer, and a Bi-LSTM layer. The model exploits spatiotemporal information more effectively and achieves unweighted average recalls of 84.65%, 79.67%, and 56.50% on the benchmark databases EMODB, CASIA, and SAVEE, respectively. Compared with previous research results, the proposed model performs consistently better.
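
The summary reports results as unweighted average recall (UAR), that is, the mean of the per-class recalls, so every emotion class counts equally regardless of how many utterances it has. The sketch below shows how that metric is computed in general. It is not the authors' evaluation code; the function name and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy, imbalanced labels invented for this example (not from the paper).
y_true = ["angry"] * 6 + ["happy"] * 2
y_pred = ["angry"] * 6 + ["happy", "angry"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

On imbalanced data the two measures diverge: here the majority class is recognized perfectly and the minority class only half the time, so plain accuracy comes out higher than UAR.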
