Research on LSTM+Attention Model of Infant Cry Classification

According to the different emotional needs of infants, the effective acquisition of frame-level speech features is realized, and the infant speech emotion recognition model based on the improved Long- and Short-Term Memory (LSTM) network is established. The frame-level speech features are used inste...

Descripción completa

Guardado en:
Detalles Bibliográficos
Autores principales: Tianye Jian, Yizhun Peng, Wanlong Peng, Zhou Yang
Formato: article
Lenguaje:EN
Publicado: Atlantis Press 2021
Materias:
T
Acceso en línea:https://doaj.org/article/79d57d420c7e4b93b6fe9f3e3217f915
Etiquetas: Agregar Etiqueta
Sin Etiquetas, Sea el primero en etiquetar este registro!
Descripción
Sumario:According to the different emotional needs of infants, the effective acquisition of frame-level speech features is realized, and the infant speech emotion recognition model based on the improved Long- and Short-Term Memory (LSTM) network is established. The frame-level speech features are used instead of the traditional statistical features to preserve the temporal relationships in the original speech, and the traditional forgetting and input gates are transformed into attention gates by introducing an attention mechanism, to improve the performance of speech emotion recognition, the depth attention gate is calculated according to the self-defined depth strategy. The results show that, in Fau Aibo Children’s emotional data corpus and baby crying emotional needs database, compared with the traditional LSTM based model, the recall rate and F1-score of this model are 3.14%, 5.50%, 1.84% and 5.49% higher, respectively, compared with the traditional model based on LSTM and gated recurrent unit, the training time is shorter and the speech emotion recognition rate of baby is higher.