Effect on speech emotion classification of a feature selection approach using a convolutional neural network

Speech emotion recognition (SER) is a challenging issue because it is not clear which features are effective for classification. Emotionally related features are always extracted from speech signals for emotional classification. Handcrafted features are mainly used for emotional identification from...

Bibliographic Details
Main authors: Ammar Amjad, Lal Khan, Hsien-Tsung Chang
Format: article
Language: EN
Published: PeerJ Inc. 2021
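Subjects: Speech emotion recognition; Feature extraction; Feature selection; Convolutional neural network; Mel-spectrogram; Data augmentation; Electronic computers. Computer science; QA75.5-76.95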
Online access: https://doaj.org/article/bbf3afa8d45b4e9eb37979e8193d2f28
id oai:doaj.org-article:bbf3afa8d45b4e9eb37979e8193d2f28
record_format dspace
spelling oai:doaj.org-article:bbf3afa8d45b4e9eb37979e8193d2f28
2021-11-05T15:05:33Z
Effect on speech emotion classification of a feature selection approach using a convolutional neural network
10.7717/peerj-cs.766
2376-5992
https://doaj.org/article/bbf3afa8d45b4e9eb37979e8193d2f28
2021-11-01T00:00:00Z
https://peerj.com/articles/cs-766.pdf
https://peerj.com/articles/cs-766/
https://doaj.org/toc/2376-5992
Speech emotion recognition (SER) is a challenging issue because it is not clear which features are effective for classification. Emotionally related features are always extracted from speech signals for emotional classification. Handcrafted features are mainly used for emotional identification from audio signals. However, these features are not sufficient to correctly identify the emotional state of the speaker. The advantages of a deep convolutional neural network (DCNN) are investigated in the proposed work. A pretrained framework is used to extract the features from speech emotion databases. In this work, we adopt the feature selection (FS) approach to find the discriminative and most important features for SER. Many algorithms are used for the emotion classification problem. We use the random forest (RF), decision tree (DT), support vector machine (SVM), multilayer perceptron classifier (MLP), and k-nearest neighbors (KNN) to classify seven emotions. All experiments are performed by utilizing four different publicly accessible databases. Our method obtains accuracies of 92.02%, 88.77%, 93.61%, and 77.23% for Emo-DB, SAVEE, RAVDESS, and IEMOCAP, respectively, for speaker-dependent (SD) recognition with the feature selection method. Furthermore, compared to current handcrafted feature-based SER methods, the proposed method shows the best results for speaker-independent SER. For EMO-DB, all classifiers attain an accuracy of more than 80% with or without the feature selection technique.
Ammar Amjad
Lal Khan
Hsien-Tsung Chang
PeerJ Inc.
article
Speech emotion recognition
Feature extraction
Feature selection
Convolutional neural network
Mel-spectrogram
Data augmentation
Electronic computers. Computer science
QA75.5-76.95
EN
PeerJ Computer Science, Vol 7, p e766 (2021)
institution DOAJ
collection DOAJ
language EN
topic Speech emotion recognition
Feature extraction
Feature selection
Convolutional neural network
Mel-spectrogram
Data augmentation
Electronic computers. Computer science
QA75.5-76.95
spellingShingle Speech emotion recognition
Feature extraction
Feature selection
Convolutional neural network
Mel-spectrogram
Data augmentation
Electronic computers. Computer science
QA75.5-76.95
Ammar Amjad
Lal Khan
Hsien-Tsung Chang
Effect on speech emotion classification of a feature selection approach using a convolutional neural network
description Speech emotion recognition (SER) is a challenging issue because it is not clear which features are effective for classification. Emotionally related features are always extracted from speech signals for emotional classification. Handcrafted features are mainly used for emotional identification from audio signals. However, these features are not sufficient to correctly identify the emotional state of the speaker. The advantages of a deep convolutional neural network (DCNN) are investigated in the proposed work. A pretrained framework is used to extract the features from speech emotion databases. In this work, we adopt the feature selection (FS) approach to find the discriminative and most important features for SER. Many algorithms are used for the emotion classification problem. We use the random forest (RF), decision tree (DT), support vector machine (SVM), multilayer perceptron classifier (MLP), and k-nearest neighbors (KNN) to classify seven emotions. All experiments are performed by utilizing four different publicly accessible databases. Our method obtains accuracies of 92.02%, 88.77%, 93.61%, and 77.23% for Emo-DB, SAVEE, RAVDESS, and IEMOCAP, respectively, for speaker-dependent (SD) recognition with the feature selection method. Furthermore, compared to current handcrafted feature-based SER methods, the proposed method shows the best results for speaker-independent SER. For EMO-DB, all classifiers attain an accuracy of more than 80% with or without the feature selection technique.
format article
author Ammar Amjad
Lal Khan
Hsien-Tsung Chang
author_facet Ammar Amjad
Lal Khan
Hsien-Tsung Chang
author_sort Ammar Amjad
title Effect on speech emotion classification of a feature selection approach using a convolutional neural network
title_short Effect on speech emotion classification of a feature selection approach using a convolutional neural network
title_full Effect on speech emotion classification of a feature selection approach using a convolutional neural network
title_fullStr Effect on speech emotion classification of a feature selection approach using a convolutional neural network
title_full_unstemmed Effect on speech emotion classification of a feature selection approach using a convolutional neural network
title_sort effect on speech emotion classification of a feature selection approach using a convolutional neural network
publisher PeerJ Inc.
publishDate 2021
url https://doaj.org/article/bbf3afa8d45b4e9eb37979e8193d2f28
work_keys_str_mv AT ammaramjad effectonspeechemotionclassificationofafeatureselectionapproachusingaconvolutionalneuralnetwork
AT lalkhan effectonspeechemotionclassificationofafeatureselectionapproachusingaconvolutionalneuralnetwork
AT hsientsungchang effectonspeechemotionclassificationofafeatureselectionapproachusingaconvolutionalneuralnetwork
_version_ 1718444187581415424
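
Illustrative note: the abstract in this record describes selecting discriminative features before passing them to classifiers such as KNN. The sketch below shows that general idea only, and it rests on several assumptions that are not taken from the paper: the feature vectors are synthetic (not extracted by a pretrained DCNN), the ranking criterion is a simple Fisher score used as a stand-in for whatever feature selection method the authors applied, three toy classes replace the seven emotions, and all function names (fisher_scores, select_top_k, knn_predict, make_toy_features) are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Score each feature: between-class scatter divided by within-class scatter."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall = mean(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [row[j] for row, label in zip(X, y) if label == c]
            between += len(values) * (mean(values) - overall) ** 2
            within += len(values) * pvariance(values)
        scores.append(between / (within + 1e-12))  # small constant avoids division by zero
    return scores


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring features, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (squared Euclidean distance)."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def make_toy_features(n_per_class=30, n_classes=3, n_noise=4, seed=0):
    """Synthetic stand-in for extracted features: 2 informative columns, then noise columns."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(n_classes):
        for _ in range(n_per_class):
            informative = [rng.gauss(5.0 * c, 1.0), rng.gauss(-5.0 * c, 1.0)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(c)
    return X, y


X, y = make_toy_features()

# Hold out every third sample for testing; the rest is used for training.
train_X = [x for i, x in enumerate(X) if i % 3 != 0]
train_y = [label for i, label in enumerate(y) if i % 3 != 0]
test_X = [x for i, x in enumerate(X) if i % 3 == 0]
test_y = [label for i, label in enumerate(y) if i % 3 == 0]

# Rank features on the training split only, then keep the two best columns.
selected = select_top_k(fisher_scores(train_X, train_y), k=2)


def project(rows):
    """Keep only the selected feature columns."""
    return [[row[j] for j in selected] for row in rows]


predictions = [knn_predict(project(train_X), train_y, q) for q in project(test_X)]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print("selected feature indices:", selected)
print("held-out accuracy:", accuracy)
```

Ranking the features on the training split alone keeps the held-out samples from influencing which columns are kept, which matters when accuracies with and without feature selection are compared.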