A Hybrid Speech Enhancement Algorithm for Voice Assistance Application

In recent years, speech recognition technology has become a more common notion. Speech quality and intelligibility are critical for the convenience and accuracy of information transmission in speech recognition. The speech processing systems used to converse or store speech are usually designed for...

Bibliographic Details
Main Authors:	Jenifa Gnanamanickam, Yuvaraj Natarajan, Sri Preethaa K. R.
Format:	article
Language:	EN
Published:	MDPI AG 2021
Subjects:	speech recognition; speech enhancement; speech to text; word error rate; Chemical technology; TP1-1185
Online Access:	https://doaj.org/article/d1931a1186274d9eb12568de93b274d3

id	oai:doaj.org-article:d1931a1186274d9eb12568de93b274d3
record_format	dspace
spelling	oai:doaj.org-article:d1931a1186274d9eb12568de93b274d3; 2021-11-11T19:03:35Z; A Hybrid Speech Enhancement Algorithm for Voice Assistance Application; DOI 10.3390/s21217025; ISSN 1424-8220; https://doaj.org/article/d1931a1186274d9eb12568de93b274d3; 2021-10-01T00:00:00Z; https://www.mdpi.com/1424-8220/21/21/7025; https://doaj.org/toc/1424-8220; Jenifa Gnanamanickam; Yuvaraj Natarajan; Sri Preethaa K. R.; MDPI AG; article; speech recognition; speech enhancement; speech to text; word error rate; Chemical technology; TP1-1185; EN; Sensors, Vol 21, Iss 7025, p 7025 (2021)
institution	DOAJ
collection	DOAJ
language	EN
topic	speech recognition; speech enhancement; speech to text; word error rate; Chemical technology; TP1-1185
description	Speech recognition technology has become increasingly common in recent years. Speech quality and intelligibility are critical for convenient and accurate information transfer in speech recognition. Systems that transmit or store speech are usually designed for an environment free of background noise. In real-world conditions, however, background noise and channel noise drastically reduce the performance of speech recognition systems, which makes information transfer imprecise and tires the listener. Speech enhancement techniques aim to improve the performance of communication systems whose input or output signals are affected by noise. To ensure that the text produced from speech is correct, the external noise in the speech audio must be reduced. This is difficult because the speech may consist of isolated, continuous, or spontaneous words. Several typical speech enhancement algorithms for automatic speech recognition have gained considerable attention, but they work well only on simple and continuous audio signals. This study therefore proposes a hybridized speech recognition algorithm to improve speech recognition accuracy. Non-linear spectral subtraction, a well-known speech enhancement algorithm, is optimized with the Hidden Markov Model and tested on 6660 medical speech transcription audio files and 1440 Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) audio files. The performance of the proposed model is compared with that of typical speech enhancement algorithms, such as the iterative signal enhancement algorithm, subspace-based speech enhancement, and non-linear spectral subtraction. The proposed cascaded hybrid algorithm achieved minimum word error rates of 9.5% for medical speech and 7.6% for RAVDESS speech. Cascading the speech enhancement and speech-to-text conversion architectures yields higher recognition accuracy. The evaluation results support integrating the proposed method into real-time automatic speech recognition for medical applications, where the terminology is complex.
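The abstract reports its results as word error rate (WER). The record does not say how the authors computed it, so the following is only a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The function name `word_error_rate` and the sample sentences are illustrative assumptions, not taken from the article.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; whitespace tokenization is an
    assumption, and real evaluations usually normalize case and punctuation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Example with made-up transcripts: one substitution and one deletion
# against a five-word reference.
example_wer = word_error_rate("the patient has a fever", "the patient had fever")
```

Under this definition, a WER of 9.5% corresponds to roughly one word-level error per ten to eleven reference words.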
format	article
author	Jenifa Gnanamanickam; Yuvaraj Natarajan; Sri Preethaa K. R.
author_sort	Jenifa Gnanamanickam
title	A Hybrid Speech Enhancement Algorithm for Voice Assistance Application
publisher	MDPI AG
publishDate	2021
url	https://doaj.org/article/d1931a1186274d9eb12568de93b274d3
