Performance Evaluation of Offline Speech Recognition on Edge Devices

Deep learning–based speech recognition applications have made great strides in the past decade. These systems have evolved to achieve higher accuracy with simpler end-to-end architectures than their hybrid predecessors. Most state-of-the-art systems run on backend servers with large amounts of memory and CPU/GPU resources. The major disadvantage of server-based speech recognition is the lack of privacy and security for user speech data. Moreover, because of its network dependency, a server-based architecture cannot always be reliable, performant, and available. Offline speech recognition on client devices overcomes these issues, but resource constraints on smaller edge devices can make state-of-the-art accuracy hard to achieve. In this paper, we evaluate the performance and efficiency of transformer-based speech recognition systems on edge devices. We measure inference performance on two popular edge devices, the Raspberry Pi and the Nvidia Jetson Nano, running on CPU and GPU, respectively. We conclude that with PyTorch mobile optimization and quantization, the models can achieve real-time inference on the Raspberry Pi CPU with only a small degradation in word error rate. On the Jetson Nano GPU, inference latency is three to five times lower than on the Raspberry Pi. The word error rate on the edge is still higher than that of server-side inference, but not far behind.

Bibliographic Details
Main Authors: Santosh Gondi, Vineel Pratap
Format: article
Language: EN
Published: MDPI AG, 2021
Subjects: ASR
Online Access: https://doaj.org/article/36a6cae073d040aea6e5ac3a23e7c280
Record ID: oai:doaj.org-article:36a6cae073d040aea6e5ac3a23e7c280
DOI: 10.3390/electronics10212697
ISSN: 2079-9292
Published Online: 2021-11-01
Article URL: https://www.mdpi.com/2079-9292/10/21/2697
Journal TOC: https://doaj.org/toc/2079-9292
Citation: Electronics, Vol. 10, Iss. 21, Art. 2697 (2021)
Keywords: ASR, speech-to-text, edge AI, Wav2Vec, transformers, PyTorch
LCC Classification: Electronics (TK7800-8360)
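The abstract credits PyTorch mobile optimization and quantization for achieving real-time inference on the Raspberry Pi CPU. The paper's actual Wav2Vec export code is not part of this record; as a rough sketch of that pipeline, the example below applies dynamic int8 quantization and mobile optimization to a small placeholder model. `SmallEncoder` and all its dimensions are invented for illustration only.

```python
# Sketch of a PyTorch Mobile export path: dynamic quantization of a
# placeholder model, then TorchScript + mobile graph optimization.
import torch
import torch.nn as nn
from torch.utils.mobile_optimizer import optimize_for_mobile


class SmallEncoder(nn.Module):
    """Tiny stand-in for a transformer acoustic model (hypothetical)."""

    def __init__(self, dim=80, hidden=128, vocab=32):
        super().__init__()
        self.proj = nn.Linear(dim, hidden)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x):
        return self.out(torch.relu(self.proj(x)))


model = SmallEncoder().eval()

# Dynamic int8 quantization of the Linear layers: weights are quantized
# ahead of time, activations on the fly. This shrinks the model and speeds
# up CPU matrix multiplies on devices like the Raspberry Pi.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Script the quantized model, apply mobile-specific optimizations, and save
# in the lite-interpreter format consumed by PyTorch Mobile.
scripted = torch.jit.script(quantized)
mobile = optimize_for_mobile(scripted)
mobile._save_for_lite_interpreter("asr_encoder.ptl")

# Sanity check: the quantized model still produces per-frame logits.
dummy = torch.randn(1, 50, 80)  # (batch, frames, features)
print(tuple(quantized(dummy).shape))  # (1, 50, 32)
```

The quantized, scripted artifact can then be loaded on-device with the PyTorch Mobile runtime; measuring its latency against the float32 original is the kind of comparison the paper reports.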