Deep bottleneck features for spoken language identification.

A key problem in spoken language identification (LID) is to design effective representations which are specific to language information. For example, in recent years, representations based on both phonotactic and acoustic features have proven their effectiveness for LID. Although advances in machine learning have led to significant improvements, LID performance is still lacking, especially for short-duration speech utterances. With the hypothesis that language information is weak and represented only latently in speech, and is largely dependent on the statistical properties of the speech content, existing representations may be insufficient. Furthermore, they may be susceptible to the variations caused by different speakers, specific content of the speech segments, and background noise. To address this, we propose using Deep Bottleneck Features (DBF) for spoken LID, motivated by the success of Deep Neural Networks (DNN) in speech recognition. We show that DBFs can form a low-dimensional compact representation of the original inputs with a powerful descriptive and discriminative capability. To evaluate the effectiveness of this, we design two acoustic models, termed DBF-TV and parallel DBF-TV (PDBF-TV), using a DBF-based i-vector representation for each speech utterance. Results on NIST language recognition evaluation 2009 (LRE09) show significant improvements over state-of-the-art systems. By fusing the output of phonotactic and acoustic approaches, we achieve an EER of 1.08%, 1.89% and 7.01% for 30 s, 10 s and 3 s test utterances, respectively. Furthermore, various DBF configurations have been extensively evaluated, and an optimal system proposed.
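The deep bottleneck feature idea in the abstract can be illustrated with a minimal sketch: a feedforward network whose middle hidden layer is deliberately narrow, with per-frame features read from that layer's activations rather than from the network's output. The layer sizes, random weights, and 39-dimensional input (suggestive of MFCCs with deltas) below are illustrative assumptions, not the paper's configuration; a real DBF extractor would first be trained as a frame-level phone classifier before its bottleneck activations are reused.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    # Small random weights, zero biases; stands in for trained parameters.
    return rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out)

# Hypothetical layer sizes: wide hidden layers around a narrow
# 13-dimensional bottleneck in the middle.
sizes = [39, 256, 13, 256, 64]
layers = [init_layer(a, b) for a, b in zip(sizes[:-1], sizes[1:])]
BOTTLENECK_INDEX = 1  # the second layer's output is the bottleneck

def extract_dbf(frames):
    """Forward frames through the net; return bottleneck activations."""
    h = frames
    for i, (W, b) in enumerate(layers):
        h = np.tanh(h @ W + b)
        if i == BOTTLENECK_INDEX:
            # Stop here: the low-dimensional bottleneck activations are
            # the per-frame deep bottleneck features.
            return h
    return h

frames = rng.standard_normal((100, 39))  # 100 frames of 39-dim input
dbf = extract_dbf(frames)
print(dbf.shape)  # (100, 13)
```

In the paper's pipeline, per-frame DBFs like these would then be pooled into a single utterance-level i-vector before scoring; that stage is omitted here.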

Bibliographic Details
Main Authors: Bing Jiang, Yan Song, Si Wei, Jun-Hua Liu, Ian Vince McLoughlin, Li-Rong Dai
Format: article
Language: EN
Published: Public Library of Science (PLoS), 2014
Subjects: Medicine (R), Science (Q)
Online access: https://doaj.org/article/0d393633184b41409e7ad15df36a83c7
Record ID: oai:doaj.org-article:0d393633184b41409e7ad15df36a83c7
DOI: 10.1371/journal.pone.0100795
ISSN: 1932-6203
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/24983963/pdf/?tool=EBI
Journal: PLoS ONE, Vol 9, Iss 7, p e100795 (2014)