A recursive expectation-maximization algorithm for speaker tracking and separation

Abstract The problem of blind and online speaker localization and separation using multiple microphones is addressed based on the recursive expectation-maximization (REM) procedure. A two-stage REM-based algorithm is proposed: (1) multi-speaker direction of arrival (DOA) estimation and (2) multi-spe...


Bibliographic Details
Main Authors:	Ofer Schwartz, Sharon Gannot
Format:	article
Language:	EN
Published:	SpringerOpen 2021
Subjects:	Array processing; Recursive expectation-maximization algorithm; DOA estimation; LCMV beamforming; Acoustics. Sound; QC221-246; Electronic computers. Computer science; QA75.5-76.95
Online Access:	https://doaj.org/article/4f4a7b3866354213a203652554fc6ef2

id	oai:doaj.org-article:4f4a7b3866354213a203652554fc6ef2
record_format	dspace
spelling	oai:doaj.org-article:4f4a7b3866354213a203652554fc6ef2; 2021-12-05T12:19:46Z; A recursive expectation-maximization algorithm for speaker tracking and separation; 10.1186/s13636-021-00228-1; 1687-4722; https://doaj.org/article/4f4a7b3866354213a203652554fc6ef2; 2021-12-01T00:00:00Z; https://doi.org/10.1186/s13636-021-00228-1; https://doaj.org/toc/1687-4722; Ofer Schwartz; Sharon Gannot; SpringerOpen; article; Array processing; Recursive expectation-maximization algorithm; DOA estimation; LCMV beamforming; Acoustics. Sound; QC221-246; Electronic computers. Computer science; QA75.5-76.95; EN; EURASIP Journal on Audio, Speech, and Music Processing, Vol 2021, Iss 1, Pp 1-15 (2021)
institution	DOAJ
collection	DOAJ
language	EN
topic	Array processing; Recursive expectation-maximization algorithm; DOA estimation; LCMV beamforming; Acoustics. Sound; QC221-246; Electronic computers. Computer science; QA75.5-76.95
description	Abstract: The problem of blind and online speaker localization and separation using multiple microphones is addressed based on the recursive expectation-maximization (REM) procedure. A two-stage REM-based algorithm is proposed: (1) multi-speaker direction of arrival (DOA) estimation and (2) multi-speaker relative transfer function (RTF) estimation. The DOA estimation task uses only the time-frequency (TF) bins dominated by a single speaker; the entire frequency range is not required for this task. In contrast, the RTF estimation task requires the entire frequency range in order to estimate the RTF for each frequency bin. Accordingly, a different statistical model is used for each task. The first REM model is applied under the assumption that the speech signal is sparse in the TF domain, and utilizes a mixture of Gaussians (MoG) model to identify the TF bins associated with a single dominant speaker. The corresponding DOAs are estimated using these bins. The second REM model is applied under the assumption that the speakers are concurrently active in all TF bins and consequently applies a multichannel Wiener filter (MCWF) to separate the speakers. As a result of assuming concurrent speakers, a more precise TF map of the speakers' activity is obtained. The RTFs are estimated using the outputs of the MCWF-beamformer (BF), which are constructed using the DOAs obtained in the previous stage. Next, using the linearly constrained minimum variance (LCMV)-BF that utilizes the estimated RTFs, the speech signals are separated. The algorithm is evaluated using real-life scenarios of two speakers. Evaluation of the mean absolute error (MAE) of the estimated DOAs and of the separation capabilities demonstrates significant improvement over a baseline DOA estimation and speaker separation algorithm.
format	article
author	Ofer Schwartz; Sharon Gannot
title	A recursive expectation-maximization algorithm for speaker tracking and separation
publisher	SpringerOpen
publishDate	2021
url	https://doaj.org/article/4f4a7b3866354213a203652554fc6ef2
_version_	1718372015209971712
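
The abstract's final separation step, an LCMV beamformer built from estimated RTFs, can be illustrated with a minimal sketch. It assumes the standard closed-form LCMV solution w = Φ⁻¹C(CᴴΦ⁻¹C)⁻¹g for a single frequency bin, where the columns of C are the speakers' RTF vectors, Φ is a noise covariance matrix, and g is the desired response vector. The function names (`solve`, `hdot`, `lcmv_weights`, `response`) and the toy values `PHI`, `C1`, `C2` are hypothetical and are not taken from the paper; the record does not state how the authors implement this step.

```python
# Toy LCMV sketch in pure Python for one frequency bin.
# All names and numbers are hypothetical illustrations, not the paper's code.

def solve(A, b):
    """Solve A x = b for a small square complex system (Gauss-Jordan, partial pivoting)."""
    n = len(A)
    # Augmented matrix [A | b], copied so the inputs are left unmodified.
    M = [list(map(complex, A[i])) + [complex(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def hdot(u, v):
    """Hermitian inner product u^H v."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def lcmv_weights(phi, rtfs, g):
    """Return w = phi^-1 C (C^H phi^-1 C)^-1 g, with the RTF vectors as columns of C."""
    v = [solve(phi, c) for c in rtfs]                    # columns of phi^-1 C
    gram = [[hdot(ci, vj) for vj in v] for ci in rtfs]   # C^H phi^-1 C
    a = solve(gram, g)                                   # (C^H phi^-1 C)^-1 g
    return [sum(a[j] * v[j][m] for j in range(len(rtfs)))
            for m in range(len(phi))]

def response(w, c):
    """Beamformer response w^H c to a source with RTF vector c."""
    return hdot(w, c)

# Assumed toy setup: 3 microphones, 2 speakers, real-valued desired responses.
PHI = [[2.0, 0.5, 0.0],
       [0.5, 2.0, 0.5],
       [0.0, 0.5, 2.0]]          # noise covariance (symmetric, positive definite)
C1 = [1, 0.5 + 0.5j, -0.3j]      # RTF vector of speaker 1 (made up)
C2 = [1, -0.4 + 0.2j, 0.6 + 0.1j]  # RTF vector of speaker 2 (made up)

# Pass speaker 1 undistorted and place a null on speaker 2.
W = lcmv_weights(PHI, [C1, C2], [1, 0])
print(abs(response(W, C1)), abs(response(W, C2)))
```

Swapping the desired response to `[0, 1]` would extract the second speaker instead; in a full system the weights would be recomputed per frequency bin and updated as the RTF estimates change.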
