Blind source separation by multilayer neural network classifiers for spectrogram analysis

Bibliographic Details
Main Authors: Toshihiko SHIRAISHI, Tomoki DOURA
Format: article
Language: EN
Published: The Japan Society of Mechanical Engineers 2019
Online Access: https://doaj.org/article/7de2a8f33f734966bd0437d08dd14fa8
Description
Summary: This paper describes a novel method for blind source separation using multilayer neural networks when an audio signal has been recorded in a room with reverberation or with moving signal sources. In conventional applications, speech-recognition specialists can identify the signal from a specific speaker in a recording of many speakers by analyzing a spectrogram of the recording. The spectrogram is a visual representation of the time series of frequency spectra of a target signal. To use multilayer neural networks for a similar classification task, the proposed method begins by computing a spectrogram of the mixed signal with the short-time Fourier transform and treating that spectrogram as a visual object. The spectrogram is then divided into small time-frequency segments, and the multilayer neural networks assign each segment to the class of its corresponding signal source. An inverse short-time Fourier transform is then applied to extract the separated signals. The paper also evaluates the separation performance of this classification algorithm. Because the blind source separation problem is recast as a classification problem, multilayer neural network classifiers can be used, and they require neither information about the mixing environment, nor the statistical characteristics of the target signals, nor multiple microphones. Simulated tests indicate that the proposed method achieves good separation performance under conditions with reverberation or moving signal sources. The proposed method may be adapted for separating signals from unknown convolutive mixtures and time-varying systems.
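
The pipeline outlined in the summary (short-time Fourier transform, classification of time-frequency segments, masking, inverse transform) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The window length, hop size, segment size, and all function names are assumptions chosen for illustration, and the trained multilayer neural network is replaced by a hypothetical "oracle" classifier that looks at the clean reference sources, which would not be available in a truly blind setting.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Short-time Fourier transform with a Hann window; shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=256, hop=64):
    """Inverse STFT by weighted overlap-add, normalised by the summed squared window."""
    window = np.hanning(n_fft)
    length = n_fft + hop * (spec.shape[0] - 1)
    out, norm = np.zeros(length), np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(spec.shape[0]):
        s = i * hop
        out[s:s + n_fft] += frames[i] * window
        norm[s:s + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)

def classify_segments(mix_mag, classifier, seg_t=4, seg_f=4):
    """Split the magnitude spectrogram into small segments and label each one."""
    labels = np.zeros(mix_mag.shape, dtype=int)
    for t0 in range(0, mix_mag.shape[0], seg_t):
        for f0 in range(0, mix_mag.shape[1], seg_f):
            region = (slice(t0, t0 + seg_t), slice(f0, f0 + seg_f))
            labels[region] = classifier(mix_mag[region], region)
    return labels

def oracle_classifier(reference_specs):
    """Hypothetical stand-in for a trained network: it ignores the mixture
    patch and picks the reference source with the most energy in the region."""
    def classify(patch, region):
        energies = [np.sum(np.abs(ref[region]) ** 2) for ref in reference_specs]
        return int(np.argmax(energies))
    return classify

def separate(mixture, classifier, n_sources, n_fft=256, hop=64):
    """Mask the mixture spectrogram per class, then invert each masked copy."""
    spec = stft(mixture, n_fft, hop)
    labels = classify_segments(np.abs(spec), classifier)
    return [istft(spec * (labels == k), n_fft, hop) for k in range(n_sources)]

# Toy example (assumed data): two tones mixed instantaneously, no reverberation.
fs = 8000
t = np.arange(fs) / fs
s0 = np.sin(2 * np.pi * 440 * t)
s1 = np.sin(2 * np.pi * 2000 * t)
mixture = s0 + s1
clf = oracle_classifier([stft(s0), stft(s1)])
est0, est1 = separate(mixture, clf, n_sources=2)
```

In a setting closer to the one described, `classifier` would be a trained network that sees only the mixture patch, and the toy tones would be replaced by reverberant or moving-source recordings. The hard per-segment labels act as a binary mask, so each time-frequency segment contributes to exactly one separated signal.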