Complex Spectral Mapping for Single- and Multi-Channel Speech Enhancement and Robust ASR.

IEEE/ACM Trans Audio Speech Lang Process

Department of Computer Science and Engineering & the Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH 43210-1277 USA.

Published: May 2020

This study proposes a complex spectral mapping approach for single- and multi-channel speech enhancement, where deep neural networks (DNNs) are used to predict the real and imaginary (RI) components of the direct-path signal from those of the noisy and reverberant mixture. The proposed system contains two DNNs. The first one performs single-channel complex spectral mapping. The estimated complex spectra are used to compute a minimum variance distortionless response (MVDR) beamformer. The RI components of the beamforming results, which encode spatial information, are then combined with the RI components of the mixture to train the second DNN for multi-channel complex spectral mapping. With estimated complex spectra, we also propose a novel method of time-varying beamforming. State-of-the-art performance is obtained on the speech enhancement and recognition tasks of the CHiME-4 corpus. More specifically, our system obtains word error rates (WER) of 6.82%, 3.19% and 2.00% on the single-, two-, and six-microphone tasks of CHiME-4, respectively, significantly surpassing the current best results of 9.15%, 3.91% and 2.24% WER.
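The beamforming step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `mvdr_from_estimates` and `stack_ri`, the use of the mixture-minus-estimate residual as the noise estimate, the principal-eigenvector steering vector, and the diagonal loading constant are all assumptions made for illustration. The sketch computes a time-invariant MVDR beamformer per frequency from estimated speech spectra, then stacks the RI components of the beamformed output with those of a reference-microphone mixture, which is the kind of input the abstract describes for the second DNN.

```python
import numpy as np

def mvdr_from_estimates(Y, S_hat, ref=0, eps=1e-6):
    """Time-invariant MVDR per frequency (illustrative sketch).

    Y, S_hat: complex STFTs of shape (mics, frames, freqs). S_hat stands in
    for the per-microphone speech estimate produced by the first DNN.
    Returns the beamformed STFT of shape (frames, freqs).
    """
    M, T, F = Y.shape
    N_hat = Y - S_hat  # assumption: residual is treated as the noise estimate
    out = np.zeros((T, F), dtype=complex)
    for f in range(F):
        s = S_hat[:, :, f]                    # (M, T)
        n = N_hat[:, :, f]                    # (M, T)
        phi_s = s @ s.conj().T / T            # speech covariance (M, M)
        phi_n = n @ n.conj().T / T            # noise covariance (M, M)
        # Diagonal loading keeps the noise covariance invertible.
        load = eps * np.real(np.trace(phi_n)) / M + 1e-10
        phi_n = phi_n + load * np.eye(M)
        # Assumption: steering vector = principal eigenvector of phi_s,
        # normalised so that the reference microphone has unit gain.
        _, vecs = np.linalg.eigh(phi_s)
        c = vecs[:, -1]
        c = c / c[ref]
        # MVDR weights: w = phi_n^{-1} c / (c^H phi_n^{-1} c)
        num = np.linalg.solve(phi_n, c)
        w = num / (c.conj() @ num)
        out[:, f] = w.conj() @ Y[:, :, f]     # y_bf(t) = w^H y(t)
    return out

def stack_ri(bf, y_ref):
    """Stack RI parts of the beamformed signal and the reference mixture.

    Returns a real array of shape (4, frames, freqs).
    """
    return np.stack([bf.real, bf.imag, y_ref.real, y_ref.imag], axis=0)
```

In this sketch the weights satisfy the distortionless constraint w^H c = 1, so the estimated target component at the reference microphone passes through unchanged while the residual power is minimised. The time-varying beamformer mentioned in the abstract is not covered here, since the abstract gives no detail on it.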

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7971156
DOI: http://dx.doi.org/10.1109/taslp.2020.2998279
