Multichannel Signal Processing with Deep Neural Networks for Automatic Speech Recognition
Automatic Speech Recognition (ASR) systems commonly separate speech enhancement — including localization, beamforming, and postfiltering — from acoustic modeling. In this talk, we instead perform multichannel enhancement jointly with acoustic modeling in a single deep neural network framework. Overall, we find that such multichannel neural networks give a relative word error rate improvement of more than 5% over a traditional beamforming-based multichannel ASR system, and more than 10% over a single-channel model.
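To make the idea concrete, here is a minimal sketch of the kind of operation such a network can learn: filter-and-sum beamforming expressed as a per-channel convolution whose output feeds the acoustic model. All names, shapes, and filter lengths below are illustrative assumptions, not the actual architecture from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: two microphones, a short waveform segment,
# and one FIR filter per channel (in the real system these filters
# are learned jointly with the acoustic model).
num_channels, num_samples, filt_len = 2, 400, 25

# Multichannel waveform and per-channel filters (random stand-ins).
x = rng.standard_normal((num_channels, num_samples))
h = rng.standard_normal((num_channels, filt_len)) * 0.1

# Filter-and-sum beamforming: y[t] = sum_c (h_c * x_c)[t]
y = sum(np.convolve(x[c], h[c], mode="same") for c in range(num_channels))

# The enhanced signal then feeds the acoustic model; here a single
# toy fully connected layer with a ReLU nonlinearity stands in for it.
W = rng.standard_normal((64, num_samples)) * 0.01
features = np.maximum(0.0, W @ y)

print(y.shape, features.shape)
```

Because the beamforming step is just another differentiable layer, its filters can be optimized with backpropagation against the same word-error objective as the rest of the acoustic model, which is what "joint" training means here.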
I received my PhD in Electrical Engineering and Computer Science from MIT in 2009. The main focus of my PhD work was acoustic modeling for noise-robust speech recognition. After my PhD, I spent 5 years in the Speech and Language Algorithms group at IBM T.J. Watson Research Center before joining Google Research. I co-organized a special session on Sparse Representations at Interspeech 2010 in Japan and a special session on Deep Learning at ICML 2013 in Atlanta. In addition, I am a staff reporter for the IEEE Speech and Language Processing Technical Committee (SLTC) Newsletter. My research interests are mainly in acoustic modeling, including deep neural networks, sparse representations, and adaptation methods.