Tara Sainath

Multichannel Signal Processing with Deep Neural Networks for Automatic Speech Recognition

Automatic speech recognition (ASR) systems commonly treat speech enhancement, including localization, beamforming, and postfiltering, as a stage separate from acoustic modeling. In this talk, we perform multichannel enhancement jointly with acoustic modeling in a deep neural network framework. Overall, we find that such multichannel neural networks yield a relative word error rate improvement of more than 5% over a traditional beamforming-based multichannel ASR system and more than 10% over a single-channel model.
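To make the joint idea concrete, here is a minimal PyTorch sketch, not the system described in the talk: a single multichannel convolution over raw waveforms learns spatial and spectral filtering together (a rough stand-in for localization and beamforming), and its output feeds a small acoustic model predicting senone posteriors, so the whole pipeline trains end to end. All names and sizes here (MultichannelAcousticModel, 128 filters, a 25 ms window at 16 kHz, 2,000 senones) are illustrative assumptions.

import torch
import torch.nn as nn

class MultichannelAcousticModel(nn.Module):
    # Illustrative sketch of joint multichannel enhancement + acoustic
    # modeling; layer sizes are assumptions, not the talk's architecture.
    def __init__(self, num_channels=2, num_filters=128, num_senones=2000):
        super().__init__()
        # Each convolution filter spans all microphone channels, so spatial
        # (beamforming-like) and spectral filtering are learned jointly.
        self.frontend = nn.Conv1d(num_channels, num_filters,
                                  kernel_size=400, stride=160)  # ~25 ms window, 10 ms hop at 16 kHz
        # Simple feed-forward acoustic model on top of the learned features.
        self.acoustic_model = nn.Sequential(
            nn.Linear(num_filters, 512),
            nn.ReLU(),
            nn.Linear(512, num_senones),
        )

    def forward(self, waveforms):
        # waveforms: (batch, channels, samples) raw multichannel audio
        feats = torch.log1p(torch.relu(self.frontend(waveforms)))  # compress dynamic range
        feats = feats.transpose(1, 2)      # (batch, frames, filters)
        return self.acoustic_model(feats)  # per-frame senone logits

model = MultichannelAcousticModel()
logits = model(torch.randn(4, 2, 16000))  # 4 utterances, 2 mics, 1 s at 16 kHz

Because a single recognition loss on the senone targets backpropagates through the front-end, the spatial filters are optimized for recognition accuracy rather than for a separate enhancement objective, which is the core of the joint approach.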

I received my PhD in Electrical Engineering and Computer Science from MIT in 2009; the main focus of my PhD work was acoustic modeling for noise-robust speech recognition. After my PhD, I spent five years in the Speech and Language Algorithms group at IBM T.J. Watson Research Center before joining Google Research. I co-organized a special session on Sparse Representations at Interspeech 2010 in Japan and organized a special session on Deep Learning at ICML 2013 in Atlanta. In addition, I am a staff reporter for the IEEE Speech and Language Processing Technical Committee (SLTC) Newsletter. My research interests are mainly in acoustic modeling, including deep neural networks, sparse representations, and adaptation methods.
