Combination of DNNs, Recurrent LSTM Neural Nets & Hidden-Markov-Models for Robust Speech Recognition
In this presentation, we report on our successful approach to robust speech recognition, presented at recent scientific challenges, which combines several key technologies from machine learning and pattern recognition. One of these components is the deep neural network (DNN), a popular technique nowadays. It is less widely known that recurrent neural networks based on the LSTM architecture, with only 2-3 hidden layers, can achieve performance similar to that of DNNs; we therefore employed them as the second major component of our system. The ideal combination approach is based on Hidden-Markov-Models that process the output activations of both networks in different ways, so that the networks complement each other and lead to superior overall performance.
Gerhard Rigoll obtained the Dipl.-Ing. degree from Stuttgart University in 1982. He joined the Fraunhofer-Institute (IAO) in Stuttgart and received the Dr.-Ing. degree in 1986. From 1986 to 1988 he worked as a postdoctoral fellow at the IBM T.J. Watson Research Centre in Yorktown Heights, USA. He received the Dr.-Ing. habil. degree from Stuttgart University in 1991 and then worked at the NTT Human Interface Laboratories in Tokyo from 1991 to 1993 as a visiting researcher in the framework of the EU Scientific Training Programme in Japan. In 1993 he was appointed full professor of computer science at Mercator-University in Duisburg. In 2002 he joined the Technical University of Munich, where he heads the Institute for Human-Machine Communication, with major research activities in pattern recognition and machine learning. He has published more than 520 papers in these fields.