Tackling the Acoustic Front-end for Distant-Talking Automatic Speech Recognition

From HLT@INESC-ID

Walter Kellermann
Walter Kellermann is a Professor of Communications at the Chair of Multimedia Communications and Signal Processing of the University of Erlangen-Nuremberg, Germany. He received the Dipl.-Ing. degree in electrical engineering from the University of Erlangen-Nuremberg in 1983 and the Dr.-Ing. degree from the Technical University Darmstadt, Germany, in 1988. From 1989 to 1990, he was a Postdoctoral Member of Technical Staff at AT&T Bell Laboratories, Murray Hill, NJ. In 1990, he joined Philips Kommunikations Industrie, Nuremberg, Germany. From 1993 to 1999, he was a Professor at the Fachhochschule Regensburg, before joining the University of Erlangen-Nuremberg as a Professor and Head of the Audio Research Laboratory in 1999. He has authored or coauthored seven book chapters and more than 70 refereed papers in journals and conference proceedings. He has served as a Guest Editor for various journals, as an Associate Editor and Guest Editor of the IEEE Transactions on Speech and Audio Processing from 2000 to 2004, and currently serves as an Associate Editor of the EURASIP Journal on Signal Processing and the EURASIP Journal on Advances in Signal Processing. He was the General Chair of the 5th International Workshop on Microphone Arrays in 2003 and of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics in 2005. His current research interests include speech signal processing, array signal processing, adaptive filtering, and their applications to acoustic human/machine interfaces.

Date

  • 15:30, Tuesday, October 02, 2007
  • Room 336

Speaker

  • Walter Kellermann, Erlangen-Nuremberg University, Distinguished Lecturer of the IEEE Signal Processing Society.

Abstract

With the ever-growing interest in 'natural' hands-free acoustic human/machine interfaces, the need for corresponding distant-talking automatic speech recognition (ASR) systems is increasing.

Considering interactive TV as a challenging exemplary application scenario, we investigate the structural problems presented by noisy and reverberant multi-source environments with unpredictable interference and acoustic echoes of loudspeaker signals, and discuss current acoustic signal processing techniques to enhance the input to the actual ASR system. Special attention is paid to reverberation, which affects speech recognizers much more than human listeners, and a recently published method incorporating a reverberation model on the feature level of ASR is discussed.
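As a minimal illustration of why reverberation is so harmful to distant-talking ASR: the microphone observes the dry speech convolved with a room impulse response (RIR), whose late tail smears energy across many analysis frames. The sketch below uses a synthetic exponentially decaying noise RIR as a toy room model; the sample rate, reverberation time, and noise-burst stand-in for speech are all illustrative assumptions, not part of the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

fs = 16000                 # assumed sample rate (Hz)
t60 = 0.5                  # assumed reverberation time (s)
n_rir = int(fs * t60)      # RIR length in samples

# Toy RIR model: white noise shaped by an exponential energy decay
# reaching -60 dB at t60 (hence the factor 3*ln(10) in the exponent).
decay = np.exp(-3.0 * np.log(10) * np.arange(n_rir) / (fs * t60))
rir = rng.standard_normal(n_rir) * decay
rir /= np.abs(rir).max()

# Dry "speech": a short noise burst standing in for a real utterance.
dry = rng.standard_normal(fs // 2)

# Reverberant observation: linear convolution of dry signal and RIR.
wet = np.convolve(dry, rir)

# The reverberant tail extends the observation far beyond the dry
# segment, spreading energy over neighbouring frames -- the smearing
# that degrades ASR features much more than human perception.
print(len(dry), len(wet))
```

The convolved signal is `len(dry) + len(rir) - 1` samples long; with a 0.5 s RIR, each speech frame overlaps with hundreds of milliseconds of preceding speech, which is why feature-level reverberation models of the kind discussed in the talk are needed.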