|Xavier Anguera Miró|
Xavier Anguera Miró (Ing. [MS] 2001, UPC University; Dr. [PhD] 2006, UPC University, with a thesis titled "Robust Speaker Diarization for Meetings").
The goal of speaker diarization is to determine when each participant speaks in a recording. Such information is used extensively in ASR systems (for example, in VTLN or speaker adaptation) and in speaker indexing systems. It is part of the ongoing Rich Transcription (RT) evaluations organized by NIST.
In recent years, the increasing interest in speech/video analysis for the meetings environment (NIST's RT05s and RT06s, AMI-DA-, CHIL and IM2 projects) has made it necessary to handle several microphones recording synchronously. These can be either organized in microphone clusters or spread across the room in unknown locations.
This presentation will cover the basics of speaker diarization and the implementation proposed as part of the author's PhD. The system presented was built at the International Computer Science Institute (ICSI) for speaker diarization in the meeting environment and has been used for participation in the NIST RT evaluations since 2005. It is based on a mono-channel diarization system originally created for broadcast news diarization, with a preprocessing step based on the delay-and-sum algorithm that combines the multiple channels available into a single signal for processing.
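As a rough illustration of the delay-and-sum idea mentioned above, the following minimal Python sketch aligns each channel to a reference channel and averages the aligned signals. It is not the ICSI system's implementation: the function names (`estimate_delay`, `delay_and_sum`), the `max_lag` parameter, the choice of the first channel as reference, and the use of plain cross-correlation to estimate delays are all assumptions made for this example.

```python
def estimate_delay(reference, channel, max_lag):
    """Return the lag (in samples) that maximizes the plain
    cross-correlation between `channel` and `reference`.

    A positive lag means `channel` is delayed with respect to `reference`.
    Plain cross-correlation is an assumption of this sketch.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(reference)):
            j = i + lag
            if 0 <= j < len(channel):
                score += reference[i] * channel[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, max_lag=8):
    """Align every channel to the first one and average them.

    `channels` is a list of equal-length lists of samples. Using the
    first channel as reference is a simplification for illustration.
    """
    reference = channels[0]
    n = len(reference)
    output = [0.0] * n
    for channel in channels:
        lag = estimate_delay(reference, channel, max_lag)
        for i in range(n):
            j = i + lag
            if 0 <= j < len(channel):
                output[i] += channel[j]  # shift the channel back by its delay
    return [sample / len(channels) for sample in output]


# Toy example: the second microphone receives the same pulse 2 samples later.
pulse = [0, 0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 0]
delayed = [0, 0] + pulse[:-2]
enhanced = delay_and_sum([pulse, delayed], max_lag=4)
```

In this toy case the two channels add coherently once aligned, so the averaged output reproduces the original pulse; with real recordings the same alignment reinforces the speech while uncorrelated noise partly averages out.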
The latter part of the talk will introduce the work recently started in the speaker and audio indexing area at Telefónica R&D. Its main impetus has been the Spanish I3media Cenit project, which started this year. The project's objectives will be described, as well as the lines of research pursued so far.