Speaker and content identification

From HLT@INESC-ID

Xavier Anguera
Xavier Anguera Miro: Ing. [MS] 2001, UPC (Barcelona, Spain); [MS] 2001, European Masters in Language and Speech; Dr. [PhD] 2006, UPC, with a thesis on speaker diarization for multi-microphone meeting recordings. From 2001 to 2003 he worked at the Panasonic Speech Technology Lab in Santa Barbara, CA. From 2004 to 2006 he was a visiting researcher at the International Computer Science Institute (ICSI) in Berkeley, CA. Since 2007 he has been a research scientist at Telefonica Research in Barcelona. His research interests cover speech processing (both speaker- and content-based) and multimodal multimedia processing.

Date

  • 15:00, Friday, April 13th, 2012
  • Room 336

Speaker

  • Xavier Anguera, Telefonica Research

Abstract

In this talk I will cover two topics I have been working on recently. First, with regard to speaker identification, I will introduce the use of binary fingerprints to model the voice of a speaker. Standard acoustic vectors are projected onto a special GMM (representing the speaker acoustic space) to obtain high-dimensional binary vectors, which have proven successful in identifying speakers in speaker verification and diarization tasks. Second, I will talk about current developments in pattern-matching approaches that enable content-centric applications when little or no training data is available for a particular language. In particular, I will describe a query-by-example system I presented at the MediaEval 2011 evaluation, which uses a novel feature-extraction front-end that I describe further in a paper at ICASSP 2012.


Note: This seminar will be held in English.