Speaker and content identification

From HLT@INESC-ID

Xavier Anguera
Xavier Anguera Miro: Ing. [MS] 2001 from UPC (Barcelona, Spain), [MS] 2001 European Masters in Language and Speech, Dr. [PhD] 2006 from UPC, with a thesis on speaker diarization for multi-microphone meeting recordings. From 2001 to 2003 he worked at the Panasonic Speech Technology Lab in Santa Barbara, CA. From 2004 to 2006 he was a visiting researcher at the International Computer Science Institute (ICSI) in Berkeley, CA. Since 2007 he has been a research scientist at Telefonica Research in Barcelona. His research interests cover speech processing (both speaker- and content-based) and multimodal multimedia processing.

Date

  • 15:00, Friday, April 13th, 2012
  • Room 336

Speaker

  • Xavier Anguera, Telefonica Research

Abstract

In this talk I will cover two topics I have recently been working on. First, with regard to speaker identification, I will introduce the use of binary fingerprints to model the voice of a speaker. Standard acoustic vectors are projected onto a special GMM (representing the speaker acoustic space) to obtain high-dimensional binary vectors, which have proven successful at identifying speakers in speaker verification and diarization tasks. Second, I will talk about current developments in pattern matching approaches that enable content-centric applications when little or no training data is available for a particular language. In particular, I will describe a query-by-example system I presented at the MediaEval 2011 evaluation, which uses a novel feature extraction front-end that I describe further in a paper at ICASSP 2012.


Note: This seminar will be held in English.