Recent advances in language and speaker recognition: Gaussian Super Vectors and compensation methods

From HLT@INESC-ID

Jordi Luque
Jordi Luque received the Electrical Engineering degree from the Technical University of Catalonia (UPC), Barcelona, Spain, in 2005. He is currently working towards the PhD degree at the Research Center for Language and Speech Technology and Applications (TALP) at the UPC. His research interests lie in the field of speech processing. Specifically, he has worked on speaker identification and verification, diarization of meetings and broadcast news, and automatic speech recognition. His work focuses on speaker diarization and tracking in smart-room environments, combining information from other audio and video modalities. He is currently working at the Spoken Language Systems Laboratory (L2F).

Date

  • 15:00, Friday, January 22nd, 2010
  • Room 336

Speakers

  • Alberto Abad
  • Jordi Luque, Research Center for Language and Speech Technology and Applications (TALP), UPC, Spain

Abstract

A considerable number of promising methods for language and speaker recognition have been proposed in the most recent NIST language (LRE) and speaker (SRE) recognition evaluation workshops.

One of the most widely accepted approaches consists of combining Gaussian mixture models (GMM) and Support Vector Machines (SVM). A classical GMM-UBM (Universal Background Model) approach is used to obtain an adapted model for each training utterance. Then, the Gaussian means of these adapted models are stacked into a super-vector, and the super-vectors are used to train an SVM for each target language or client speaker. During identification, a super-vector is extracted from a model adapted to the test utterance and classified with the previously trained SVMs. This approach is generally known as Gaussian Super Vectors (GSV). In addition to the GSV approach, most recent efforts have been devoted to compensating for different sources of variability, such as session, channel, and speaker. Two of the most prominent compensation methods are Nuisance Attribute Projection (NAP) and Joint Factor Analysis (JFA).
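The super-vector extraction described above can be illustrated with a minimal NumPy sketch. It is not the speakers' implementation: the function names, the toy UBM, the relevance factor of 16 and the random nuisance basis are all assumptions made for illustration. It MAP-adapts only the UBM means (diagonal covariances), stacks them into a super-vector, and then removes a nuisance subspace in the style of NAP, assuming an orthonormal basis U is already available.

```python
import numpy as np


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt UBM means to one utterance (diagonal covariances assumed)."""
    # Log-density of every frame under every Gaussian: shape (T, C).
    diff = frames[:, None, :] - means[None, :, :]
    log_prob = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )
    log_post = np.log(weights) + log_prob
    # Normalise in the log domain to obtain per-frame posteriors.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Zeroth- and first-order sufficient statistics.
    n = post.sum(axis=0)        # shape (C,)
    first = post.T @ frames     # shape (C, D)
    # Interpolate between the utterance mean and the UBM mean per Gaussian.
    alpha = (n / (n + relevance))[:, None]
    data_mean = first / np.maximum(n, 1e-10)[:, None]
    return alpha * data_mean + (1.0 - alpha) * means


def supervector(adapted_means):
    """Stack the C adapted mean vectors into one vector of length C*D."""
    return adapted_means.reshape(-1)


def nap_project(vectors, nuisance_basis):
    """NAP-style compensation: s - U U^T s, with orthonormal columns in U."""
    return vectors - (vectors @ nuisance_basis) @ nuisance_basis.T


# Toy data (hypothetical): a 4-component UBM over 3-dimensional features.
rng = np.random.default_rng(0)
C, D = 4, 3
ubm_weights = np.full(C, 1.0 / C)
ubm_means = rng.normal(size=(C, D))
ubm_vars = np.ones((C, D))
utterance = rng.normal(size=(200, D))

sv = supervector(map_adapt_means(utterance, ubm_weights, ubm_means, ubm_vars))

# Random orthonormal basis standing in for an estimated nuisance subspace.
U, _ = np.linalg.qr(rng.normal(size=(C * D, 2)))
compensated = nap_project(sv[None, :], U)[0]
```

In a full system, one such super-vector would be computed per training utterance and passed to an SVM trainer for each target language or speaker. The nuisance basis would be estimated from data, typically from variability among recordings of the same class, rather than drawn at random as here.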

In this talk, we will explain the conventional GMM-UBM approach and how it relates to the GSV method. A detailed explanation of the GSV method and some of its variations will be presented. We will also introduce the compensation methods mentioned above. Experimental results on LRE and SRE corpora will be included to better characterize the techniques presented.