Recent advances in language and speaker recognition: Gaussian Super Vectors and compensation methods
From HLT@INESC-ID
Latest revision as of 11:49, 27 January 2010
Jordi Luque
Date
- 15:00, Friday, January 22nd, 2010
- Room 336
Speakers
- Alberto Abad
- Jordi Luque, Research Center for Language and Speech Technology and Applications (TALP), UPC, Spain
Abstract
A considerable number of promising methods for language and speaker recognition have been proposed in the most recent NIST language (LRE) and speaker (SRE) recognition evaluation workshops.
One of the most widely accepted approaches combines Gaussian mixture models (GMM) and Support Vector Machines (SVM). A classical GMM-UBM (Universal Background Model) approach is used to obtain an adapted model for each training utterance. The Gaussian means of these adapted models are then stacked into a super-vector, which is used to train an SVM for each target language or client speaker. During identification, a super-vector is extracted from a model adapted to the test utterance and classified using the previously trained SVMs. This approach is generally known as Gaussian Super Vectors (GSV). Beyond the GSV approach, most recent efforts have been devoted to compensation for different sources of variability, such as session, channel, and speaker. Two of the most outstanding compensation methods are Nuisance Attribute Projection (NAP) and Joint Factor Analysis (JFA).
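The GSV pipeline described above (relevance-MAP adaptation of the UBM means to one utterance, followed by stacking the adapted means) can be sketched in NumPy. This is a minimal, illustrative sketch: the UBM parameters and the "utterance" below are random toy values, not trained models, and only the means are adapted, as in the standard GSV recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_gauss_diag(X, means, covars):
    """Log-density of every frame under every diagonal-covariance Gaussian."""
    D = X.shape[1]
    prec = 1.0 / covars                        # (K, D) precisions
    log_det = np.sum(np.log(covars), axis=1)   # (K,)
    # squared Mahalanobis distances, shape (N, K)
    sq = (X**2) @ prec.T - 2 * X @ (means * prec).T + np.sum(means**2 * prec, axis=1)
    return -0.5 * (D * np.log(2 * np.pi) + log_det + sq)

def map_adapt_supervector(X, weights, means, covars, r=16.0):
    """Relevance-MAP adaptation of the UBM means to utterance X,
    then stacking of the adapted means into one GSV supervector."""
    logp = log_gauss_diag(X, means, covars) + np.log(weights)
    logp -= logp.max(axis=1, keepdims=True)
    post = np.exp(logp)
    post /= post.sum(axis=1, keepdims=True)    # responsibilities, (N, K)
    n_k = post.sum(axis=0)                     # zeroth-order statistics, (K,)
    f_k = post.T @ X                           # first-order statistics, (K, D)
    alpha = (n_k / (n_k + r))[:, None]         # data-dependent adaptation weights
    adapted = alpha * (f_k / np.maximum(n_k, 1e-10)[:, None]) + (1 - alpha) * means
    return adapted.reshape(-1)                 # (K*D,) Gaussian supervector

# toy UBM with K=4 components in D=3 dimensions (illustrative values)
K, D = 4, 3
weights = np.full(K, 1.0 / K)
means = rng.standard_normal((K, D))
covars = np.ones((K, D))
X = rng.standard_normal((200, D))              # one "utterance" of 200 frames
sv = map_adapt_supervector(X, weights, means, covars)
print(sv.shape)                                # (12,)
```

In a full system, one such supervector per training utterance would feed a linear SVM per target language or speaker; at test time the supervector of the test utterance is scored against those SVMs.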
In this talk, we will explain the conventional GMM-UBM approach and how it relates to the GSV method. A detailed explanation of the GSV method and some of its variations will be presented. We will also introduce some of the most recent compensation methods mentioned above. Experimental results on LRE and SRE corpora will be included to better characterize the presented techniques.
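Of the compensation methods mentioned, NAP has a particularly compact form: it removes from every supervector the low-rank subspace where within-speaker (nuisance, e.g. channel) variability concentrates, via the projection I - UU^T. The sketch below uses synthetic supervectors with an artificial rank-1 channel offset; the dimensions and the nuisance rank are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic supervectors: n_spk speakers, n_sess sessions each, plus a
# shared rank-1 per-session "channel" offset acting as the nuisance.
dim, n_spk, n_sess = 20, 10, 5
sv = rng.standard_normal((n_spk, n_sess, dim))
sv += 3.0 * rng.standard_normal((1, n_sess, 1)) * rng.standard_normal((1, 1, dim))

# Within-speaker scatter: deviations of each session from its speaker mean.
dev = (sv - sv.mean(axis=1, keepdims=True)).reshape(-1, dim)

# Nuisance directions = top right singular vectors of the within-speaker
# deviations (eigenvectors of the within-speaker covariance).
_, _, Vt = np.linalg.svd(dev, full_matrices=False)
U = Vt[:2].T                       # (dim, 2) nuisance subspace, rank fixed a priori
P = np.eye(dim) - U @ U.T          # NAP projection: I - U U^T

sv_nap = sv.reshape(-1, dim) @ P.T

# Within-speaker variance shrinks after projecting out the nuisance subspace.
proj = sv_nap.reshape(n_spk, n_sess, dim)
before = dev.var()
after = (proj - proj.mean(axis=1, keepdims=True)).var()
print(after < before)              # True
```

In practice the projection is applied to the GSV supervectors (equivalently, folded into the SVM kernel) before training and scoring, so the SVMs never see the nuisance directions.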