A considerable number of promising methods for language and speaker recognition have been proposed at the most recent NIST Language Recognition Evaluation (LRE) and Speaker Recognition Evaluation (SRE) workshops.
One of the most widely adopted approaches combines Gaussian mixture models (GMMs) and support vector machines (SVMs). A classical GMM-UBM (Universal Background Model) approach is used to obtain an adapted model for each training utterance. The Gaussian means of each adapted model are then stacked into a single super-vector, and these super-vectors are used to train an SVM for each target language or client speaker. At test time, a super-vector is extracted in the same way from a model adapted to the test utterance and scored with the previously trained SVMs. This approach is generally known as Gaussian Super Vectors (GSV). Beyond the GSV approach itself, recent efforts have focused on compensating for different sources of variability, such as session, channel, and speaker. Two of the most prominent compensation methods are Nuisance Attribute Projection (NAP) and Joint Factor Analysis (JFA).
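The GSV front end and the NAP step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the method presented in the talk. It assumes a diagonal-covariance UBM, mean-only MAP adaptation with a relevance factor of 16 (a common but hypothetical choice here), and the per-component scaling `sqrt(w_c) * Sigma_c^(-1/2) * m_c` that is often applied so that a linear SVM kernel on super-vectors approximates a KL-divergence kernel. The function names are hypothetical, and the nuisance basis `U` is assumed to have already been estimated (for example, from within-class scatter of training super-vectors).

```python
import numpy as np

def log_gaussian_diag(X, means, variances):
    """Log densities of frames X (T, D) under each diagonal Gaussian (C, D); returns (T, C)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (D * np.log(2 * np.pi)
                   + np.log(variances).sum(axis=1)[None, :]
                   + (diff ** 2 / variances[None, :, :]).sum(axis=2))

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a UBM to one utterance (relevance factor is an assumption)."""
    # Component posteriors (responsibilities) of each frame under the UBM
    log_p = np.log(weights)[None, :] + log_gaussian_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)
    n = post.sum(axis=0)                       # zeroth-order statistics, shape (C,)
    f = post.T @ X                             # first-order statistics, shape (C, D)
    ex = f / np.maximum(n, 1e-10)[:, None]     # posterior-weighted data mean per component
    alpha = (n / (n + relevance))[:, None]     # data-dependent adaptation coefficient
    return alpha * ex + (1.0 - alpha) * means

def gsv_supervector(adapted_means, weights, variances):
    """Stack scaled adapted means into one super-vector of length C * D."""
    scaled = np.sqrt(weights)[:, None] * adapted_means / np.sqrt(variances)
    return scaled.reshape(-1)

def nap_projection(U):
    """NAP projection P = I - U U^T, removing the span of orthonormal nuisance directions U."""
    return np.eye(U.shape[0]) - U @ U.T
```

With super-vectors computed this way, a linear SVM kernel between two utterances is simply the inner product of their (optionally NAP-projected) super-vectors, which is what makes the GMM and SVM stages fit together so directly.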
In this talk, we explain the conventional GMM-UBM approach and how it relates to the GSV method. A detailed explanation of the GSV method and some of its variants will be presented. We will also introduce the recent compensation methods mentioned above. Experimental results on LRE and SRE corpora will be included to better characterize the techniques presented.