A considerable number of promising methods for language and speaker recognition have been proposed in the most recent NIST language (LRE) and speaker (SRE) recognition evaluation workshops. In this talk we will focus on the problem of compensating for several sources of variability, such as speaker or session, and we will introduce Joint Factor Analysis (JFA) modeling. We will give an explanation of the JFA model and a brief account of the algorithms needed to carry out a JFA of speaker and session variability in a training set in which each speaker is recorded over many different channels.
JFA is a model of speaker and session variability in Gaussian mixture models (GMMs), and it is capable of performing at least as well as fusions of multiple systems of other types. The JFA technique uses the supervector representation for modeling: it assumes that a speaker- and channel-dependent supervector (M) can be decomposed into a sum of two statistically independent supervectors, a speaker supervector (s) and a channel supervector (c), so that M = s + c. In addition, JFA assumes that all speaker-dependent supervectors are contained in the affine space defined by the eigenvoices, the directions of speaker variability, which span the "speaker space". Likewise, the channel variability is confined to the "channel space" defined by the eigenchannels.
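As a minimal sketch of this decomposition (all dimensions, matrices, and factor values below are hypothetical toy values chosen for illustration, not taken from the talk), the supervector M can be assembled from a speaker component lying in the affine span of the eigenvoices and a channel component lying in the span of the eigenchannels:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (a real system would use feature dim x number of Gaussians).
F = 12   # supervector dimension
Rv = 3   # number of eigenvoices (rank of the speaker space)
Ru = 2   # number of eigenchannels (rank of the channel space)

m = rng.normal(size=F)        # mean supervector: origin of the affine speaker space
V = rng.normal(size=(F, Rv))  # eigenvoice matrix: columns span the speaker space
U = rng.normal(size=(F, Ru))  # eigenchannel matrix: columns span the channel space

y = rng.normal(size=Rv)       # speaker factors for one speaker
x = rng.normal(size=Ru)       # channel factors for one recording

s = m + V @ y                 # speaker supervector: a point in the affine speaker space
c = U @ x                     # channel supervector: an offset in the channel space
M = s + c                     # speaker- and channel-dependent supervector
```

In a full JFA system the factors y and x are latent variables estimated from the statistics of each utterance rather than sampled at random; the sketch above only illustrates the additive structure M = s + c and the role of the two low-rank subspaces.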