Phone Recognition and Language Modeling for Variety Identification
From HLT@INESC-ID
Date
- 14:00, Friday, January 29th, 2010
- Room 336
Speaker
- Oscar Koller, L2F
Abstract
This talk will introduce the phonotactic approach "Phone Recognition and Language Modeling" (PRLM) for language/variety identification. After a detailed look at this token-based method, I will present the use of a specialized phone recognizer to distinguish African Portuguese from European Portuguese with high accuracy. In contrast to other PRLM-based methods, the tokenizer incorporates knowledge about the differences between the target varieties. This knowledge is introduced into an MLP phone recognizer by training the two varieties' mono-phonemes as contrasting phoneme-like classes within a single tokenizer. Significant improvements in identification rate and computational cost were achieved compared to conventional single-tokenizer PRLM-based systems and to the combination of up to five parallel PRLM identifiers.
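The general PRLM idea described above can be illustrated with a minimal sketch. It is not the system presented in the talk: it assumes the phone recognizer has already turned each utterance into a sequence of phone labels, and it uses bigram language models with add-one smoothing, which is a simplifying assumption. The function names, the variety labels "EP" and "AP", and the toy phone strings are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def train_bigram_lm(sequences):
    """Count phone bigrams and their left contexts over training sequences."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab

def log_likelihood(seq, lm, vocab_size):
    """Score a phone sequence under one model (add-one smoothing, an assumption)."""
    bigrams, contexts, _ = lm
    padded = ["<s>"] + list(seq) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total

def identify(seq, models):
    """Return the variety whose language model gives the highest likelihood."""
    # Use one shared vocabulary size so the scores are comparable across models.
    vocab_size = len(set().union(*(lm[2] for lm in models.values())))
    return max(models, key=lambda name: log_likelihood(seq, models[name], vocab_size))

# Toy phone strings (invented): in a real system these come from the tokenizer.
training_data = {
    "EP": [["a", "b", "a", "b"], ["b", "a", "b"]],
    "AP": [["a", "c", "a", "c"], ["c", "a", "c"]],
}
models = {name: train_bigram_lm(seqs) for name, seqs in training_data.items()}

print(identify(["a", "b", "a"], models))
print(identify(["c", "a", "c", "a"], models))
```

In this sketch a single tokenizer feeds one language model per variety. A parallel PRLM setup, as mentioned in the abstract, would run several tokenizers and combine the scores of their language models.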