Phone Recognition and Language Modeling for Variety Identification

From HLT@INESC-ID

Oscar Koller
Oscar Koller was born in Berlin and is fascinated by spoken and written languages. He is currently working at INESC-ID's Spoken Language Laboratory in Lisbon on his master's thesis, "Automatic Speech Recognition and Identification of African Portuguese", which will complete his Electrical Engineering studies at the Berlin University of Technology, Germany. His research interests are communication-centred: language variety recognition, languages without script, languages with pictographic script, and sign languages. As languages are better learnt abroad, he has spent significant time living in Brazil, Ghana, China and Portugal. He is fluent in German, English, Portuguese and French, has advanced knowledge of Mandarin, and knows the basics of American Sign Language.

Date

  • 14:00, Friday, February 19th, 2010
  • Room 336

Speaker

  • Oscar Koller, L2F

Abstract

This talk will introduce the phonotactic approach "Phone Recognition and Language Modeling" (PRLM) for language and variety identification. After a detailed look at this token-based method, I will present the use of a specialized phone recognizer to differentiate African Portuguese from European Portuguese with high accuracy. In contrast to other PRLM-based methods, the tokenizer incorporates knowledge about the differences between the target varieties. This knowledge is introduced into an MLP phone recognizer by training the two varieties' mono-phonemes as contrasting phoneme-like classes within a single tokenizer. Significant improvements in identification rate and computational cost were achieved compared with conventional single-tokenizer PRLM-based systems and with the combination of up to five parallel PRLM identifiers.
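To make the PRLM idea concrete, the following is a minimal sketch of the conventional single-tokenizer variant, not the specialized tokenizer presented in the talk. It assumes a phone recognizer has already turned each utterance into a sequence of phone tokens, trains one smoothed bigram language model per variety on those sequences, and assigns a test utterance to the variety whose model gives it the highest log-likelihood. The phone symbols, the toy training sequences, the labels "EP" and "AP", and all function names are invented for illustration.

```python
import math
from collections import Counter


def train_bigram_lm(sequences, vocab, alpha=1.0):
    """Return a function giving add-alpha smoothed bigram log-probabilities."""
    bigrams = Counter()
    contexts = Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    size = len(vocab) + 1  # every phone plus the end marker "</s>"

    def logprob(prev, cur):
        num = bigrams[(prev, cur)] + alpha
        den = contexts[prev] + alpha * size
        return math.log(num / den)

    return logprob


def score(logprob, seq):
    """Log-likelihood of one phone sequence under one variety's model."""
    padded = ["<s>"] + list(seq) + ["</s>"]
    return sum(logprob(p, c) for p, c in zip(padded, padded[1:]))


def identify(models, seq):
    """Pick the variety whose language model scores the sequence highest."""
    return max(models, key=lambda name: score(models[name], seq))


# Invented toy data: phone strings as a tokenizer might emit them.
training = {
    "EP": [["p", "a", "t", "S"], ["t", "a", "S"], ["p", "S", "t"]],
    "AP": [["p", "a", "t", "u"], ["t", "a", "u"], ["p", "u", "t", "u"]],
}
vocab = sorted({ph for seqs in training.values() for s in seqs for ph in s})
models = {name: train_bigram_lm(seqs, vocab) for name, seqs in training.items()}

print(identify(models, ["t", "a", "S"]))
print(identify(models, ["p", "a", "t", "u"]))
```

In this sketch the tokenizer is shared and variety-independent, so all discriminative information comes from the phone-sequence statistics. The approach described in the abstract differs in that the tokenizer itself is trained with variety-specific phoneme-like classes.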