Sponsored by: FCT (CMUP-EPB/TIC/0026/2013)
Start: January 2015
End: December 2015
PI: Isabel Trancoso
The MT4M project develops machine translation systems for microblog content, such as Twitter posts. This domain is characterized by creative use of language, dialectal lexemes, and an informal register, all of which challenge traditional systems. Our earlier work towards this goal exploited the fact that parallel data can be found in microblogs in order to build a normalization model. Our recent work addresses the lexical sparsity that characterizes this domain by proposing character-based word representation models that exploit orthographic properties of the language. The advantages of these models extend beyond machine translation, generalizing to several other NLP tasks.