Machine Translation for Microblogs

From HLT@INESC-ID

Revision as of 09:13, 14 January 2016

Sponsored by: FCT (CMUP-EPB/TIC/0026/2013)
Start: January 2015
End: December 2015

INESC-ID Team

PI: Isabel Trancoso

Unbabel Team

Carnegie Mellon University Team

Summary

The MT4M project develops machine translation systems for microblog content, such as Twitter posts. This domain is characterized by creative use of language, dialectal lexemes, and an informal register, all of which challenge traditional systems. Our earlier work towards this goal exploited the fact that parallel data can be found in microblogs in order to build a normalization model. Our recent work deals with the lexical sparsity that characterizes this domain by proposing character-based word representation models that exploit orthographic properties of the language. The advantages of these models go beyond the machine translation task, generalizing to several other NLP tasks.

Publications

Wang Ling, Chris Dyer, Alan Black, Isabel Trancoso, Paraphrasing 4 Microblog Normalization, In 2013 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, ACL, Seattle, Washington, USA, October 2013

Wang Ling, Guang Xiang, Chris Dyer, Alan Black, Isabel Trancoso, Microblogs as Parallel Corpora, In The 51st Annual Meeting of the Association for Computational Linguistics (ACL), ACL, Sofia, Bulgaria, August 2013