Machine Translation for Microblogs
From HLT@INESC-ID
* [[Chris Dyer]]
* [[Alan Black]]
Revision as of 13:26, 13 January 2016
Sponsored by: FCT (CMUP-EPB/TIC/0026/2013)
Start: January 2015
End: December 2015
INESC-ID Team
PI: Isabel Trancoso
Unbabel Team
Carnegie Mellon University Team
Summary
The MT4M project develops machine translation systems for microblog content, such as posts on Twitter. This domain is characterized by creative use of language, dialectal lexemes, and an informal register, all of which challenge traditional systems. For example, Google's English-Portuguese translation system translates the English sentence "ill cook it brotha!" into the unintelligible "doente cozinhar brotha!" (roughly: "sick to cook brotha!"), even though the same system correctly translates the standard form, "I'll cook it, brother!". The work on this project involves developing a tweet normalizer that converts non-standard text into standard text while preserving the meaning of the original tweet.
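To make the idea of normalization concrete, the following is a minimal sketch of a lexicon-based normalizer applied to the example above. It is an illustration only and not the project's method: the lookup table `NORMALIZATION_LEXICON`, the function `normalize_tweet`, and the tokenization pattern are all hypothetical names and choices introduced here.

```python
import re

# Hypothetical lookup table (an assumption for illustration); a real
# normalizer would need far broader coverage than two hand-written entries.
NORMALIZATION_LEXICON = {
    "ill": "I'll",
    "brotha": "brother",
}

# A token is either a word (optionally containing apostrophes, e.g. "don't")
# or a single punctuation character.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def normalize_tweet(text):
    """Replace known non-standard tokens with their standard forms."""
    tokens = TOKEN_PATTERN.findall(text)
    normalized = [NORMALIZATION_LEXICON.get(tok.lower(), tok) for tok in tokens]

    # Rejoin tokens: a space before each word, none before punctuation.
    result = ""
    for tok in normalized:
        if result and re.match(r"\w", tok):
            result += " "
        result += tok
    return result


print(normalize_tweet("ill cook it brotha!"))
```

A context-free lookup like this would also rewrite "ill" where it really means "sick", and it does not restore the comma of the standard form. Both gaps illustrate why normalization that preserves the meaning of the original tweet is harder than word substitution.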