Machine Translation for Microblogs

Latest revision as of 13:51, 29 May 2020

Sponsored by: FCT (CMUP-EPB/TIC/0026/2013)
Start: January 2015
End: December 2015

http://www.cmuportugal.org/tiercontent.aspx?id=5579

INESC-ID Team

PI: Isabel Trancoso

Unbabel Team

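Carnegie Mellon University Team

Chris Dyer (http://www.lti.cs.cmu.edu/people/16595/christopher-dyer)
Alan Black (http://www.lti.cs.cmu.edu/people/15686/alan-black)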

Summary

The MT4M project develops machine translation systems for content in microblogs such as Twitter. This domain is characterized by creative use of language, dialectal lexemes, and informal register, all of which challenge traditional systems. Our earlier work towards this goal exploited the fact that microblogs contain parallel data, which we used to build a normalization model. Our more recent work addresses the lexical sparsity that characterizes this domain by proposing character-based word representation models that exploit orthographic properties of the language. The benefits of these models extend well beyond machine translation, carrying over to several other NLP tasks such as language modeling and part-of-speech tagging.
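The intuition behind character-based word representations can be illustrated with a deliberately simplified sketch. The EMNLP 2015 paper listed under Publications composes learned character embeddings with bidirectional LSTMs. The Python below swaps in a much simpler technique: a fixed bag of character trigrams with no learned parameters. The function names (`char_ngrams`, `word_features`, `cosine`) are hypothetical. The sketch only shows why spelling variants common in microblogs, such as elongated words, still share features with their standard forms. A purely word-level vocabulary would treat those variants as unrelated tokens.

```python
from collections import Counter
import math


def char_ngrams(word: str, n: int = 3) -> list[str]:
    """Character n-grams of a lowercased word, padded with '<' and '>' boundary markers."""
    padded = f"<{word.lower()}>"
    if len(padded) < n:
        # Very short words yield a single (shorter) feature instead of none.
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def word_features(word: str, n: int = 3) -> Counter:
    """Sparse bag-of-character-n-grams vector for a word.

    Simplified stand-in (assumption): not the bidirectional-LSTM model of the
    EMNLP 2015 paper, just fixed n-gram counts with no training.
    """
    return Counter(char_ngrams(word, n))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[gram] for gram, count in u.items())  # missing grams count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # An elongated spelling vs. its standard form, and an unrelated word.
    for a, b in [("cool", "coooool"), ("cool", "table")]:
        print(a, b, round(cosine(word_features(a), word_features(b)), 3))
```

Running the script prints a similarity of about 0.555 for "cool" vs "coooool", which share four trigrams. It prints 0.0 for "cool" vs "table", which share none. A learned character model goes further by placing related but differently spelled words close together in a dense space. The shared-substring signal shown here is the starting point it builds on.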

Publications

Wang Ling, Guang Xiang, Chris Dyer, Alan Black, Isabel Trancoso, Microblogs as Parallel Corpora, In The 51st Annual Meeting of the Association for Computational Linguistics (ACL), ACL, Sofia, Bulgaria, August 2013

Wang Ling, Chris Dyer, Alan Black, Isabel Trancoso, Paraphrasing 4 Microblog Normalization, In 2013 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, ACL, Seattle, Washington, USA, October 2013, PDF: http://www.inesc-id.pt/pt/indicadores/Ficheiros/10156.pdf

Wang Ling, Luís Marujo, Chris Dyer, Alan Black, Isabel Trancoso, Crowdsourcing High-Quality Parallel Data Extraction from Twitter, In ACL 2014 Ninth Workshop on Statistical Machine Translation, ACL, Baltimore, USA, June 2014, PDF: http://www.inesc-id.pt/pt/indicadores/Ficheiros/10552.pdf

Wang Ling, Chris Dyer, Alan Black, Isabel Trancoso, Two/Too Simple Adaptations of Word2Vec for Syntax Problems, In 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, ACL, Denver, USA, June 2015, PDF: http://www.inesc-id.pt/pt/indicadores/Ficheiros/11343.pdf

Luís Marujo, Wang Ling, Isabel Trancoso, Chris Dyer, Alan W. Black, Anatole Gershman, David Martins de Matos, João Paulo da Silva Neto, Jaime G. Carbonell, Automatic Keyword Extraction on Twitter, In 53rd Annual Meeting of the Association for Computational Linguistics, ACL, Beijing, China, July 2015, PDF: http://www.inesc-id.pt/pt/indicadores/Ficheiros/11195.pdf

Wang Ling, Lin Chu-Cheng, Yulia Tsvetkov, Sílvio Moreira, Ramon Fernandez Astudillo, Chris Dyer, Alan Black, Isabel Trancoso, Not All Contexts Are Created Equal: Better Word Representations with Variable Attention, In 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), ACL, Lisbon, Portugal, September 2015, PDF: http://www.inesc-id.pt/pt/indicadores/Ficheiros/11319.pdf

Wang Ling, Tiago Luís, Luís Marujo, Ramon Fernandez Astudillo, Sílvio Moreira, Chris Dyer, Alan Black, Isabel Trancoso, Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation, In 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), ACL, Lisbon, Portugal, September 2015, PDF: http://www.inesc-id.pt/pt/indicadores/Ficheiros/11318.pdf

Wang Ling, Isabel Trancoso, Chris Dyer, Alan Black, Character-based Neural Machine Translation, arXiv preprint arXiv:1511.04586, November 2015


Cooperation with DATASTORM Project (http://dmir.inesc-id.pt/project/DataStorm):

Sílvio Moreira, Ramon Fernandez Astudillo, Wang Ling, Bruno Martins, Mário J. Silva, Isabel Trancoso, INESC-ID: A Regression Model for Twitter Sentiment Lexicon Induction, In International Workshop on Semantic Evaluation (SemEval), June 2015, PDF: http://www.inesc-id.pt/pt/indicadores/Ficheiros/11168.pdf

Ramon Fernandez Astudillo, Sílvio Moreira, Wang Ling, Bruno Martins, Mário J. Silva, Isabel Trancoso, INESC-ID: Sentiment Analysis without hand-coded Features or Linguistic Resources using Embedding Subspaces, In International Workshop on Semantic Evaluation (SemEval), Code: https://github.com/ramon-astudillo/NLSE, June 2015

Ramon Fernandez Astudillo, Sílvio Moreira, Wang Ling, Mário J. Silva, Isabel Trancoso, Learning Word Representations From Scarce and Noisy Data With Embedding Subspaces, In ACL-IJCNLP, Code: https://github.com/ramon-astudillo/NLSE, July 2015