Formalization of English Phrasal Verbs

From HLT@INESC-ID

Peter A. Machonis
Peter Machonis is Professor of French Linguistics and Honors Fellow at Florida International University. His research interests include French as spoken outside of France, lexicon-grammar, syntax of English idioms, and Natural Language Processing. He has authored two books on the history and evolution of the French language, edited a book on implementing experiential learning in higher education, and written more than a dozen articles on English idioms, support verbs, and phrasal verbs. This month he has been teaching at the Universidade do Algarve as a third-country Erasmus Mundus visiting scholar.

Date

  • 15:00, Friday, May 25th, 2012
  • Room 336

Speaker

  • Peter A. Machonis, Florida International University

Abstract

As Sag et al. (2002: 14) state, multiword expressions "constitute a key problem that must be resolved in order for linguistically precise NLP to succeed." This talk presents the results of using the linguistic development environment, NooJ, together with manually constructed lexicon-grammar tables of transitive and neutral English phrasal verbs, to automatically recognize these structures, with and without insertion, in large corpora.

We tested our grammar on written works, such as 19th-century British novels, as well as on an oral corpus consisting of 25 transcribed Larry King Live programs from January 2000, achieving 85% accuracy. In addition to a phrasal verb grammar and a dictionary containing 1,200 English phrasal verbs, we added two disambiguation grammars to automatically remove incorrect phrasal verbs from NooJ’s Text Annotation Structures (TAS) and a dictionary to eliminate additional noise originating from compounds and idiomatic expressions. Although our analysis shows how difficult it is to accurately identify English phrasal verbs in large corpora, our study confirms that lexicon-grammar tables and NooJ are indeed powerful linguistic tools that, when used together, can help solve a key problem in Natural Language Processing.
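
To illustrate what recognition "with and without insertion" involves, here is a minimal Python sketch. It is not NooJ and not the grammar presented in the talk: the tiny lexicon, the inflection map, the insertion window of three tokens, and the function name `find_phrasal_verbs` are all assumptions made for this example only.

```python
import re

# Hypothetical toy lexicon (verb lemma -> particles); NOT the talk's lexicon-grammar tables.
LEXICON = {
    "look": {"up"},
    "figure": {"out"},
    "turn": {"down", "off"},
}

# Hypothetical inflected-form -> lemma map; a real system would use a full dictionary.
FORMS = {
    "look": "look", "looks": "look", "looked": "look", "looking": "look",
    "figure": "figure", "figures": "figure", "figured": "figure", "figuring": "figure",
    "turn": "turn", "turns": "turn", "turned": "turn", "turning": "turn",
}

MAX_INSERTION = 3  # assumed limit on tokens allowed between verb and particle


def find_phrasal_verbs(sentence, max_insertion=MAX_INSERTION):
    """Return (lemma, particle, inserted_tokens) for each candidate phrasal verb."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    matches = []
    for i, token in enumerate(tokens):
        lemma = FORMS.get(token)
        if lemma is None:
            continue
        particles = LEXICON[lemma]
        # gap == 0 is the continuous form; gap > 0 allows inserted material.
        for gap in range(max_insertion + 1):
            j = i + 1 + gap
            if j >= len(tokens):
                break
            if tokens[j] in particles:
                matches.append((lemma, tokens[j], tokens[i + 1:j]))
                break
    return matches


print(find_phrasal_verbs("She looked up the word."))
print(find_phrasal_verbs("She looked the word up."))
```

A purely lexical matcher like this also flags literal uses such as "He looked up the chimney", which is the kind of noise that disambiguation steps are meant to filter out.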



Note: This seminar will be held in English.