Peter A. Machonis
As Sag et al. (2002: 14) state, multiword expressions "constitute a key problem that must be resolved in order for linguistically precise NLP to succeed." This talk presents the results of using the linguistic development environment, NooJ, together with manually constructed lexicon-grammar tables of transitive and neutral English phrasal verbs, to automatically recognize these structures, with and without insertion, in large corpora.
We tested our grammar on written works, such as 19th-century British novels, as well as on an oral corpus consisting of 25 transcribed Larry King Live programs from January 2000, achieving 85% accuracy. In addition to a phrasal verb grammar and a dictionary containing 1,200 English phrasal verbs, we have added two disambiguation grammars that automatically remove incorrect phrasal verbs from NooJ’s Text Annotation Structures (TAS), as well as a dictionary that eliminates additional noise originating from compounds and idiomatic expressions. Although our analysis shows how difficult it is to accurately identify English phrasal verbs in large corpora, our study confirms that lexicon-grammar tables and NooJ are indeed powerful linguistic tools that, when used together, can help solve a key problem in Natural Language Processing.
Note: This seminar will be held in English.