Using unsupervised word sense disambiguation to guess verb subjects on untagged corpora

Paula Cristina Vaz

Date

15:00, February 01, 2008
Room 336

Speaker

Paula Cristina Vaz

Abstract

This work explores the use of subject lists extracted from an annotated corpus to find subject-verb pairs in untagged corpora. Our goal is to identify the syntactic functions associated with a verb (subjects and direct objects) in order to characterize its arguments. Identifying syntactic functions in corpora with parsers is time-consuming, so it is desirable to automate the annotation of syntactic functions without parsing the corpus. We present a method that uses an annotated corpus and SenseClusters, an unsupervised clustering tool for word sense disambiguation. Sentences containing synonymous verbs were clustered. We observe that verbs in the same cluster share the same list of subject nouns in the test corpus, even when the specific subject/verb pair does not appear in the annotated corpus. The results show that annotating subject/verb pairs using the subject lists extracted from the clusters is quicker than syntactically parsing the corpus.
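The idea of reusing subject lists across a verb cluster can be illustrated with a minimal sketch. It is not the implementation presented in the talk: the function names, the toy English data, and the hand-written verb_to_cluster mapping are all hypothetical, and the mapping only stands in for the clusters that a tool such as SenseClusters would produce.

```python
from collections import defaultdict


def build_cluster_subjects(annotated_pairs, verb_to_cluster):
    """Pool the subjects seen with each verb into one set per verb cluster."""
    cluster_subjects = defaultdict(set)
    for subject, verb in annotated_pairs:
        cluster = verb_to_cluster.get(verb)
        if cluster is not None:
            cluster_subjects[cluster].add(subject)
    return cluster_subjects


def guess_subjects(tokens, verb, verb_to_cluster, cluster_subjects):
    """Return the tokens that occur in the subject list of the verb's cluster."""
    cluster = verb_to_cluster.get(verb)
    if cluster is None:
        return []  # verb was never clustered, so no guess is made
    candidates = cluster_subjects.get(cluster, set())
    return [token for token in tokens if token in candidates and token != verb]


# Hypothetical (subject, verb) pairs taken from an annotated corpus.
annotated_pairs = [("dog", "eat"), ("child", "eat")]

# Hypothetical cluster ids; assumed to come from an external clustering step.
verb_to_cluster = {"eat": 0, "devour": 0, "rain": 1}

cluster_subjects = build_cluster_subjects(annotated_pairs, verb_to_cluster)

# "devour" never appears in the annotated pairs, but it shares a cluster with "eat".
example = guess_subjects(
    ["the", "dog", "devour", "the", "bone"], "devour",
    verb_to_cluster, cluster_subjects,
)
```

In this toy setting the pair dog/devour is recovered even though only dog/eat was annotated, which is the kind of generalization the abstract describes. The lookup is a plain set membership test, so no parsing of the untagged sentence is involved.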

From HLT@INESC-ID
