Using unsupervised word sense disambiguation to guess verb subjects on untagged corpora

From HLT@INESC-ID

Paula Cristina Vaz

Date

  • 15:00, February 01, 2008
  • Room 336

Speaker

  • Paula Cristina Vaz

Abstract

This work explores the use of subject lists extracted from an annotated corpus to find subject-verb pairs in untagged corpora. Our goal is to identify the syntactic functions attached to verbs (subjects and direct objects) in order to characterize verb arguments. Identifying syntactic functions in corpora with parsers is time-consuming, so it is desirable to automate the annotation of syntactic functions without parsing the corpus. We present a method that combines an annotated corpus with SenseClusters, an unsupervised clustering tool for word sense disambiguation. Sentences containing synonymous verbs are clustered together. We observe that verbs in the same cluster share the same list of nouns as subjects in the test corpus, even when the specific subject-verb pair does not appear in the annotated corpus. The results show that annotating subject-verb pairs using the subject lists extracted from the clusters is quicker than syntactically parsing the corpus.
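The idea in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation and does not call SenseClusters: the verb clusters are assumed to be already available, the toy data is invented, tokens are assumed to be lemmatized, and the "nearest candidate to the left of the verb" rule is a simplifying assumption made only for this example. All function and variable names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, verb) pairs as they might be read
# off an annotated corpus.
annotated_pairs = [("dog", "eat"), ("child", "eat"), ("cat", "devour")]

# Hypothetical verb clusters, standing in for the output of a
# clustering tool (no clustering is performed here).
verb_clusters = [{"eat", "devour", "consume"}]


def build_subject_lists(pairs, clusters):
    """Give every verb the pooled subjects of all verbs in its cluster."""
    cluster_of = {verb: i for i, cluster in enumerate(clusters) for verb in cluster}
    pooled = defaultdict(set)
    for subject, verb in pairs:
        if verb in cluster_of:
            pooled[cluster_of[verb]].add(subject)
    return {verb: pooled[i] for verb, i in cluster_of.items()}


def guess_subject(tokens, verb, subject_lists):
    """Return the listed noun nearest to the left of the verb, or None.

    The left-of-verb rule is an assumption of this sketch, not part of
    the method described in the abstract.
    """
    candidates = subject_lists.get(verb, set())
    if verb not in tokens:
        return None
    verb_index = tokens.index(verb)
    for token in reversed(tokens[:verb_index]):
        if token in candidates:
            return token
    return None


subject_lists = build_subject_lists(annotated_pairs, verb_clusters)

# "consume" never occurs in annotated_pairs, but it inherits the
# subject list of its cluster.
guess = guess_subject(["the", "cat", "will", "consume", "it"], "consume", subject_lists)
```

In this toy run the pair cat/consume is proposed even though it never occurs in the annotated pairs, because "consume" shares a cluster with "eat" and "devour". No parser is involved at any point.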