Parsing Conversational Speech (seminar)

From HLT@INESC-ID

Revision as of 15:24, 19 June 2006 by David

Date

  • 11:00 - June 21, 2006
  • IST, Torre Norte, Anfiteatro Ea3

Speaker

  • Mari Ostendorf (University of Washington)

Abstract

With recent advances in automatic speech recognition (ASR), there are increasing opportunities for natural language processing of speech, including applications such as speech understanding, summarization and translation. Parsing can play an important role here, but much of current parsing technology has been developed on written text. Spontaneous speech differs substantially from written text, posing challenges for parsing that include the absence of punctuation and the presence of disfluencies and ASR errors. Prosodic cues can help fill in this gap, and there is a long history of linguistic research indicating that prosodic cues in speech can provide disambiguating context beyond that available from punctuation. However, leveraging prosodic cues can be challenging, because of the many roles prosody serves in speech communication. This talk looks at means of leveraging prosody combined with lexical cues and ASR uncertainty models to improve parsing (and recognition) of spontaneous speech. The talk will begin with an overview of studies of prosody and syntax, both perceptual and computational. The focus of the talk will be on our work with a state-of-the-art statistical parser, discussing the issues of sentence segmentation, disfluencies, sub-sentence prosodic constituents, and ASR uncertainty. In addition, we show how these issues impact the use of parsing language models in ASR. We conclude by highlighting challenges in speech processing that impact parsing, including tighter integration of ASR and parsing, as well as portability to new domains.