Parsing Conversational Speech (seminar)


Mari Ostendorf
Mari Ostendorf received the Ph.D. in electrical engineering from Stanford University in 1985. She has worked at BBN Laboratories (1985-1986) and Boston University (1987-1999), and was a visiting researcher at the ATR Interpreting Telecommunications Laboratory in Japan in 1995. In 1999, she joined the University of Washington (UW), where she is currently an Endowed Professor of System Design Methodologies in Electrical Engineering and an Adjunct Professor in Computer Science and Engineering and in Linguistics. She previously served as the EE Associate Chair for Research (2001-2003), and this year she is a Visiting Professor at the University of Karlsruhe. She teaches undergraduate and graduate courses in signal processing and pattern recognition, and most recently has been spearheading the development of a course to introduce freshmen to signal processing and information technology. Prof. Ostendorf's research interests are in dynamic and linguistically motivated statistical models for speech and language processing. Her work has resulted in over 165 publications and two paper awards. Prof. Ostendorf has served on numerous technical and advisory committees, as co-Editor of Computer Speech and Language (1998-2003), and now as the Editor-in-Chief of the IEEE Transactions on Audio, Speech and Language Processing. She is a Fellow of IEEE and a member of ISCA, ACL, ASA, SWE and Sigma Xi.



  • Mari Ostendorf (University of Washington)


With recent advances in automatic speech recognition (ASR), there are increasing opportunities for natural language processing of speech, including applications such as speech understanding, summarization and translation. Parsing can play an important role here, but much of current parsing technology has been developed on written text. Spontaneous speech differs substantially from written text, posing challenges for parsing that include the absence of punctuation and the presence of disfluencies and ASR errors. Prosodic cues can help fill the gap left by missing punctuation, and a long history of linguistic research indicates that prosodic cues in speech can provide disambiguating context beyond that available from punctuation. However, leveraging prosodic cues can be challenging because of the many roles prosody serves in speech communication. This talk looks at means of leveraging prosody combined with lexical cues and ASR uncertainty models to improve parsing (and recognition) of spontaneous speech. The talk will begin with an overview of studies of prosody and syntax, both perceptual and computational. The focus of the talk will be on our work with a state-of-the-art statistical parser, discussing the issues of sentence segmentation, disfluencies, sub-sentence prosodic constituents, and ASR uncertainty. In addition, we show how these issues impact the use of parsing language models in ASR. We conclude by highlighting challenges in speech processing that impact parsing, including tighter integration of ASR and parsing, as well as portability to new domains.