Using Background Information to Improve Speech-to-Text Summarization

From HLT@INESC-ID

Date

  • 15:00, Friday, February 27th, 2009
  • Room 336

Speaker

  • Ricardo Daniel Ribeiro

Abstract

Speech-to-text summarization systems usually take as input the output of an automatic speech recognition (ASR) system, which is affected by problems such as recognition errors, disfluencies, and inaccurate identification of sentence boundaries. We explore the use of automatically acquired background information to cope with these difficulties when summarizing spoken language.

Two strategies for using this additional information, both based on latent semantic analysis, are proposed and evaluated: topic-directed semantic spaces and mixed-source multi-document speech-to-text summarization.
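
For readers unfamiliar with latent semantic analysis, the sketch below shows one common way an LSA-based extractive summarizer can rank sentences, and how background text might be allowed to shape the semantic space. It is an illustration only and is not the speaker's system: the function name `lsa_rank`, the raw term-count weighting, the salience formula (sentence vector length in the reduced space), and the example sentences are all assumptions made for this sketch.

```python
import numpy as np


def lsa_rank(candidates, background=(), k=2):
    """Rank candidate sentences by a simple LSA salience score.

    Hypothetical helper: background sentences contribute columns to the
    term-by-sentence matrix (so they shape the latent space), but only
    the candidate sentences are ranked.
    """
    docs = [s.lower().split() for s in list(candidates) + list(background)]
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-by-sentence matrix of raw counts (an assumed weighting scheme).
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d:
            A[index[w], j] += 1.0

    # Singular value decomposition: A = U * diag(s) * Vt.
    _, s, vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))

    # Salience of sentence j: length of its vector in the top-k latent
    # dimensions, with each dimension weighted by its singular value.
    scores = np.sqrt(((s[:k, None] * vt[:k, :]) ** 2).sum(axis=0))

    # Return candidate indices only, from most to least salient.
    n = len(candidates)
    order = np.argsort(-scores[:n], kind="stable")
    return [int(i) for i in order]


# Made-up ASR-like sentences (with a disfluency) and one background sentence.
asr = [
    "the government uh announced new tax measures",
    "tax measures were uh announced",
    "weather will be sunny",
]
news = ["the government announced new tax measures today"]
ranking = lsa_rank(asr, news, k=2)
print(ranking)
```

A summary would then be formed by taking the top-ranked candidate sentences until a length budget is reached. Adding the background sentences changes the singular vectors, which is one plausible reading of how cleaner related text could help compensate for noisy ASR output.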