A Lightweight on-the-fly Capitalization System for Automatic Speech Recognition

From HLT@INESC-ID

Date

  • 15:00, September 21, 2007
  • 3rd floor meeting room

Speaker

  • Fernando Batista

Abstract

This presentation describes a method for capitalizing speech transcriptions. Several resources were used, including a lexicon, written newspaper corpora, and speech transcriptions. Both generative and discriminative approaches were tested: finite-state transducers built automatically from language models, and maximum entropy models. Evaluation results are presented for written newspaper corpora and for speech transcriptions of broadcast news corpora.