A Lightweight on-the-fly Capitalization System for Automatic Speech Recognition

From HLT@INESC-ID

Revision as of 09:31, 21 September 2007 by David (talk | contribs)
Fernando Batista

Date

  • 15:00, September 21, 2007
  • 3rd floor meeting room

Speaker

  • Fernando Batista

Abstract

This presentation describes a method for capitalizing speech transcriptions. Several resources were used, including a lexicon, written newspaper corpora, and speech transcriptions. Both generative and discriminative approaches were tested: finite-state transducers built automatically from language models (generative), and maximum entropy models (discriminative). Evaluation results are presented for both written newspaper corpora and speech transcriptions of broadcast news corpora.
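
To make the task concrete, the following is a minimal sketch of a lexicon-based capitalization baseline. It is not the system presented in the talk: it uses neither finite-state transducers nor maximum entropy models, and simply restores the most frequent surface form of each word as observed in cased text. The function names `train_capitalizer` and `capitalize`, and the toy corpus, are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def train_capitalizer(cased_sentences):
    """Build a lexicon mapping each lowercased word to its most frequent cased form.

    Hypothetical baseline for illustration; not the method from the talk.
    """
    forms = defaultdict(Counter)
    for sentence in cased_sentences:
        tokens = sentence.split()
        # Skip the first token: a sentence-initial capital says little
        # about the word's usual capitalization.
        for token in tokens[1:]:
            forms[token.lower()][token] += 1
    # Keep only the most frequent surface form of each word.
    return {word: counts.most_common(1)[0][0] for word, counts in forms.items()}


def capitalize(lowercase_tokens, lexicon):
    """Restore capitalization word by word; unknown words are left unchanged."""
    return [lexicon.get(token, token) for token in lowercase_tokens]


# Toy cased corpus standing in for written newspaper text (assumed data).
corpus = [
    "The president of Portugal visited Lisbon yesterday",
    "Yesterday the minister arrived in Lisbon",
    "News from Portugal and the world",
]

lexicon = train_capitalizer(corpus)

# Lowercase input standing in for raw speech recognition output.
result = capitalize("the president arrived in lisbon".split(), lexicon)
print(" ".join(result))  # the president arrived in Lisbon
```

A baseline like this ignores context, so it cannot handle words whose capitalization depends on their neighbours. Context-sensitive models such as those named in the abstract are meant to address that limitation.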