Speech Recognition

From HLT@INESC-ID

Revision as of 17:08, 3 July 2006 by Meinedo (talk | contribs)

The most challenging aspects of speech recognition are those related to processing speech in widely different domains, spoken in a variety of dialects and in potentially adverse environments, and to dealing with the characteristics of spontaneous speech: no punctuation, disfluencies, emotions, and overlapping turns. In this context, L2F's activities have recently concentrated on several research strands:

  • Broadcast News (BN) recognition
Our work in this area started within the scope of the European project ALERT. There are currently two PhD theses on this topic: one covering BN Audio Indexing and BN Speech Recognition, and the other covering BN Language Models. Several prototypes and demos have been built to showcase these developments, among them SSNT (Summarization of Broadcast News Services), a prototype resulting from the ALERT project.
  • Recognition in adverse environments
The field of robust speech recognition is relatively new at L2F. We are currently working on speech enhancement techniques using beamforming for a multi-speaker environment. Our approach uses a single array of 64 linearly spaced microphones.
  • Recognition of spontaneous speech
This line of research is also recent at L2F. It started within the scope of broadcast news recognition, where spontaneous speech segments show a much higher word error rate, and has since extended to two very different domains: the meeting domain (public meetings of university councils) and the classroom domain (EEC courses, national project LECTRA). The emphasis so far has been on processing disfluencies [Trancoso 2006].
  • Pronunciation modeling
The problem of pronunciation variation has been dealt with at the automatic alignment level by including alternative pronunciation rules [Trancoso02].
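
As an illustration of the microphone-array work mentioned under recognition in adverse environments, the following is a minimal sketch of delay-and-sum beamforming, the simplest beamforming scheme for a uniformly spaced linear array. The page does not say which beamforming algorithm L2F uses, so this is not a description of that system. The function names, the 5 cm spacing, the 16 kHz sample rate, and the speed of sound are all assumptions made for the example, and delays are rounded to whole samples for simplicity.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def steering_delays(num_mics, spacing_m, angle_deg, sample_rate):
    """Whole-sample arrival delays per microphone for a plane wave coming
    from angle_deg (measured from broadside) onto a uniform linear array.
    Hypothetical helper, not taken from any L2F system."""
    sin_theta = math.sin(math.radians(angle_deg))
    raw = [m * spacing_m * sin_theta / SPEED_OF_SOUND * sample_rate
           for m in range(num_mics)]
    offset = min(raw)  # shift so that every delay is non-negative
    return [round(d - offset) for d in raw]


def delay_and_sum(channels, delays):
    """Advance each channel by its arrival delay so the target direction
    lines up across microphones, then average the aligned samples."""
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(usable)
    ]


if __name__ == "__main__":
    # Toy example: 4 microphones (instead of 64), source at 30 degrees.
    delays = steering_delays(4, 0.05, 30.0, 16000)
    source = [math.sin(0.3 * n) for n in range(200)]
    # Simulate each microphone hearing the source with its own delay.
    channels = [[0.0] * d + source[:len(source) - d] for d in delays]
    enhanced = delay_and_sum(channels, delays)
    print(delays, len(enhanced))
```

Signals arriving from the steered direction add coherently, while sound from other directions is misaligned across channels and is attenuated by the averaging. A practical system would typically use fractional delays or frequency-domain weights rather than whole-sample shifts.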