Speech Recognition: Difference between revisions

Revision as of 18:16, 3 July 2006

The most challenging aspects of speech recognition are the ones related to processing speech in widely different domains, spoken in a variety of dialects, and potentially adverse environments, and dealing with the characteristics of spontaneous speech: no punctuation, disfluencies, emotions, and overlapping turns. In this context, L2F’s activities have been recently concentrated in several research strands:

Broadcast News (BN) recognition

Our work in this area started within the scope of the European project ALERT. There are currently two PhD theses on this topic, one covering BN Audio Indexing and BN Speech Recognition and another covering BN Language Models. Several prototypes and demos have been built to showcase these developments. One example is a prototype resulting from the ALERT project: SSNT - Summarization of Broadcast News Services.

Recognition in adverse environments

The field of robust speech recognition is relatively new at L2F. We are currently working on speech enhancement techniques using beamforming for a multi-speaker environment. Our approach uses a single array of 64 linearly spaced microphones.
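As a rough illustration of how a linear microphone array can be steered towards one speaker, the sketch below implements plain delay-and-sum beamforming. The text does not say which beamforming method is used at L2F, so delay-and-sum is only a stand-in here, and the function names, the microphone spacing, the sampling rate and the far-field assumption are all hypothetical choices made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at room temperature (assumed)


def steering_delays(num_mics, spacing_m, angle_deg):
    """Arrival delay in seconds at each microphone of a uniform linear array.

    Assumes a far-field source at angle_deg measured from broadside,
    with microphone 0 as the time reference.
    """
    sin_theta = math.sin(math.radians(angle_deg))
    return [m * spacing_m * sin_theta / SPEED_OF_SOUND for m in range(num_mics)]


def delay_and_sum(channels, delays_s, sample_rate):
    """Align the channels by whole-sample shifts and average them.

    channels is a list of equally sampled signals, one per microphone.
    Fractional delays are rounded to the nearest sample, which is a
    simplification; a real system would interpolate.
    """
    shifts = [round(d * sample_rate) for d in delays_s]
    base = min(shifts)
    shifts = [s - base for s in shifts]  # make every shift non-negative
    length = min(len(ch) - s for ch, s in zip(channels, shifts))
    return [
        sum(ch[s + n] for ch, s in zip(channels, shifts)) / len(channels)
        for n in range(length)
    ]


if __name__ == "__main__":
    # Hypothetical geometry: 64 microphones, 2 cm apart, 16 kHz sampling.
    delays = steering_delays(64, 0.02, 30.0)
    print(f"delay at last microphone: {delays[-1] * 1e3:.3f} ms")
```

Signals arriving from the steered direction add coherently after alignment, while sound from other directions is averaged out of phase and attenuated.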

Recognition of spontaneous speech

This line of research is also recent at L2F. It started within the scope of broadcast news recognition, where spontaneous speech segments are characterized by a much higher word error rate, and has since extended to two quite different domains: the meeting domain (public meetings of university councils), and the classroom domain (EEC courses, national project LECTRA). The emphasis so far has been on processing disfluencies [Trancoso 2006].
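For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between the recognizer output and a reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used at L2F; the function name and the whitespace tokenization are assumptions made for the example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length.

    Substitutions, deletions and insertions each cost 1. Tokenization is
    a plain whitespace split (an assumption made for this sketch).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # A filled pause left in the output counts as one insertion.
    print(word_error_rate("the cat sat", "the uh cat sat"))
```

Because insertions are counted, disfluencies such as filled pauses or repetitions that survive into the recognizer output raise the error rate even when every reference word is recognized correctly.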

Pronunciation modeling

The problem of pronunciation variation has been dealt with at the automatic alignment level by including alternative pronunciation rules [Trancoso02].

@@ Line 1: / Line 1: @@
 The most challenging aspects of speech recognition are the ones related to processing speech in widely different domains, spoken in a variety of dialects, and potentially adverse environments, and dealing with the characteristics of spontaneous speech: no punctuation, disfluencies, emotions, and overlapping turns. In this context, L2F’s activities have been recently concentrated in several research strands:
 *Broadcast News (BN) recognition
-:Our work in this area started in the scope of the European project ALERT. There are currently two PhD Theses on this topic. One covering [[BN Audio Pre-processing|BN Audio Indexing]] and [[BN Speech Recognition]] and the other covering [[BN Language Models]]. In order to show the developments several prototypes and demos are made. This is the case of a prototype resulting from the ALERT project: [[SSNT - Summarization of Broadcast News Services]].
+:Our work in this area started in the scope of the European project ALERT. There are currently two PhD Theses on this topic. One covering [[Audio  indexation|BN Audio Indexing]] and [[BN Speech Recognition]] and another covering [[BN Language Models]]. In order to show the developments several prototypes and demos are made. This is the case of a prototype resulting from the ALERT project: [[SSNT - Summarization of Broadcast News Services]].
 * Recognition in adverse environments
 :The field of robust speech recognition is relatively new at L2F. We are currently working on speech enhancement techniques using beam forming for a multi-user speaker environment. Our approach has a single array of 64 linearly spaced microphones.


From HLT@INESC-ID
