Speech Recognition
From HLT@INESC-ID
Latest revision as of 18:27, 3 July 2006
The most challenging aspects of speech recognition involve processing speech from widely different domains, spoken in a variety of dialects and in potentially adverse environments, and dealing with the characteristics of spontaneous speech: no punctuation, disfluencies, emotions, and overlapping turns. In this context, L2F’s activities have recently concentrated on several research strands:
- Broadcast News (BN) recognition
- Our work in this area started in the scope of the European project ALERT. There are currently two PhD theses on this topic: one covering audio pre-processing and BN speech recognition, and another covering BN language models. Several prototypes and demos have been built to showcase these developments, among them a prototype resulting from the ALERT project: SSNT - Summarization of Broadcast News Services.
- Recognition in adverse environments
- The field of robust speech recognition is relatively new at L2F. We are currently working on speech enhancement techniques using beamforming for a multi-speaker environment. Our approach uses a single array of 64 linearly spaced microphones.
- Recognition of spontaneous speech
- This line of research is also recent at L2F. It started within the scope of broadcast news recognition, where spontaneous speech segments show a much higher word error rate, and has since extended to two very different domains: the meeting domain (public meetings of university councils) and the classroom domain (EEC courses, national project LECTRA). The emphasis so far has been on processing disfluencies [Trancoso 2006].
- Pronunciation modeling
- Pronunciation variation has been addressed at the automatic alignment level by including alternative pronunciation rules [Trancoso02].
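
The word error rate mentioned under spontaneous speech is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not L2F's scoring tool; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words.

    Illustrative sketch only. Tokenization by whitespace is an assumption;
    real scoring tools usually normalize case, punctuation, etc. first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # A filled pause ("uh") counts as an insertion and "start" as a
    # substitution, which is one way disfluencies raise the error rate.
    print(word_error_rate("the meeting starts at nine",
                          "the uh meeting start at nine"))
```

Because the denominator is the reference length, inserted words such as filled pauses can push the rate above what the number of misrecognized reference words alone would suggest.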