Producing automatic transcriptions of classroom lectures may be important for both e-learning and e-inclusion purposes.
The greatest research challenge is the recognition of spontaneous speech, for which the error rate is much higher than for read speech. Even human-produced transcriptions would be very difficult to understand because of the absence of punctuation and the presence of disfluencies (filled pauses, repetitions, hesitations, false starts, etc.). Hence, the speech transcription has to be enriched with information about sentence boundaries and speech disfluencies.
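As an illustration of the error rate mentioned above, recognition quality is commonly reported as word error rate (WER): the word-level edit distance between a reference transcription and the recognizer output, divided by the number of reference words. The sketch below is a minimal version of that standard measure, not the project's own scoring tool; the function name `word_error_rate` and the example sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative sketch only: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: a filled pause is inserted and one word is dropped.
wer = word_error_rate("the cat sat on the mat", "the cat uh sat on mat")
```

In this hypothetical example the hypothesis differs from the reference by one insertion and one deletion over six reference words. Disfluencies such as filled pauses count as errors whenever the reference and the recognizer output treat them differently.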
Sponsored by: FCT (POSC/PLP/58697/2004)
Start: March 2005
Duration: 2 years
Project Leader: Isabel Trancoso
This project is carried out in cooperation with IMMI (Intelligent MultiModal Interfaces), led by Prof. Joaquim Jorge.
The goal of this project is the production of multimedia lecture contents for e-learning applications. We shall take as a pilot study a course for which the didactic material (e.g. textbook, problems, viewgraphs) is already electronically available and in Portuguese. This situation is increasingly frequent, particularly in technical courses. Our contribution to these contents will be to add, for each lecture in the course, the recorded video signal and the synchronized lecture transcription. We believe that this synchronized transcription may be especially important for hearing-impaired students.
The LECTRA corpus has been recorded by GAEL (Gabinete de Apoio à Criação de Conteúdos Multimédia e e-Learning, IST).
Two very different courses have been selected for our pilot study: one entitled "Economic Theory I" (ETI) and another one entitled "Production of Multimedia Contents" (PMC). The ETI course (17 classes) and the first 6 classes of the PMC course were recorded with a lapel microphone. The last part of the PMC course (14 classes) was recorded with a head-mounted microphone.
The two recording types presented specific problems. The lapel microphone proved inadequate for this type of recording because the teacher turned his head very frequently (towards the screen or the whiteboard), which caused very audible intensity fluctuations. The use of the head-mounted microphone clearly improved the audio quality. However, 11% of the recordings were saturated: the recording level was increased during the students' questions, which saturated the segments recorded right after them.
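One simple way to estimate how much of a recording is saturated is to count the frames in which a noticeable share of samples sit at full scale. The sketch below assumes 16-bit PCM samples already loaded as a list of integers. The function name `saturated_fraction`, the frame length and the clipping ratio are illustrative assumptions, and this is not necessarily how the 11% figure above was obtained.

```python
def saturated_fraction(samples, frame_len=400, full_scale=32767, clip_ratio=0.01):
    """Return the fraction of frames flagged as saturated (clipped).

    samples    -- sequence of 16-bit PCM sample values (assumed format)
    frame_len  -- samples per frame (hypothetical default)
    full_scale -- magnitude treated as clipping for 16-bit audio
    clip_ratio -- share of clipped samples above which a frame is flagged
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")

    # Split the signal into consecutive, non-overlapping frames.
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    if not frames:
        return 0.0

    flagged = 0
    for frame in frames:
        clipped = sum(1 for s in frame if abs(s) >= full_scale)
        if clipped / len(frame) > clip_ratio:
            flagged += 1
    return flagged / len(frames)


# Hypothetical signal: one quiet frame followed by one fully clipped frame.
quiet = [1000] * 400
loud = [32767] * 400
fraction = saturated_fraction(quiet + loud)
```

The thresholds would need tuning on real recordings, since loud but unclipped speech can briefly touch full scale.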
The classes varied in duration, ranging from 40 to 90 minutes. Both professors were male speakers with a Lisbon accent. Segments from students were not transcribed, as most were not intelligible enough because of the students' distance from the microphone.
The manual transcription of this pilot corpus is in progress, using the Transcriber tool. Currently, 5 classes from each course have been transcribed.
Isabel Trancoso, Ricardo Nunes, Luís Neves, C. Viana, H. Moniz, D. Caseiro, A. Isabel Mata, Recognition of Classroom Lectures in European Portuguese, In Proc. INTERSPEECH 2006, Pittsburgh, September 2006
Isabel Trancoso, Ricardo Nunes, Luís Neves, Classroom Lecture Recognition, In Computational Processing of the Portuguese Language: 7th International Workshop, PROPOR 2006, Springer, pages 190-199, May 2006
This demo shows the result of applying our Broadcast News recognizer after adapting its acoustic, lexical and language models to the course domain (Production of Multimedia Contents).