LECTRA Corpus

From HLT@INESC-ID

Revision as of 07:34, 29 June 2006 by Imt

The LECTRA corpus was recorded by GAEL (Gabinete de Apoio à Criação de Conteúdos Multimédia e e-Learning, IST), the multimedia and e-learning content support office of Instituto Superior Técnico.

Two very different courses were selected for this pilot study: "Economic Theory I" (ETI) and "Production of Multimedia Contents" (PMC). The whole ETI course (17 classes) and the first 6 classes of the PMC course were recorded with a lapel microphone. The remaining 14 PMC classes were recorded with a head-mounted microphone.

Each recording setup presented its own problems. The lapel microphone proved inadequate for this type of recording: the teacher turned his head very frequently (towards the screen or the whiteboard), which caused clearly audible intensity fluctuations. The head-mounted microphone clearly improved the audio quality. However, 11% of the recordings were saturated: the recording level was raised during the students' questions, so the segments recorded right after those questions were too loud.

Class duration varied from 40 to 90 minutes. Both professors are male speakers with a Lisbon accent. Student segments were not transcribed, because most of them were unintelligible due to the students' distance from the microphone.

Manual transcription of this pilot corpus is in progress, using the Transcriber tool. So far, 5 classes from each course have been transcribed.