LECTRA (Rich Transcription of Lectures for E-Learning Applications): Difference between revisions

From HLT@INESC-ID



[http://www.l2f.inesc-id.pt/~imt/Bolsas_POSC_PLP_58697_2007.pdf OPEN RESEARCH POSITION - DEADLINE JULY 30 2007]
[http://www.l2f.inesc-id.pt/~imt/LECTRA_monofolha2.pdf OPEN RESEARCH POSITION - DEADLINE JULY 30 2007]
 
Producing automatic transcriptions of classroom lectures may be important for both e-learning and e-inclusion purposes.
The greatest research challenge is the recognition of spontaneous speech, whose error rate is much higher than that of read speech. Even human-produced transcriptions would be very difficult to understand because of the absence of punctuation and the presence of disfluencies (filled pauses, repetitions, hesitations, false starts, etc.). Hence, one has to enrich the speech transcription by adding information about sentence boundaries and speech disfluencies.


Sponsored by: FCT (POSC/PLP/58697/2004)<br/>  
Start: March 2005<br/>
End: December 2007
== Goals ==
Producing automatic transcriptions of classroom lectures may be important for both e-learning and e-inclusion purposes.
== Team ==


* [[Luís Caldas de Oliveira|Luís Oliveira]]
* [[Fernando Batista]]
 
* [[Helena Moniz]]
Undergraduate Students:
* [[Ricardo Nunes]]
* [[Luís Neves]]  
* [[Rui Martins]]
* [[Vera Cabarrão]]
* [[Fernando Costa]]


This project is done with the cooperation of [http://immi.inesc-id.pt/ IMMI] (Intelligent MultiModal Interfaces), led by Prof. Joaquim Jorge.
== Summary ==


The goal of this project is the production of multimedia lecture contents for e-learning applications. We shall take as a pilot study a course for which the didactic material (e.g. textbook, problems, viewgraphs) is already electronically available and in Portuguese. This is an increasingly frequent situation, particularly in technical courses. Our contribution to these contents will be to add, for each lecture in the course, the recorded video signal and the synchronized lecture transcription. We believe that this synchronized transcription may be especially important for hearing-impaired students.


The project will encompass 5 main tasks. In the first one we shall collect the training and test material (both recorded audio-video signals and textual data) related to this course. In the second task we shall use this training data to adapt the acoustic, lexical and language models of our large vocabulary continuous speech recognizer to the course domain, thus yielding a first transcription of the lecture contents. The third task aims to "enrich" this transcription with metadata that would render it more intelligible. Given the state of the art in metadata extraction and the comparatively low recognition rate for spontaneous speech relative to read speech, this task is the one where the main research challenge resides. The fourth task deals with integrating the recorded audio-video and corresponding transcription with the other multimedia contents and synchronizing them according to topic, so that a student may browse through the contents, seeing a viewgraph, the corresponding part of the textbook, and the audio-video with the corresponding lecture transcription as a caption. The final task is user evaluation, for which we intend to use a panel of both normal-hearing and hearing-impaired students. For the latter, we shall evaluate two types of lecture transcription: with and without manual correction. This latter evaluation will give us an indication of how close automatic lecture transcription is to being usable in real time in a classroom.


== Workplan ==
* T1 - Data collection
* T2 - Model adaptation
* T3 - Metadata extraction
* T4 - Integration of lecture transcription with other multimedia contents
* T5 - User evaluation
== Corpus ==


The LECTRA corpus has been recorded by GAEL (Gabinete de Apoio à Criação de Conteúdos Multimédia e e-Learning, IST).


Two very different courses have been selected for our pilot study: one entitled "Economic Theory I" (ETI) and another one entitled "Production of Multimedia Contents" (PMC). The ETI course (17 classes) and the first 6 classes of the PMC course were recorded with a lapel microphone. The last part of the PMC course (14 classes) was recorded with a head-mounted microphone.
== Publications ==


Both recording setups presented specific problems. The lapel microphone proved inadequate for this type of recording, because the teacher turned his head very frequently (towards the screen or the whiteboard), causing very audible intensity fluctuations. The head-mounted microphone clearly improved the audio quality. However, 11% of the recordings were saturated: the recording level was raised during the students' questions, which saturated the segments recorded right after them.
Isabel Trancoso, Ricardo Nunes, Luís Neves, C. Viana, H. Moniz, D. Caseiro, A. Isabel Mata, [http://www.inesc-id.pt/pt/indicadores/Ficheiros/3410.pdf Recognition of Classroom Lectures in European Portuguese], In Interspeech 2006, September 2006


The classes varied in duration, ranging from 40 to 90 minutes. Both professors were male speakers with a Lisbon accent. Segments from students were not transcribed, as most were not intelligible enough, due to the distance to the microphone.
Isabel Trancoso, Ricardo Nunes, Luís Neves, [http://www.inesc-id.pt/pt/indicadores/Ficheiros/3169.pdf Classroom Lecture Recognition], In Computational Processing of the Portuguese Language: 7th International Workshop, PROPOR 2006, Springer, pages 190 - 199, May 2006


The manual transcription of this pilot corpus is in progress, using the Transcriber tool. Currently, 5 classes from each course have been transcribed.
Rui Pedro Batoreo Amaral, Hugo Meinedo, Diamantino António Caseiro, Isabel Trancoso, João Paulo da Silva Neto, [http://www.inesc-id.pt/pt/indicadores/Ficheiros/4299.pdf A Prototype System for Selective Dissemination of Broadcast News in European Portuguese], EURASIP Journal on Advances in Signal Processing, Hindawi Publishing Corporation, vol. 2007, n. 37507, May 2007


Ciro Alexandre Domingues Martins, António Teixeira, João Paulo da Silva Neto, [http://www.inesc-id.pt/pt/indicadores/Ficheiros/4684.pdf Vocabulary Selection for a Broadcast News Transcription System using a Morpho-syntatic Approach], In Interspeech 2007, August 2007
 
Fernando Batista, Diamantino António Caseiro, Nuno J. Mamede, Isabel Trancoso, [http://www.inesc-id.pt/pt/indicadores/Ficheiros/4144.pdf Recovering Punctuation Marks for Automatic Speech Recognition], In Interspeech 2007, August 2007
 
Helena Gorete Silva Moniz, Ana Isabel Mata da Silva, Maria do Céu Guerreiro Viana Ribeiro, [http://www.inesc-id.pt/pt/indicadores/Ficheiros/4060.pdf On Filled Pauses and Prolongations in European Portuguese], In Interspeech 2007, August 2007
 
Fernando Batista, Nuno J. Mamede, Diamantino António Caseiro, Isabel Trancoso, [http://www.inesc-id.pt/pt/indicadores/Ficheiros/4145.pdf A Lightweight on-the-fly Capitalization System for Automatic Speech Recognition], In Recent Advances in Natural Language Processing, vol. 1, September 2007
 
Ciro Alexandre Domingues Martins, António Teixeira, João Paulo da Silva Neto, [http://www.inesc-id.pt/pt/indicadores/Ficheiros/4564.pdf Dynamic Language Modeling for a Daily Broadcast News Transcription System], In ASRU 2007, December 2007


Isabel Trancoso, Ricardo Nunes, Luís Neves, C. Viana, H. Moniz, D. Caseiro, A. Isabel Mata, [http://www.inesc-id.pt/pt/indicadores/Ficheiros/3410.pdf Recognition of Classroom Lectures in European Portuguese], In Proc. INTERSPEECH 2006, Pittsburgh, September 2006
Isabel Trancoso, Rui Martins, Helena Moniz, Ana Isabel Mata, Céu Viana, [http://www.inesc-id.pt/pt/indicadores/Ficheiros/4809.pdf The LECTRA Corpus - Classroom Lecture Transcriptions in European Portuguese], In LREC 2008, June 2008


Fernando Batista, Diamantino António Caseiro, Nuno J. Mamede, Isabel Trancoso, [http://www.inesc-id.pt/pt/indicadores/Ficheiros/4830.pdf Recovering Capitalization and Punctuation Marks for Automatic Speech Recognition: Case Study for the Portuguese Broadcast News], Speech Communication, vol. 50, n. 10, pages 847-862, October 2008


== Demos ==

Latest revision as of 19:13, 3 January 2009

OPEN RESEARCH POSITION - DEADLINE JULY 30 2007

Sponsored by: FCT (POSC/PLP/58697/2004)
Start: March 2005
End: December 2007

Goals

The goal of the LECTRA project was the production of multimedia lecture contents for e-learning applications. This implies taking the recorded audio-visual signal and adding the automatically produced speech transcription as a caption. The greatest research challenges are the adaptation of the recognition models to the very difficult domain of university lectures, the recognition of spontaneous speech, particularly with respect to disfluencies (filled pauses, repetitions, hesitations, false starts, etc.), and the enrichment of the automatic speech transcription with punctuation and capitalization. Producing automatic transcriptions of classroom lectures may be important for both e-learning and e-inclusion purposes.


Team

Project Leader: Isabel Trancoso

This project is done with the cooperation of IMMI (Intelligent MultiModal Interfaces), led by Prof. Joaquim Jorge.

Summary

The goal of the LECTRA project was the production of multimedia lecture contents for e-learning applications.

The project encompassed 5 main tasks. The first one dealt with the collection of the training and test material for the set of 5 selected courses. This involved not only recording the audio-video signals, but also collecting support text material for these courses (e.g. textbooks, problems, viewgraphs) and manually annotating a subset of the recorded data.

In the second task we used this training data to adapt the acoustic, lexical and language models of our large vocabulary continuous speech recognizer to the course domain, thus yielding a first transcription of the lecture contents. In particular, this involved building interpolated language models for the 5 courses and exploring unsupervised learning approaches for acoustic model adaptation. The latter required implementing confidence measures in our general purpose recognition engine.
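As a rough illustration of what language model interpolation means, the sketch below linearly mixes a general model with a course-specific one and picks the mixing weight that minimizes perplexity on held-out course text. It is only a toy under stated assumptions: the models are unigram dictionaries with made-up probabilities, the function names (interpolate, perplexity, best_weight) are hypothetical, and none of this reflects the actual recognizer or the procedure used in the project.

```python
import math


def interpolate(p_general, p_domain, lam):
    """Linear interpolation: P(w) = lam * P_domain(w) + (1 - lam) * P_general(w)."""
    vocab = set(p_general) | set(p_domain)
    return {
        w: lam * p_domain.get(w, 0.0) + (1.0 - lam) * p_general.get(w, 0.0)
        for w in vocab
    }


def perplexity(model, words, floor=1e-9):
    """Perplexity of a unigram model on a word list; unseen words get a small floor."""
    log_sum = sum(math.log(max(model.get(w, 0.0), floor)) for w in words)
    return math.exp(-log_sum / len(words))


def best_weight(p_general, p_domain, heldout, steps=10):
    """Grid search over lam in [0, 1] for the lowest held-out perplexity."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda lam: perplexity(interpolate(p_general, p_domain, lam), heldout),
    )


# Toy unigram distributions (made-up numbers, for illustration only).
general = {"the": 0.5, "news": 0.3, "matrix": 0.2}
course = {"the": 0.4, "matrix": 0.5, "vector": 0.1}
heldout = ["the", "news", "matrix", "vector", "matrix"]

lam = best_weight(general, course, heldout)
mixed = interpolate(general, course, lam)
```

In practice the same idea is applied to n-gram models rather than unigrams, and the weight is usually estimated on a held-out set drawn from the target course.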

The third task aimed to "enrich" this first transcription with metadata that would render it more intelligible. Given the state of the art in metadata extraction and the comparatively low recognition rate for spontaneous speech relative to read speech, this task was the most challenging one. We proceeded in two different directions: the study of disfluencies in European Portuguese and the enrichment of the automatically produced transcription with punctuation and capitalization. Regarding disfluencies, particular attention was devoted to the analysis and modeling of filled pauses, recently complemented with the study of prolongations. The work on punctuation and capitalization started with a different type of corpus (broadcast news), mostly because this corpus was much larger at the start of the project, but also because it is useful to do the first experiments with read speech before proceeding to spontaneous speech, and this corpus has large quantities of both. We believe that producing a surface-rich transcription is essential to make the recognition output intelligible for hearing-impaired students.

The fourth task dealt with integrating the recorded audio-video and corresponding transcription with the other multimedia contents, so that a student may browse through the contents, seeing a viewgraph, the corresponding part of the textbook, and the audio-video with the corresponding lecture transcription as a caption. This work was greatly facilitated by the cooperation with the IMMI (Intelligent Multimodal Interfaces) group of INESC-ID, led by Prof. Joaquim Jorge, who was also one of the teachers who volunteered for our recordings. The web browsing interface built in their Virtual Curricula project is very well suited to the needs of the LECTRA project.

The final task was user evaluation. Due to the difficulties of arranging a panel of both normal-hearing and hearing-impaired students for off-line experiments, we decided to conduct an on-line recognition experiment. For this purpose, a course on Object Oriented Programming was recorded during the last semester. While the video of the course was being recorded, the audio was fed in parallel into our recognizer. This experiment allowed us to identify the main problems that still affect the recognition of spontaneous speech, and of classroom lectures in particular.

Workplan

  • T1 - Data collection
  • T2 - Model adaptation
  • T3 - Metadata extraction
  • T4 - Integration of lecture transcription with other multimedia contents
  • T5 - User evaluation

Corpus

The LECTRA corpus includes audio, text and support materials for 5 courses: Production of Multimedia Contents, Economic Theory I, Linear Algebra, Introduction to Information and Communication Technologies, and Object Oriented Programming. On purpose, we selected very different and challenging courses, in order to analyze the influence of several factors.

This corpus is extremely important for studying spontaneous speech phenomena in European Portuguese. The study of filled pauses and prolongations that was done in the scope of this project is just one of the first steps.

Publications

Isabel Trancoso, Ricardo Nunes, Luís Neves, C. Viana, H. Moniz, D. Caseiro, A. Isabel Mata, Recognition of Classroom Lectures in European Portuguese, In Interspeech 2006, September 2006

Isabel Trancoso, Ricardo Nunes, Luís Neves, Classroom Lecture Recognition, In Computational Processing of the Portuguese Language: 7th International Workshop, PROPOR 2006, Springer, pages 190 - 199, May 2006

Rui Pedro Batoreo Amaral, Hugo Meinedo, Diamantino António Caseiro, Isabel Trancoso, João Paulo da Silva Neto, A Prototype System for Selective Dissemination of Broadcast News in European Portuguese, EURASIP Journal on Advances in Signal Processing, Hindawi Publishing Corporation, vol. 2007, n. 37507, May 2007

Ciro Alexandre Domingues Martins, António Teixeira, João Paulo da Silva Neto, Vocabulary Selection for a Broadcast News Transcription System using a Morpho-syntatic Approach, In Interspeech 2007, August 2007

Fernando Batista, Diamantino António Caseiro, Nuno J. Mamede, Isabel Trancoso, Recovering Punctuation Marks for Automatic Speech Recognition, In Interspeech 2007, August 2007

Helena Gorete Silva Moniz, Ana Isabel Mata da Silva, Maria do Céu Guerreiro Viana Ribeiro, On Filled Pauses and Prolongations in European Portuguese, In Interspeech 2007, August 2007

Fernando Batista, Nuno J. Mamede, Diamantino António Caseiro, Isabel Trancoso, A Lightweight on-the-fly Capitalization System for Automatic Speech Recognition, In Recent Advances in Natural Language Processing, vol. 1, September 2007

Ciro Alexandre Domingues Martins, António Teixeira, João Paulo da Silva Neto, Dynamic Language Modeling for a Daily Broadcast News Transcription System, In ASRU 2007, December 2007

Isabel Trancoso, Rui Martins, Helena Moniz, Ana Isabel Mata, Céu Viana, The LECTRA Corpus - Classroom Lecture Transcriptions in European Portuguese, In LREC 2008, June 2008

Fernando Batista, Diamantino António Caseiro, Nuno J. Mamede, Isabel Trancoso, Recovering Capitalization and Punctuation Marks for Automatic Speech Recognition: Case Study for the Portuguese Broadcast News, Speech Communication, vol. 50, n. 10, pages 847-862, October 2008

Demos

This demo shows the result of the application of our Broadcast News recognizer after adaptation of the acoustic, lexical and language models to the course domain (Production of Multimedia Contents).