ALERT
From HLT@INESC-ID
Alert System for Selective Dissemination of Multimedia Information (2000-2002)
Summary: The ALERT project aims to combine state-of-the-art speech recognition with audio and video segmentation and automatic topic indexing in order to develop an automatic media-monitoring demonstrator and evaluate it in real-world applications. The targeted languages are Portuguese, French and German. The consortium consists of academic partners well known for their work in speech processing, speech recognition, topic detection and video processing; leading European media monitoring companies that offer Selective Dissemination of Information (SDI) services to their clients; and a major European TV station that runs its own SDI department. In addition, two of the partners are software development companies that will act as system integrators. Both have extensive experience in developing software products for speech and video processing.
Objectives: Staying informed is of strategic importance for many businesses and government agencies, as well as for every citizen. With the rapid expansion of the media sources used to disseminate information (newspapers, newswires, radio, television, the internet), there is a large market for monitoring these sources and a growing need for automatic processing of the data. Media monitoring is therefore a crucial activity. Today's methods are mostly manual: people read, listen and watch, annotate topics, and select items of interest for the user. The ALERT project aims to demonstrate that, by combining state-of-the-art speech recognition with audio and video segmentation and automatic topic detection, it is possible to build an automatic media monitoring demonstration system that detects topics in large amounts of multimedia data and alerts the appropriate users.
Work description: The demonstration system to be set up within this project will store each user's interests as a list of topics and will alert the user whenever one of these topics is detected in a large multimedia database. The data processed can be video, audio or written text in any of the targeted languages (French, German and Portuguese). A topic detection module with high precision and accuracy requires research on robust speech recognition, on topic detection from erroneous speech recogniser output, on combined video- and audio-based segmentation, and on a unified representation of the different data types across the various languages. Video- and audio-based segmentation will be used to obtain a coarse segmentation of TV data into shots, each carrying specific information. The speech recognition module will deliver a transcription of the audio track associated with these segments. Although this module will use advanced broadcast speech recognition techniques, including confidence measures and adaptation to varying speech quality, the transcription will still contain a considerable number of errors. By applying sophisticated statistical methods for topic detection to the transcribed data, however, it is possible to achieve a high topic detection rate and thus to build a high-performance topic detection system for three major European languages. A demonstrator will be built that automatically detects topics in large multimedia databases and can therefore reduce the time needed for media monitoring to a fraction of that required for manual processing of the multimedia input. In this way, the industrial partners will be able to process the latest news data as it arrives and alert users immediately.
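The description above does not specify the topic detection algorithm, so the following is only an illustrative sketch of the alerting idea: user interests stored as topic lists, a noisy transcript with per-word recogniser confidences, and an alert when a topic is detected. It deliberately assumes a much simpler technique than the statistical methods mentioned above, namely keyword matching in which each hit is weighted by the recogniser's confidence for that word. All names (`Word`, `UserProfile`, `topic_scores`, `find_alerts`, the keyword sets and the `threshold` value) are hypothetical and not part of the ALERT system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Word:
    """One recognised word with its (hypothetical) recogniser confidence in [0, 1]."""
    token: str
    confidence: float


@dataclass
class UserProfile:
    """A user's interests: topic name -> set of lowercase keywords (assumed format)."""
    user: str
    topics: dict[str, set[str]]


def topic_scores(transcript: list[Word], topics: dict[str, set[str]]) -> dict[str, float]:
    """Confidence-weighted keyword hits per topic, normalised by transcript length.

    Low-confidence words (likely recognition errors) contribute less to a topic score.
    """
    if not transcript:
        return {topic: 0.0 for topic in topics}
    scores = {}
    for topic, keywords in topics.items():
        hits = sum(w.confidence for w in transcript if w.token.lower() in keywords)
        scores[topic] = hits / len(transcript)
    return scores


def find_alerts(segment_id: str, transcript: list[Word],
                profiles: list[UserProfile], threshold: float = 0.1) -> list[tuple[str, str, str, float]]:
    """Return (user, topic, segment_id, score) for every topic scoring at or above threshold."""
    alerts = []
    for profile in profiles:
        for topic, score in topic_scores(transcript, profile.topics).items():
            if score >= threshold:
                alerts.append((profile.user, topic, segment_id, score))
    return alerts


if __name__ == "__main__":
    segment = [Word("the", 0.9), Word("budget", 0.8), Word("vote", 0.6),
               Word("in", 0.9), Word("parliament", 0.7)]
    alice = UserProfile("alice", {"politics": {"parliament", "vote", "budget"},
                                  "sport": {"football", "goal"}})
    for alert in find_alerts("seg-1", segment, [alice]):
        print(alert)
```

In this toy run only the "politics" topic exceeds the threshold, so one alert is produced for the segment. Weighting by confidence is one simple way to make a keyword detector less sensitive to recognition errors; a real system would more likely use a trained statistical topic model and could also exploit word graphs rather than only the best hypothesis.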
Milestones:
- high-performance speech recognisers for all three targeted languages (French, Portuguese and German)
- high-performance topic detection on the erroneous output of a speech recogniser (using the best hypothesis, confidence measures and word graphs)
- unified representation of different document types in multiple languages
- combined video- and audio-based segmentation
Achievements: The ALERT project demonstrated that, by combining state-of-the-art speech recognition with audio and video segmentation and automatic topic detection, it is possible to build an automatic media monitoring demonstration system that detects topics in large amounts of multimedia data and alerts the users for whom the detected information is relevant. The ALERT demonstrator was developed for three languages, French, German and Portuguese, and can process and index multimedia content (radio or TV broadcasts, or internet audio/video), with a strong focus on news and information programmes.
Start Date: 2000-01-01 End Date: 2002-06-30 Duration: 30 months