SSNT - Summarization of Broadcast News Services

Latest revision as of 13:48, 23 October 2020

This page presents the SSNT service through a detailed description of its features, covering the aspects needed for a complete understanding of this new service. If you are not interested in such a detailed description and just want a quick view of the service, we suggest the first news story of the last processed "Telejornal" news show. If you are interested in a detailed description of the service, we propose the following set of points:

Goals
Support
Functional diagram
System description
User interface
Present limitations
Access to SSNT

Goals

Nowadays there is a significant need to deal with large amounts of multimedia information. With this service we aim to provide selective dissemination of multimedia content, mainly TV broadcast news (BN). Advanced techniques for processing BN programs, through segmentation and categorization, make it possible to access program content based on individually defined user profiles.

Through this new service we make available the 8 o'clock news program of the main RTP channel (Telejornal). Users define which thematic areas they are interested in and, after the program has been processed automatically, receive an email with the news stories that match the requested domains. Be one of them and start using this new service now.

Support

This system was initially developed by a consortium of INESC ID Lisboa, 4VDO and RTP within the scope of the European project ALERT. The development of the large vocabulary continuous speech recognition system was supported by project POSI/33846/PLP/2000, financed by FCT.

Functional diagram

The next figure presents a functional diagram of the service.

As the functional diagram shows, the system analyses a generic multimedia document and, based on its contents, segments it into coherent blocks through video and/or audio segmentation.

When the document contains audio, an automatic transcription is produced by a large vocabulary continuous speech recognition system. Based on the block segmentation and on the text in each block (obtained from the transcription, or directly when the document is text only), topics are detected automatically in each block, and several blocks may be clustered together into homogeneous segments according to their topics.

With the multimedia document divided into segments and a set of topics assigned to each segment, the user profiles are searched for those that request the topics of these segments, and an alert message is generated for the corresponding users.

At the end of the process the multimedia document is loaded into a database, where we keep the document segmentation and the corresponding topic categorization.

System description

The system is built around three main blocks: the CAPTURE block, responsible for capturing the programs selected for monitoring; the PROCESSING block, responsible for generating the relevant markup information associated with each program; and the SERVICE block, responsible for the user interface and database management. The overall process is controlled by a simple semaphore scheme.

CAPTURE block

From a list of news shows that we intend to monitor, a web script schedules the recordings by downloading the daily time schedule (expected starting and ending times) from the TV station web site. The actual news show is frequently longer than advertised in the TV station's schedule. To ensure we record the complete show, our script programs the recording to start a little earlier than announced (1 minute) and to continue well after the advertised ending time (20 minutes).
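The padding rule can be illustrated with a minimal Python sketch. Only the two margins (1 minute before, 20 minutes after) come from the description above; the function name and the example date are hypothetical, and this is not the actual scheduling script.

```python
from datetime import datetime, timedelta

# Margins taken from the text: start 1 minute early, stop 20 minutes late.
START_MARGIN = timedelta(minutes=1)
END_MARGIN = timedelta(minutes=20)


def recording_window(advertised_start, advertised_end):
    """Return the (start, end) datetimes actually used for the recording."""
    if advertised_end <= advertised_start:
        raise ValueError("advertised end must be after advertised start")
    return advertised_start - START_MARGIN, advertised_end + END_MARGIN


# Hypothetical schedule entry: a show advertised from 20:00 to 21:00.
start, end = recording_window(
    datetime(2005, 3, 1, 20, 0), datetime(2005, 3, 1, 21, 0)
)
print(start, end)
```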

The capture script records the specified news show at the defined time using a TV capture board (Pinnacle PCTV Pro) that has direct access to a TV cable network. The recording produces two independent streams: an MPEG-2 video stream and an uncompressed, 44.1 kHz, mono, 16 bit audio stream. After the recording finishes, the audio stream is downsampled to 16 kHz. Finally, a flag file (.ready) is generated. This signal triggers the PROCESSING block.

After the PROCESSING block sends back the jingle detection information, changing the signal flag from (.ready) to (.proc), the CAPTURE block multiplexes the recorded video and audio streams together and uses the jingle detection information to cut out unwanted portions, producing an AVI file with only the news show. This multiplexed AVI file has MPEG-4 video (DivX 5.2) and MP3 audio.

After the PROCESSING block finishes and sends back the XML file, signalling the CAPTURE block with a flag (.proc.2), the CAPTURE block generates an individual AVI video file for each news report from the AVI containing the full news show. These individual AVI files have lower video quality, which is suitable for streaming to portable devices.
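As a small worked example of the sample-rate conversion, the sketch below computes the rational resampling factors for going from 44.1 kHz to 16 kHz and the resulting number of samples. It only does the arithmetic; the resampling tool actually used is not documented here, and the function names are hypothetical.

```python
from fractions import Fraction


def resampling_factors(rate_in, rate_out):
    """Return (up, down) such that rate_out == rate_in * up / down."""
    ratio = Fraction(rate_out, rate_in)
    return ratio.numerator, ratio.denominator


def resampled_length(n_samples, rate_in, rate_out):
    """Number of output samples, rounded up, after rate conversion."""
    up, down = resampling_factors(rate_in, rate_out)
    return -(-n_samples * up // down)  # ceiling division on integers


print(resampling_factors(44100, 16000))       # (160, 441)
print(resampled_length(44100, 44100, 16000))  # 16000 samples per second
```

A real downsampler also needs a low-pass (anti-aliasing) filter before discarding samples; that step is outside this sketch.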

All the AVI video files generated are sent to the SERVICE block for conversion to Real Media format, the format we use for video streaming over the web.
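The flag-file handshake between the CAPTURE and PROCESSING blocks can be sketched as follows. The three flag suffixes come from the description above; the file naming (show name plus suffix in a shared directory), the renaming of the flag file, and all function names are assumptions made for illustration.

```python
from pathlib import Path
import tempfile

# Flag suffixes from the text, mapped to the step each one triggers.
ACTIONS = {
    ".ready": "PROCESSING: run jingle detection on the recorded audio",
    ".proc": "CAPTURE: multiplex the streams and cut the show",
    ".proc.2": "CAPTURE: split the full show into one AVI per news report",
}


def current_flag(directory, show):
    """Return the most advanced flag present for `show`, or None."""
    for flag in (".proc.2", ".proc", ".ready"):
        if Path(directory, show + flag).exists():
            return flag
    return None


def advance(directory, show, old_flag, new_flag):
    """Change the signal by renaming the flag file (assumed mechanism)."""
    Path(directory, show + old_flag).rename(Path(directory, show + new_flag))


# Walk a hypothetical show through the three states in a scratch directory.
with tempfile.TemporaryDirectory() as workdir:
    Path(workdir, "telejornal.ready").touch()
    for next_flag in (".proc", ".proc.2", None):
        flag = current_flag(workdir, "telejornal")
        print(flag, "->", ACTIONS[flag])
        if next_flag is not None:
            advance(workdir, "telejornal", flag, next_flag)
```

Each block only has to poll for the flag it is waiting on, which keeps the coordination between the blocks very simple.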

PROCESSING block

The audio stream is processed through several stages that successively segment, transcribe and index it, compiling the resulting transcription and metadata into an XML file. The stages are:

Jingle detection
Audio pre-processor
Automatic Speech recognition (AUDIMUS)
Topic segmentation and indexing
Title and summary
Generate transcription XML file with all metadata information

Jingle Detection

First, the recorded audio file is processed by the "Jingle Detection" module, which identifies the precise start and end times of the news show, identifies portions of audio that are not relevant to the story (news fillers), and detects commercial breaks inside the news show. Based on these time instants, a new audio file is generated with only the relevant contents of the news show.
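How the detected time instants translate into the audio that is kept can be sketched as an interval computation. This is an illustrative sketch, not the module's actual code: times are assumed to be seconds from the start of the recording, and the function name and example values are hypothetical.

```python
def kept_intervals(show_start, show_end, excluded):
    """Return the (start, end) spans of the show not covered by `excluded`.

    `excluded` lists (start, end) spans for fillers and commercial breaks.
    """
    kept = []
    cursor = show_start
    for ex_start, ex_end in sorted(excluded):
        # Clamp each excluded span to the boundaries of the show.
        ex_start = max(ex_start, show_start)
        ex_end = min(ex_end, show_end)
        if ex_start >= ex_end or ex_end <= cursor:
            continue  # outside the show, or already skipped
        if ex_start > cursor:
            kept.append((cursor, ex_start))
        cursor = ex_end
    if cursor < show_end:
        kept.append((cursor, show_end))
    return kept


# Hypothetical detection result: show from 60 s to 3660 s with two cuts.
print(kept_intervals(60, 3660, [(900, 1080), (2000, 2010)]))
# [(60, 900), (1080, 2000), (2010, 3660)]
```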

Audio pre-processor

The new audio file is then fed through an Audio Pre-processor module. This module outputs a set of homogeneous acoustic segments discriminating between speech and non-speech. The speech segments contain audio from only one speaker and have markups concerning the background conditions, the speaker gender, and speaker clustering.

Speech recognition

Each audio segment that was marked by the Audio Pre-processor as containing speech (transcribable segment) is then processed by the Speech Recognition module.
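The hand-off from the Audio Pre-processor to the Speech Recognition module can be pictured with a small data structure. The fields mirror the markups listed above (speech or non-speech, background conditions, speaker gender, speaker cluster); the class name, field names and example values are hypothetical and do not reflect the real module's output format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AcousticSegment:
    start: float                  # seconds from the start of the audio file
    end: float
    is_speech: bool
    background: Optional[str] = None       # e.g. "clean", "music", "noise"
    gender: Optional[str] = None
    speaker_cluster: Optional[str] = None  # label shared by the same speaker


def transcribable(segments):
    """Keep only the segments that should be sent to speech recognition."""
    return [seg for seg in segments if seg.is_speech]


segments = [
    AcousticSegment(0.0, 4.2, False),
    AcousticSegment(4.2, 31.0, True, "clean", "female", "spk01"),
    AcousticSegment(31.0, 58.5, True, "noise", "male", "spk02"),
]
print(len(transcribable(segments)))  # 2
```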

Topic segmentation and indexing

The Topic Segmentation module groups segments into complete and homogeneous stories. For each story, the Topic Indexing module classifies its contents according to a hierarchically organized thematic thesaurus.

Title and summary

Finally, a module that generates a title and summary is applied to each story. Since we deliver a set of stories to the user, we want a mechanism that, besides the topic indexing, gives a good idea of the contents of each story.

Generate transcription XML

After all these processing stages an XML file containing all the relevant information that we were able to extract is generated according to a DTD specification.
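A minimal sketch of this final step, using Python's standard `xml.etree.ElementTree`, is shown below. The DTD is not reproduced in this page, so every element and attribute name here (`show`, `story`, `title`, `summary`, `topic`) is a placeholder chosen for illustration, as is the example data.

```python
import xml.etree.ElementTree as ET


def build_show_xml(program, stories):
    """Serialize the stories of one news show to an XML string.

    Element and attribute names are placeholders, not the real DTD.
    """
    root = ET.Element("show", {"program": program})
    for story in stories:
        node = ET.SubElement(
            root, "story",
            {"start": str(story["start"]), "end": str(story["end"])},
        )
        ET.SubElement(node, "title").text = story["title"]
        ET.SubElement(node, "summary").text = story["summary"]
        for topic in story["topics"]:
            ET.SubElement(node, "topic").text = topic
    return ET.tostring(root, encoding="unicode")


example = [{
    "start": 12.5, "end": 95.0,
    "title": "First sentence of the story.",
    "summary": "First sentence of the story. Second sentence.",
    "topics": ["Economy"],
}]
print(build_show_xml("Telejornal", example))
```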

SERVICE block

The SERVICE block is responsible for loading the XML file generated by the PROCESSING block into a database, converting the AVI video files into Real Media format, running the web video streaming server, running the web server for the user interface, managing the user profiles in a database, and sending users the alert messages that result from matching the news show information against the user profiles.

User interface

The first page supplies a short description of the service with three different access points: registering as a new user (who can then use all the features of the service), logging in as a previously registered user to reconfigure the profile (through username and password authentication), or searching directly over the database of segmented and indexed multimedia documents. The following sections give details on these access points.

Registering as a new user

When the user chooses SSNT registo on the first page, a new page opens where some personal data must be filled in. The user becomes known to the system through a username and email address.

After filling in the personal information, the user must press the button Actualizar Alterações, in the lower right corner of the page. At that point the user should leave the system and wait for an email from the system with the final confirmation. That email contains a link to confirm the registration as a new user and to access the user profile definition.

The user profile definition is based on the choice of thematic domains, onomastic index, geographic index or free text. Each profile entry results from an AND operation over these fields. To make the entry effective, the user must click the Adicionar button. The profile definition window, in the lower part of the page, then shows a new thematic domain, restricted by any onomastic index, geographic index or free text that was indicated. The user may define several domains, combined through an OR operation, which appear on separate lines in the profile definition window. The user may also select several thematic domains simultaneously with the mouse and, for a given thematic domain, specify the associated themes in more detail, down to a maximum of two sub-levels.

After selecting the different items of the profile, the user must confirm the choices or changes with the button Actualizar perfil, in the lower right corner of the page (red colour). After these steps, registration as a new user is complete.

When a new program containing news on the thematic domains and features defined in the user profile enters the system, an email is sent. The email contents are described in the next section.
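The matching logic described in this section (AND between the fields of one profile line, OR between lines) can be sketched as below. This is a simplified illustration, not the service's implementation: class and field names are hypothetical, each line is assumed to hold at most one name and one place, and free text is matched as a case-insensitive substring.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProfileLine:
    domain: str                   # thematic domain (required)
    name: Optional[str] = None    # onomastic index entry
    place: Optional[str] = None   # geographic index entry
    free_text: str = ""


@dataclass
class Story:
    domains: set
    names: set
    places: set
    text: str


def line_matches(line, story):
    """AND over every field that the profile line actually sets."""
    return (
        line.domain in story.domains
        and (line.name is None or line.name in story.names)
        and (line.place is None or line.place in story.places)
        and line.free_text.lower() in story.text.lower()
    )


def profile_matches(profile, story):
    """OR over the lines of the profile."""
    return any(line_matches(line, story) for line in profile)


story = Story({"Sports"}, {"Figo"}, {"Lisboa"}, "The match ended in a draw.")
profile = [
    ProfileLine("Economy"),
    ProfileLine("Sports", place="Lisboa", free_text="match"),
]
print(profile_matches(profile, story))  # True: the second line matches
```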

Reception of the system email

For each news story, the email presents a set of fields. It starts with the program title, the date and the length of the story. It then presents the title and a short summary of the story. It ends with the thematic domain and the onomastic and geographic indexes associated with both the story and the user profile. A link to the RealVideo file is supplied.

With this email, the service provided by SSNT is complete.
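As an illustration of the fields listed above, the sketch below assembles such a message with Python's standard `email.message.EmailMessage`. The layout, the subject line, the dictionary keys and the example address and URL are all hypothetical; the real alert email may be formatted differently.

```python
from email.message import EmailMessage


def build_alert(recipient, story):
    """Build an alert email carrying the fields described in the text."""
    lines = [
        f"Program: {story['program']}",
        f"Date: {story['date']}",
        f"Length: {story['length']}",
        f"Title: {story['title']}",
        f"Summary: {story['summary']}",
        f"Thematic domain: {story['domain']}",
        f"Onomastic index: {', '.join(story['names'])}",
        f"Geographic index: {', '.join(story['places'])}",
        f"Video: {story['video_url']}",
    ]
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = f"SSNT alert: {story['title']}"
    msg.set_content("\n".join(lines))
    return msg


# Hypothetical story record; the address and URL are placeholders.
example_story = {
    "program": "Telejornal", "date": "2005-03-01", "length": "00:01:35",
    "title": "First sentence of the story.",
    "summary": "First sentence of the story. Second sentence.",
    "domain": "Economy", "names": ["Name A"], "places": ["Lisboa"],
    "video_url": "http://example.org/story.rm",
}
print(build_alert("user@example.org", example_story).get_content())
```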

Direct search

The Direct Search mode is configured much like the user profile definition, but it acts only on the programs already stored in the database and not on new programs. The news to retrieve is specified in the same way as a user profile. After the user presses the button Pesquisar, in the lower right corner of the page, the database is searched for news matching the request.

The next figure presents an example of results. The thematic domain comes first, followed by the news stories that satisfy that search criterion. For each story the page shows the news title, the program name, the broadcast date, the news length and the summary. Finally, it indicates the thematic domain and the onomastic and geographic indexes associated with the story. The user may perform a new search if desired. Again, a link to the RealVideo file associated with that particular story is provided.

With this detailed explanation of the contents of the different pages, we conclude the presentation of this new service.

Present limitations

This system presents a set of innovative features based on speech processing techniques and topic detection. However, due to the development conditions, we know that the system still has a set of limitations. Among them we highlight the following:

The speech recognition system is based on a limited vocabulary. At present the system can only recognize 58K different words. This means that when new events introduce out-of-vocabulary words, the system substitutes the closest words in its vocabulary. This generates transcription errors, with negative effects on story segmentation and indexing.
Although we use different hierarchical levels in the topic definition, not all topics at the deeper levels are well trained, because they occur rarely in the training data.
The title and summary do not involve any processing. The title is simply the first sentence of the news story and the summary is its first five sentences. This works reasonably well when the story is perfectly segmented and transcribed. In the future we want to apply proper processing to extract a title and a summary with a higher degree of correctness.
Since this is a new service, it is very important to have a very natural user interface. If you encounter any problems, please send us an email with your suggestions.
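The title and summary rule mentioned in the list above (first sentence as title, first five sentences as summary) is simple enough to sketch in a few lines of Python. The sentence splitter is a naive assumption that relies on punctuation being present in the transcription; the function names are hypothetical.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def title_and_summary(text, summary_sentences=5):
    """Title is the first sentence; summary is the first five sentences."""
    sentences = split_sentences(text)
    title = sentences[0] if sentences else ""
    summary = " ".join(sentences[:summary_sentences])
    return title, summary


story_text = "A one. B two. C three. D four. E five. F six."
title, summary = title_and_summary(story_text)
print(title)    # A one.
print(summary)  # A one. B two. C three. D four. E five.
```

The sketch makes the limitation easy to see: if the story boundary or the transcription of the first sentence is wrong, both the title and the summary inherit the error.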

Access to SSNT

After this detailed description you are ready to access the system. We hope that you find in SSNT the features you need to start using this service. If you need any additional information or wish to make any comment, we are available here.

Access to the service SSNT - Summarization of Broadcast News Services
