SSNT - Summarization of Broadcast News Services

From HLT@INESC-ID

Revision as of 10:14, 3 July 2006

This page presents the SSNT service through a detailed description of its features. The description covers the aspects needed for a complete picture of this new service. If you are not interested in such a detailed description and just want a quick view of the service, we propose two alternatives:

  1. <php> include 'LastTelej.php';
     // Build the URL of the first news item of the latest Telejornal.
     $lj = new LastTelejornal(); 
     $urlrstp = $lj->constroi_url();
     echo "<A HREF='".$urlrstp."'>See the first news of the last Telejornal (";
     // Show the emission date of that program inside the link text.
     $out_array = $lj->get();
     echo $out_array["emissiondate"];
     echo ") in RealVideo format "; 
     echo "here</a>";
</php>

  2. Direct access to the SSNT service web page

If you are interested in a detailed description of the service, it is organized in the following sections:

  1. Goals
  2. Support
  3. Functional diagram
  4. System description
  5. User interface
  6. Present limitations
  7. Access to SSNT

Goals

Nowadays there is a significant need to deal with large amounts of multimedia information. With this service we aim at the selective dissemination of multimedia content, mainly TV broadcast news (BN). Advanced techniques for processing BN programs, through segmentation and categorization, make it possible to access the contents of the programs based on individually defined user profiles.

Through this new service we make available the 8 o'clock news program of the main RTP channel (Telejornal). Users define the thematic areas they are interested in and, after the program has been processed automatically, receive an email with the news that match the requested domains. Become one of them and start using this new service now.

Support

This system was initially developed within the European project ALERT by a consortium of INESC ID Lisboa, 4VDO and RTP. The development of the large vocabulary continuous speech recognition system has been supported by the project POSI/33846/PLP/2000, financed by FCT.

Functional diagram

In the next figure a functional diagram of the service is presented.

As the functional diagram shows, the system analyses a generic multimedia document and, based on its contents, segments it into coherent blocks through video and/or audio segmentation.

When the document contains audio, an automatic transcription is produced by a large vocabulary continuous speech recognition system. Based on the block segmentation and on the text inside each block, which comes either from the transcription or from the document itself when it is text only, topics are detected automatically in each block, with the possibility of clustering several blocks into homogeneous segments according to their topics.

With the multimedia document divided into segments, and a set of topics assigned to each segment, the user profiles are searched for those requesting the topics of these segments, and an alert message is generated for the matching users.

At the end of the process the multimedia document is loaded into a database where we keep the document segmentation and the appropriate categorization in topics.
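The flow in the functional diagram can be illustrated with a minimal sketch. It assumes that topic detection has already labelled each block, that clustering simply merges consecutive blocks with identical topic sets, and that a profile is a plain set of requested topics. The names `Block`, `cluster_blocks` and `users_to_alert` are hypothetical and are not taken from the SSNT implementation.

```python
from dataclasses import dataclass


@dataclass
class Block:
    start: float        # seconds from the start of the program
    end: float
    topics: frozenset   # topics detected from the text of the block


def cluster_blocks(blocks):
    """Merge consecutive blocks that carry the same topic set into one segment."""
    segments = []
    for block in blocks:
        if segments and segments[-1].topics == block.topics:
            last = segments[-1]
            segments[-1] = Block(last.start, block.end, last.topics)
        else:
            segments.append(block)
    return segments


def users_to_alert(segments, profiles):
    """Map each user to the segments that share a topic with the user's profile."""
    alerts = {}
    for user, wanted in profiles.items():
        hits = [s for s in segments if s.topics & wanted]
        if hits:
            alerts[user] = hits
    return alerts


# Illustrative data only: three blocks and two made-up user profiles.
blocks = [
    Block(0, 40, frozenset({"politics"})),
    Block(40, 95, frozenset({"politics"})),
    Block(95, 150, frozenset({"sports"})),
]
profiles = {"ana": {"sports"}, "rui": {"economy"}}

segments = cluster_blocks(blocks)
alerts = users_to_alert(segments, profiles)
```

In this toy run the two politics blocks collapse into one segment, and only the user who asked for sports receives an alert.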

System description

The system is built around three main blocks: the CAPTURE block, responsible for capturing the programs selected for monitoring; the PROCESSING block, responsible for generating the relevant markup information for each program; and the SERVICE block, responsible for the user interface and database management. The overall process is controlled by a simple semaphore scheme.

The CAPTURE block has access to the list of programs to monitor and to their start and end times. This information is the input to a capture program that, through direct access to a cable TV network, starts recording at the specified time. The capture program generates a file with MPEG-1 encoded video and audio. When the recording finishes, the MPEG-1 file is available and the next block is signalled to start.
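The text does not say how the semaphore scheme is implemented. As one possible reading, the sketch below chains the three blocks with two counting semaphores, so that each block starts only after the previous one has signalled completion. The use of threads and all function names are assumptions made for illustration; the real system may just as well use separate processes and flag files.

```python
import threading

capture_done = threading.Semaphore(0)      # released when a recording is ready
processing_done = threading.Semaphore(0)   # released when the markup is ready
log = []


def capture():
    log.append("capture: program recorded")
    capture_done.release()                 # signal the PROCESSING block


def processing():
    capture_done.acquire()                 # wait for the CAPTURE block
    log.append("processing: markup generated")
    processing_done.release()              # signal the SERVICE block


def service():
    processing_done.acquire()              # wait for the PROCESSING block
    log.append("service: database updated, alerts sent")


# The blocks are started in reverse order on purpose: the semaphores,
# not the start order, decide when each block actually runs.
threads = [threading.Thread(target=f) for f in (service, processing, capture)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```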

In the PROCESSING block the audio stream, extracted from the MPEG file, is processed through successive stages for segmenting, transcribing and indexing. The resulting information is compiled in an XML file.
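The schema of the XML file is not given here. The following sketch only shows how segmentation, transcription and topic information could be compiled into one XML document with Python's standard library. Every element and attribute name (`program`, `story`, `transcript`, `topics`, `topic`) and all values are assumptions, not the actual SSNT format.

```python
import xml.etree.ElementTree as ET


def build_markup(program, date, stories):
    """Serialize the stories of one program into a single XML string."""
    root = ET.Element("program", name=program, date=date)
    for story in stories:
        node = ET.SubElement(
            root, "story", start=str(story["start"]), end=str(story["end"])
        )
        ET.SubElement(node, "transcript").text = story["transcript"]
        topics = ET.SubElement(node, "topics")
        for topic in story["topics"]:
            ET.SubElement(topics, "topic").text = topic
    return ET.tostring(root, encoding="unicode")


# Placeholder values for illustration only.
xml_text = build_markup(
    "Telejornal",
    "2006-07-03",
    [
        {
            "start": 0.0,
            "end": 95.0,
            "transcript": "example transcript text",
            "topics": ["politics"],
        }
    ],
)
```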

The SERVICE block handles the user interface, implemented as a set of web pages, and the databases of user profiles and programs. Each time a program is processed, an XML file is generated and the database is updated. Matching the program information against the user profiles generates a list of alerts, which are sent to the users by email.

User interface

The first page supplies a short description of the service and three access points. It is possible to register as a new user (who may then use all the features of the service), to log in as an already registered user to reconfigure the profile (through username and password authentication), or to search directly over the database of segmented and indexed multimedia documents. These access points are detailed below.

Registering as a new user

When choosing SSNT registo on the first page, the user enters a new page where some personal data must be filled in. The user becomes known to the system through a username and email address.

After filling in the personal information the user must press the button Actualizar Alterações, in the bottom right corner of the page. At that point the user should leave the system and wait for an email from the system with the final confirmation. That email contains a link to confirm the registration and to access the user profile definition.

The user profile definition is based on the choice of thematic domains, onomastic index, geographic index or free text. Within one profile entry these fields are combined through an AND operation. To make an entry effective the user must click the Adicionar button. The profile definition window, in the lower part of the page, then shows a new thematic domain restricted by the optional onomastic index, geographic index or free text. The user may define several domains, combined through an OR operation, each appearing on its own line in the profile definition window. The user may also select several thematic domains at once with the mouse and, for a given thematic domain, narrow down the themes associated with it, up to a maximum of two sub-levels.
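The AND/OR semantics of a profile can be made concrete with a small sketch. It assumes that each line of the profile window is one entry, that the fields filled in on an entry must all match (AND), and that a news item matches the profile when any entry matches (OR). `ProfileLine` and the story dictionary layout are hypothetical, and free text is matched here as a case-insensitive substring, which is only one possible interpretation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProfileLine:
    theme: Optional[str] = None    # thematic domain
    person: Optional[str] = None   # onomastic index entry
    place: Optional[str] = None    # geographic index entry
    text: Optional[str] = None     # free text


def line_matches(line, story):
    """AND over the fields that are filled in; empty fields are ignored."""
    checks = []
    if line.theme is not None:
        checks.append(line.theme in story["themes"])
    if line.person is not None:
        checks.append(line.person in story["people"])
    if line.place is not None:
        checks.append(line.place in story["places"])
    if line.text is not None:
        checks.append(line.text.lower() in story["transcript"].lower())
    # An entry with no field filled in never matches.
    return bool(checks) and all(checks)


def profile_matches(profile, story):
    """OR over the lines of the profile."""
    return any(line_matches(line, story) for line in profile)


# Illustrative profile: (sports AND Lisboa) OR economy.
profile = [ProfileLine(theme="sports", place="Lisboa"), ProfileLine(theme="economy")]

story_a = {"themes": {"sports"}, "people": set(), "places": {"Porto"}, "transcript": ""}
story_b = {"themes": {"economy"}, "people": set(), "places": set(), "transcript": ""}
story_c = {"themes": {"sports"}, "people": set(), "places": {"Lisboa"}, "transcript": ""}

results = [profile_matches(profile, s) for s in (story_a, story_b, story_c)]
```

Here the sports story from Porto is rejected because the place restriction of the first entry fails, while the other two stories each satisfy one entry.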

After selecting the items of the profile, the user must confirm the choices or changes with the button Actualizar perfil, in the bottom right corner of the page (red colour). This completes the registration as a new user.

When a new program containing news on the thematic domains and features defined in the user profile enters the system, an email is sent. The contents of the email are described in the next section.

Reception of the system email

For each news item the email presents a set of fields. It starts with the program title, the date and the length of the news item, followed by its title and a short summary. It ends with the thematic domain and the onomastic and geographic index entries associated with both the news item and the user profile. A link to the RealVideo file is also supplied.

This email completes the service provided by SSNT.
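As an illustration of the field order described above, the sketch below assembles such a message with Python's standard `email` package. The field labels, the subject line, the dictionary keys and the example address and URL are all assumptions; the actual wording and layout of the SSNT email are not specified here.

```python
from email.message import EmailMessage


def build_alert_email(recipient, story):
    """Compose one alert message following the field order of the description."""
    body = "\n".join([
        f"Program: {story['program']}",
        f"Date: {story['date']}   Length: {story['length']}",
        f"Title: {story['title']}",
        f"Summary: {story['summary']}",
        f"Thematic domain: {', '.join(story['themes'])}",
        f"Onomastic index: {', '.join(story['people'])}",
        f"Geographic index: {', '.join(story['places'])}",
        f"RealVideo: {story['video_url']}",
    ])
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = f"SSNT alert: {story['title']}"
    msg.set_content(body)
    return msg


# Placeholder story; none of these values come from the real service.
story = {
    "program": "Telejornal",
    "date": "2006-07-03",
    "length": "95 s",
    "title": "Example headline",
    "summary": "Example summary of the news item.",
    "themes": ["sports"],
    "people": ["Example Person"],
    "places": ["Lisboa"],
    "video_url": "http://example.org/story.rm",
}
msg = build_alert_email("user@example.org", story)
```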

Direct search

The Direct Search mode is configured in a similar way to the user profile definition, but acts only on the programs already stored in the database and not on new programs. The news to retrieve is defined in the same way as a user profile. After pressing the button Pesquisar, in the bottom right corner of the page, the database is searched for the news matching the user request.

An example of results is presented in the next figure. The thematic domain comes first, followed by the news items that satisfy that search criterion. For each news item the page shows the title, the program name, the emission date, the length and the summary, and finally the thematic domain and the onomastic and geographic index entries associated with it. A link to the RealVideo file of that particular news item is again provided. The user may then perform a new search if desired.

This detailed explanation of the different pages concludes the presentation of this new service.

Present limitations

This system offers a set of innovative features based on speech processing techniques and topic detection. However, due to the development conditions, we know that the system still has a set of limitations. Among them we highlight the following:

  • The speech recognition system is based on a limited vocabulary. Presently the system can only recognize 58K different words. This means that when new events introduce out-of-vocabulary words, the system picks the closest words it knows. This generates transcription errors with negative effects on story segmentation and indexing.
  • Although we use several hierarchical levels in the topic definition, not all topics at the deeper levels are well trained, because they occur rarely in the training data.
  • The title and summary do not involve any processing. The title is simply the first sentence of the news item and the summary its first five sentences. This works reasonably well when the news item is perfectly segmented and transcribed. In the future we want to add adequate processing to extract a title and a summary with a higher degree of correctness.
  • Since this is a new service, it is very important to have a very natural user interface. If you run into any problems, please send us an email with your suggestions.
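The title and summary rule from the list above (first sentence as title, first five sentences as summary) can be sketched as follows. The sentence splitter is a deliberately naive assumption that breaks after `.`, `!` or `?` followed by whitespace; how the real system delimits sentences in the recognizer output is not described, and the function names are hypothetical.

```python
import re


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def title_and_summary(transcript, summary_sentences=5):
    """Title is the first sentence; summary is the first five sentences."""
    sentences = split_sentences(transcript)
    title = sentences[0] if sentences else ""
    summary = " ".join(sentences[:summary_sentences])
    return title, summary


transcript = "A one. B two. C three. D four. E five. F six."
title, summary = title_and_summary(transcript)
```

Because the rule is purely positional, a segmentation error that shifts the start of a story also shifts the title and summary, which is why the limitation above mentions perfect segmentation and transcription.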

Access to SSNT

After this detailed description you are ready to access the system. We hope that you find in SSNT the features you need to start using this service. If you need any additional information or wish to make any comment, we are available here.

Access to the service