The goal of the European project VIDIVIDEO is to boost the performance of video search engines by building a 1000-element thesaurus whose detectors identify instances of audio, video, or mixed-media content. The project applies machine learning techniques to learn many different detectors from examples, using one-against-all classifiers.
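The one-against-all scheme can be sketched as follows: one binary detector is trained per concept, treating that concept's examples as positive and all others as negative, and the highest-scoring detector determines the predicted class. This is a minimal illustrative toy (centroid-based linear scorers over made-up 2-D features), not the actual detectors or features used in VIDIVIDEO.

```python
# Toy sketch of one-against-all classification (illustrative only;
# the concepts, features, and scorers here are invented for the example).

def train_one_vs_all(samples, labels):
    """For each label, fit a binary 'detector': a linear scorer built from
    the centroid of the positive examples vs. the centroid of the rest."""
    detectors = {}
    for c in sorted(set(labels)):
        pos = [x for x, y in zip(samples, labels) if y == c]
        neg = [x for x, y in zip(samples, labels) if y != c]
        pos_c = [sum(v) / len(pos) for v in zip(*pos)]
        neg_c = [sum(v) / len(neg) for v in zip(*neg)]
        # Weight vector points from the negative to the positive centroid;
        # the bias places the decision boundary at their midpoint.
        w = [p - n for p, n in zip(pos_c, neg_c)]
        b = -sum(wi * (pi + ni) / 2 for wi, pi, ni in zip(w, pos_c, neg_c))
        detectors[c] = (w, b)
    return detectors

def predict(detectors, x):
    """Score the sample with every detector; the highest score wins."""
    def score(c):
        w, b = detectors[c]
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(detectors, key=score)

# Invented 2-D "features" for three audio concepts
X = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 4.9), (0.0, 5.0), (0.1, 5.1)]
y = ["silence", "silence", "music", "music", "speech", "speech"]
model = train_one_vs_all(X, y)
print(predict(model, (5.0, 5.0)))  # a point near the "music" cluster
```

In practice each detector would be a stronger classifier (e.g. an SVM) over real audio features, but the decision rule, one binary detector per concept with an argmax over their scores, is the same.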
This talk will start with a brief overview of the different audio-related tasks: audio segmentation, audio event detection (AED), and speech recognition (Isabel Trancoso). The current work on gender segmentation (male/female/child) and music segmentation will be very briefly mentioned (Rui Martins, António Serralheiro).
The remainder of the talk will focus on the second of these tasks (AED), starting with a presentation of the audio concepts in the VIDIVIDEO ontology (Miguel Bugalho) and of the training and test corpora. The machine learning approaches we have followed and the corresponding results will be presented next (José Portêlo). We will then present some recent results on hierarchical clustering of audio events (Thomas Pellegrini).
This work involved the development of new tools that may be of interest not only for AED but also for speech processing in general. Alberto Abad will describe some of these tools, which have recently been integrated into the Audimus framework.