Unsupervised semantic structure discovery for audio
From HLT@INESC-ID
Date
- 13:00, October 31st, 2013
- Room 020, INESC-ID
Speaker
- Bhiksha Raj, Carnegie Mellon University
Abstract
Automatic deduction of semantic event sequences from multimedia requires awareness of context, which in turn requires processing sequences of audiovisual scenes. Most non-speech audio databases, however, are not labeled at the sub-file level, and obtaining acoustic or semantic annotations for sub-file sound segments is likely to be expensive. In our work, we introduce a novel latent hierarchical structure that attempts to leverage weakly labeled or unlabeled data, processing the observed acoustics to infer semantic import at multiple levels; the higher layers of the hierarchy represent increasingly high-level semantics.
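The abstract does not specify the model, but the idea of unsupervised layers of increasing semantic abstraction can be illustrated with a deliberately simple two-level sketch (an assumption for illustration only, not the speaker's method): a first clustering level groups raw frame features into shared "acoustic units", and a second level clusters per-segment histograms of those units into coarser "semantic" groups. All names and parameters below (`kmeans`, the synthetic segments, the cluster counts) are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=25, seed=0):
    """Plain k-means with random initialization from the data points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

rng = np.random.default_rng(1)
# Synthetic stand-in for audio: 4 segments of 50 frames, each frame an 8-dim feature
# vector. Segments 0-1 and 2-3 come from two different underlying "scene types".
segments = [rng.normal(loc=m, scale=1.0, size=(50, 8)) for m in (0.0, 0.0, 5.0, 5.0)]
frames = np.vstack(segments)

# Level 1: cluster all frames into shared "acoustic units" (no labels used).
unit_labels = kmeans(frames, k=4)

# Level 2: describe each segment by its histogram over acoustic units, then
# cluster those histograms into higher-level "semantic" groups.
hists = np.array([np.bincount(unit_labels[i * 50:(i + 1) * 50], minlength=4) / 50.0
                  for i in range(4)])
seg_labels = kmeans(hists, k=2)
print("unit histograms per segment:\n", hists)
print("segment-level groups:", seg_labels)
```

The point of the sketch is only structural: each level operates on representations produced by the level below, so labels at the top are "more semantic" than the frame-level units, without any annotation of the sub-file segments.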