|Bhiksha Raj is an Associate Professor in the Language Technologies Institute of the School of Computer Science at Carnegie Mellon University, with additional affiliations with the Electrical and Computer Engineering and Machine Learning departments. Dr. Raj obtained his PhD from CMU in 2000 and was at Mitsubishi Electric Research Laboratories from 2001 to 2008. His chief research interests lie in automatic speech recognition, computer audition, machine learning, and data privacy. His latest research interests lie in the newly emerging field of privacy-preserving speech processing, to which his research group has made several contributions.
- 13:00, October 31st, 2013
- Room 020, INESC-ID
- Bhiksha Raj, Carnegie Mellon University
Automatic deduction of semantic event sequences from multimedia requires awareness of context, which in turn requires processing sequences of audiovisual scenes. Most non-speech audio databases, however, are not labeled at a sub-file level, and obtaining (acoustic or semantic) annotations for sub-file sound segments is likely to be expensive.
In our work, we introduce a novel latent hierarchical structure that attempts to leverage weakly labeled or unlabeled data, processing the observed acoustics to infer semantic import at multiple levels. The higher layers in the hierarchical structure of our model represent increasingly high-level semantics.
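The abstract does not describe the model in detail, but the core difficulty it names, supervising sub-file segment inference from labels that exist only at the file level, is commonly addressed with multiple-instance learning. The sketch below is a minimal, hypothetical illustration of that idea (not the speaker's actual system): each clip is a bag of segment feature vectors, a logistic scorer rates each segment, and max-pooling ties the segment scores to the weak clip-level label during training.

```python
import math
import random

# Hypothetical toy sketch: weak clip-level labels supervise segment-level
# scores via max-pooling (multiple-instance learning). All functions and
# data here are illustrative assumptions, not the speaker's model.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def segment_score(w, b, x):
    # Probability that a single sub-file segment contains the event.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def clip_score(w, b, clip):
    # A clip is positive if ANY of its segments is: max-pool over segments.
    return max(segment_score(w, b, x) for x in clip)

def train(clips, labels, dim, lr=0.5, epochs=200, seed=0):
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    b = 0.0
    for _ in range(epochs):
        for clip, y in zip(clips, labels):
            # Gradient flows only through the max-scoring segment.
            x = max(clip, key=lambda s: segment_score(w, b, s))
            p = segment_score(w, b, x)
            g = p - y  # derivative of the log-loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b
```

After training on clips whose labels say only "an event occurs somewhere in this file", `segment_score` localizes which segments carry it, which is the kind of sub-file inference the weak labels alone do not annotate.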