Unsupervised semantic structure discovery for audio - Revision history

Imt at 16:00, 3 February 2014

2014-02-03T16:00:51Z

← Older revision		Revision as of 16:00, 3 February 2014
Line 9:		Line 9:
	== Date ==		== Date ==

	* 14:00, October 31<sup>st</sup>, 2013		* 13:00, October 31<sup>st</sup>, 2013
	* Room 020, INESC-ID		* Room 020, INESC-ID

Line 19:		Line 19:

	Automatic deduction of semantic event sequences from multimedia requires awareness of context, which in turn requires processing sequences of audiovisual scenes. Most non-speech audio databases, however, are not labeled at a sub-file level, and obtaining (acoustic or semantic) annotations for sub-file sound segments is likely to be expensive.		Automatic deduction of semantic event sequences from multimedia requires awareness of context, which in turn requires processing sequences of audiovisual scenes. Most non-speech audio databases, however, are not labeled at a sub-file level, and obtaining (acoustic or semantic) annotations for sub-file sound segments is likely to be expensive.
	In our work, we introduce a novel latent hierarchical structure that attempts to leverage weakly or ~~un labeled~~ data to process the observed acoustics to infer semantic import at various levels. The higher layers in the hierarchical structure of our model represent increasingly higher level semantics.		In our work, we introduce a novel latent hierarchical structure that attempts to leverage weakly or unlabeled data to process the observed acoustics to infer semantic import at various levels. The higher layers in the hierarchical structure of our model represent increasingly higher level semantics.

			[[category:Seminars]]
			[[category:Seminars_2012]]

Imt at 15:59, 3 February 2014

2014-02-03T15:59:27Z

← Older revision		Revision as of 15:59, 3 February 2014
Line 1:		Line 1:
			__NOTOC__
			{{speakerLargeBio|
			|name= Bhiksha Raj
			|image=bhiksha.png
			|email=bhiksha@cs.cmu.edu
			|www=http://www.cs.cmu.edu/~bhiksha/
			|bio= Bhiksha Raj is an Associate Professor in the Language Technologies Institute of the School of Computer Science at Carnegie Mellon University, with additional affiliations to the Electrical and Computer Engineering and Machine Learning departments. Dr. Raj obtained his PhD from CMU in 2000 and was at Mitsubishi Electric Research Laboratories from 2001-2008. Dr. Raj's chief research interests lie in automatic speech recognition, computer audition, machine learning and data privacy. Dr. Raj's latest research interests lie in the newly emerging field of privacy-preserving speech processing, in which his research group has made several contributions. }}

			== Date ==

			* 14:00, October 31<sup>st</sup>, 2013
			* Room 020, INESC-ID

			== Speaker ==

			* Bhiksha Raj, Carnegie Mellon University

			== Abstract ==

	Automatic deduction of semantic event sequences from multimedia requires awareness of context, which in turn requires processing sequences of audiovisual scenes. Most non-speech audio databases, however, are not labeled at a sub-file level, and obtaining (acoustic or semantic) annotations for sub-file sound segments is likely to be expensive.		Automatic deduction of semantic event sequences from multimedia requires awareness of context, which in turn requires processing sequences of audiovisual scenes. Most non-speech audio databases, however, are not labeled at a sub-file level, and obtaining (acoustic or semantic) annotations for sub-file sound segments is likely to be expensive.
	In our work, we introduce a novel latent hierarchical structure that attempts to leverage weakly or un labeled data to process the observed acoustics to infer semantic import at various levels. The higher layers in the hierarchical structure of our model represent increasingly higher level semantics.		In our work, we introduce a novel latent hierarchical structure that attempts to leverage weakly or un labeled data to process the observed acoustics to infer semantic import at various levels. The higher layers in the hierarchical structure of our model represent increasingly higher level semantics.

Imt: Created page with "Automatic deduction of semantic event sequences from multimedia requires awareness of context, which in turn requires processing sequences of audiovisual scenes. Most non-speech ..."

2014-02-03T15:55:43Z

Created page with "Automatic deduction of semantic event sequences from multimedia requires awareness of context, which in turn requires processing sequences of audiovisual scenes. Most non-speech ..."

New page

Automatic deduction of semantic event sequences from multimedia requires awareness of context, which in turn requires processing sequences of audiovisual scenes. Most non-speech audio databases, however, are not labeled at a sub-file level, and obtaining (acoustic or semantic) annotations for sub-file sound segments is likely to be expensive.
In our work, we introduce a novel latent hierarchical structure that attempts to leverage weakly or un labeled data to process the observed acoustics to infer semantic import at various levels. The higher layers in the hierarchical structure of our model represent increasingly higher level semantics.