Speech Synthesis: past, present and future and how it mirrors speech processing development in general
From HLT@INESC-ID
Date
- 15:00, Friday, June 5th, 2009
- Room VA4, building of Civil Engineering
Speaker
- Alan W Black, Carnegie Mellon University, USA

Although much of his core research focuses on speech synthesis, Alan W Black also works on real-time hands-free speech-to-speech translation systems (Croatian, Arabic and Thai), spoken dialog systems, and rapid language adaptation to support new languages. He was an elected member of the IEEE Speech Technical Committee (2003-2007). He is currently on the board of ISCA and on the editorial board of Speech Communication. He was program chair of the ISCA Speech Synthesis Workshop 2004, and general co-chair of Interspeech 2006 -- ICSLP. In 2004, with Prof. Keiichi Tokuda, he initiated the now annual Blizzard Challenge, the largest multi-site evaluation of corpus-based speech synthesis techniques.
Abstract
This talk will look at the past, present and future of speech synthesis and how it relates to speech processing development in general. Specifically, I will outline advances in synthesis technology, drawing analogies to developments in other speech and language processing fields (e.g. ASR and SMT), where knowledge-based techniques gave way to data-driven techniques, which in turn pushed machine learning technologies forward and later re-introduced higher-level knowledge into our data-driven approaches.
We will give overviews of diphone, unit selection, statistical parametric synthesis and voice morphing technologies, and show how synthesis can be optimized for the desired task. We will also address issues of evaluation, both in isolation and embedded in real tasks. Widening our view of speech processing, we will also present the publicly used Let's Go Spoken Dialog System (and its evaluation platform, Let's Go Lab), our rapid language adaptation system (CMUSPICE), which allows non-speech-experts to construct ASR and TTS support for new languages, and our hands-free real-time two-way speech-to-speech translation system, showing how system integration can drive cross-technology innovation.
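
As background for the unit selection technique mentioned above, here is a minimal sketch of its core idea: a Viterbi search over candidate database units that trades off a target cost against a join (concatenation) cost. The feature fields (pitch, duration, boundary pitch) and all function names are illustrative assumptions, not the actual implementation used in Festival or the speaker's systems.

```python
# Minimal unit-selection sketch (illustrative assumptions throughout):
# pick one database unit per target so that the summed target cost
# (fit to the desired spec) plus join cost (smoothness across unit
# boundaries) is minimized, via a Viterbi search.

def target_cost(target, unit):
    # How badly the candidate unit matches the desired target spec.
    return sum(abs(target[k] - unit[k]) for k in ("pitch", "duration"))

def join_cost(prev_unit, unit):
    # Acoustic discontinuity at the concatenation point (toy: pitch jump).
    return abs(prev_unit["end_pitch"] - unit["start_pitch"])

def select_units(targets, candidates):
    # best[i][j] = (cumulative cost, backpointer) for candidate j at step i.
    best = [[(target_cost(targets[0], u), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            tc = target_cost(targets[i], u)
            cost, back = min(
                (best[i - 1][k][0] + join_cost(p, u) + tc, k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost, back))
        best.append(row)
    # Trace back the lowest-cost path through the candidate lattice.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(best) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(path))
```

For example, with targets such as {"pitch": 120, "duration": 80} and per-target candidate lists whose units carry "pitch", "duration", "start_pitch" and "end_pitch" fields, select_units returns the cheapest unit sequence; real systems use much richer linguistic and acoustic features, but the search structure is the same.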