Speech Synthesis: past, present and future and how it mirrors speech processing development in general

From HLT@INESC-ID

Alan W Black
Alan W Black is an Associate Professor in the Language Technologies Institute at Carnegie Mellon University. He previously worked in the Centre for Speech Technology Research at the University of Edinburgh, and before that at ATR in Japan. He is one of the principal authors of the free software Festival Speech Synthesis System, the FestVox voice building tools and CMU Flite, a small-footprint speech synthesis engine. He received his PhD in Computational Linguistics from Edinburgh University in 1993, his MSc in Knowledge Based Systems, also from Edinburgh, in 1986, and a BSc (Hons) in Computer Science from Coventry University in 1984.

Although much of his core research focuses on speech synthesis, he also works on real-time hands-free speech-to-speech translation systems (Croatian, Arabic and Thai), spoken dialog systems, and rapid language adaptation for support of new languages. Alan W Black was an elected member of the IEEE Speech Technical Committee (2003-2007). He is currently on the board of ISCA and on the editorial board of Speech Communication. He was program chair of the ISCA Speech Synthesis Workshop 2004, and was general co-chair of Interspeech 2006 -- ICSLP. In 2004, with Prof Keiichi Tokuda, he initiated the now-annual Blizzard Challenge, the largest multi-site evaluation of corpus-based speech synthesis techniques.


Date

  • 15:00, Friday, June 5th, 2009
  • Room VA4, building of Civil Engineering

Speaker

  • Alan W Black, Carnegie Mellon University, USA

Abstract

This talk will look at the past, present and future of speech synthesis and how it relates to speech processing development in general. Specifically, I will outline the advances in synthesis technology, drawing analogies to developments in other speech and language processing fields (e.g. ASR and SMT), where knowledge-based techniques gave way to data-driven techniques. These in turn have both pushed machine learning technologies forward and, later, led to the re-introduction of techniques for including higher-level knowledge in our data-driven approaches.

We will give overviews of diphone, unit selection and statistical parametric synthesis, voice morphing technologies, and how synthesis can be optimized for the desired task. We will also address issues of evaluation, both in isolation and when embedded in real tasks. While widening our view of speech processing, we will also present the publicly used Let's Go Spoken Dialog System (and its evaluation platform, Let's Go Lab); our rapid language adaptation system (CMUSPICE), which allows non-speech experts to construct ASR and TTS support in new languages; and our hands-free real-time two-way speech-to-speech translation system, showing how system integration can cause cross-technology innovation.