Speech Synthesis


Revision as of 13:05, 29 June 2006 by Imt (Talk | contribs)

Most current high-quality speech synthesizers offer only a few voices for each language. This is mostly due to the cost of building a speech inventory for a new voice, which requires a professional speaker who articulates each utterance in a predictable way, as well as manual tuning of the phonetic segmentation of the recordings. At L2F we have been focusing our efforts on the development of tools that automate the process of creating new voices for a speech synthesizer. High precision in automatic phonetic alignment has been achieved by combining HMM (hidden Markov model) and DTW (dynamic time warping) techniques [Paulo 2003, 2004], and the speaker’s regional dialect and disfluencies have also been taken into account [Paulo 2005]. The resulting voices have been integrated into synthesizers for both limited and unlimited domain applications. We have been working with the Festival Speech Synthesis System, a free software synthesis toolkit and engine developed at the University of Edinburgh, together with CMU’s FestVox tools for building new voices and Flite, a small-footprint synthesis engine.
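As a rough illustration of the DTW half of the alignment approach mentioned above, the following Python sketch aligns two feature sequences with textbook dynamic time warping. It is not the method of [Paulo 2003, 2004], which combines DTW with HMM-based alignment; the function name `dtw_align`, the scalar features, and the absolute-difference distance are all assumptions made for the example. In a real aligner the sequences would typically be frames of acoustic features, and a frame-to-frame path of this kind could be used to transfer known segment boundaries from one signal to the other.

```python
def dtw_align(ref, test, dist=lambda a, b: abs(a - b)):
    """Align two sequences with classic DTW (hypothetical helper, illustration only).

    Returns (total_cost, path), where path is a list of (ref_index, test_index)
    pairs running from the start of both sequences to their end.
    """
    n, m = len(ref), len(test)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning ref[:i] with test[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(ref[i - 1], test[j - 1])
            # Allowed steps: advance ref only, advance test only, or advance both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor cell
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    # The second sequence holds the middle value for one extra frame
    total, path = dtw_align([1, 2, 3], [1, 2, 2, 3])
    print(total, path)
```

In this toy example the second sequence repeats its middle value, so the path maps one reference frame to two test frames, which is the kind of local time stretching DTW is designed to absorb.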

The synthesizers developed at L2F have been integrated into a variety of applications, including a dialogue system for home automation [Neto 2003, 2004] and speech output for synthetic characters [Cabral 2006a]. These applications require not only the ability to modify the rhythm and intonation of the synthesized speech, as produced by standard speech synthesizers, but also the ability to perform voice quality transformations to produce more expressive speech [Cabral 2005, 2006b].

Current research in this area at L2F also includes the following topics:

  • Voice transformation
Producing customized voices from a minimal amount of recordings of the target speaker requires further development of voice transformation techniques. Voice customization is important not only for speech-to-speech translation systems but also for providing specialized voices for synthetic characters.
  • Festival, FestVox and Flite
The use of common tools for speech synthesis research has been an important factor in sharing progress in this area among different research groups. INESC-ID has already provided financial support for the development of Flite and is willing to continue contributing to the improvement of these tools.