Characterful Speech Synthesis
From HLT@INESC-ID
Matthew Aylett
Date
- 15:00, Friday, September 10th, 2010
- Room 336
Speaker
- Matthew Aylett, CTO CereProc
Abstract
Speech synthesis is a key enabling technology for pervasive and mobile computing, as well as a key requirement for accessibility. Adding character to synthetic voices is a requirement for effective interaction and for devices that wish to present a coherent branded interface.
In this talk I will argue that current approaches to synthesis, and current commercial pressures, make it difficult for many systems to create characterful synthesis. We will present how CereProc's approach differs from the industry standard and how we have attempted to maintain and increase the characterfulness of CereVoice's output. (Online demo available at www.cereproc.com)
We will outline the expressive synthesis markup supported by the system and how it is realized through the underlying digital signal processing and selection tags. Finally, we will present the concept of second-pass synthesis, where cues can be manually tweaked to allow direct control of intonation style, and where synthesis can be seamlessly mixed with pre-recorded prompts to produce extremely natural output.
We will also demonstrate how we can use synthesis to 'clone' celebrity voices, with a brief demonstration of voices copied from George W. Bush (e.g., http://www.idyacy.com/cgi-bin/bushomatic.cgi).
Time permitting, I will also demonstrate some experiments looking at hybrid approaches to parametric/unit selection synthesis.
Note: This seminar will be held in English.