Characterful Speech Synthesis

From HLT@INESC-ID

Matthew Aylett
Dr Matthew Aylett graduated from Sussex University in 1987 in Artificial Intelligence with Computing. He subsequently worked in a wide variety of commercial positions, including publishing, language teaching and computer support, before returning to university at Edinburgh to take an MSc in Speech and Language Processing, graduating with a distinction in 1995. His PhD, Stochastic Suprasegmentals, focused on a computational approach to phonetics and prosody, in which he worked closely with the Edinburgh University Map Task, a dialogue corpus. He held a postdoctoral position within a speech recognition project at Edinburgh before being asked to join Rhetorical Systems, a university spin-out in speech synthesis, as a senior development engineer in 2000. He was responsible for the core search module of the Rhetorical Systems rVoice product and continued to publish internationally on the topic of speech synthesis. In 2005 he spent a seven-month sabbatical at the International Computer Science Institute (ICSI), Berkeley, where he worked on sub-lexical prosodic analysis of meeting dialogues. He set up CereProc in January 2006 with the objective of producing commercial characterful speech synthesis, and is currently its CTO. He has maintained close links with Edinburgh's Centre for Speech Technology Research (CSTR), working on a number of research projects, carrying out PhD supervision, and actively publishing within the field.

Date

  • 15:00, Friday, September 10th, 2010
  • Room 336

Speaker

  • Matthew Aylett, CereProc

Abstract

Speech synthesis is a key enabling technology for pervasive and mobile computing as well as a key requirement for accessibility. Adding character to synthetic voices is a requirement for effective interaction and for devices that wish to present a coherent branded interface.

In this talk I will argue that current approaches to synthesis, and current commercial pressures, make it difficult for many systems to create characterful synthesis. We will present how CereProc's approach differs from the industry standard and how we have attempted to maintain and increase the characterfulness of CereVoice's output. (Online demo available at www.cereproc.com)

We will outline the expressive synthesis markup that is supported by the system and how this markup is expressed in the underlying digital signal processing and selection tags. Finally, we will present the concept of second pass synthesis, where cues can be manually tweaked to allow direct control of intonation style, and where synthesis can be seamlessly mixed with pre-recorded prompts to produce extremely natural output.

We will also demonstrate how we can use synthesis to 'clone' celebrity voices, with a brief demonstration of voices copied from George W. Bush (e.g. http://www.idyacy.com/cgi-bin/bushomatic.cgi).

Time permitting, I will also demonstrate some experiments looking at hybrid approaches to parametric/unit selection synthesis.


Note: This seminar will be held in English.