Modeling Accent Groups for Improved Prosody in TTS

From HLT@INESC-ID

Date

  • 15:00, Friday, February 8th, 2013
  • Room 336

Speaker

  • Gopala Krishna Anumanchipalli

Abstract

In this talk, I will present an ‘Accent Group’ based intonation model for statistical parametric speech synthesis. We propose an approach that automatically models the phonetic realization of fundamental frequency (F0) contours as a sequence of intonational events, each anchored to a group of syllables (an Accent Group). We train a speaker-specific accent grouping model using a stochastic context-free grammar and contextual decision trees over the syllables. This model is used to ‘parse’ unseen text into its constituent accent groups, and appropriate intonation is then predicted over each group. The performance of the model is shown objectively and subjectively on a variety of prosodically diverse tasks: read speech, news broadcasts, and audiobooks.
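
To give a feel for what ‘parsing’ a syllable sequence into accent groups could look like, here is a minimal sketch in Python. It is not the model from the talk: a simple dynamic program stands in for the stochastic context-free grammar, and the names `LENGTH_PRIOR`, `segment`, and `boundary_probs` are hypothetical. The per-syllable boundary probabilities are assumed to come from some classifier (for example a decision tree), and the length prior values are made up for illustration.

```python
import math

# Hypothetical prior over accent-group length in syllables (illustrative values only).
LENGTH_PRIOR = {1: 0.2, 2: 0.4, 3: 0.3, 4: 0.1}


def _clamp(p, eps=1e-9):
    """Keep probabilities away from 0 and 1 so that log() stays finite."""
    return min(max(p, eps), 1.0 - eps)


def segment(boundary_probs):
    """Return the most probable split of n syllables into accent groups.

    boundary_probs[i] is an assumed probability that a group ends right
    after syllable i. The last syllable always ends a group.
    Groups are returned as (start, end) index pairs, end exclusive.
    """
    n = len(boundary_probs)
    best = [-math.inf] * (n + 1)  # best[j]: best log-score for the first j syllables
    back = [0] * (n + 1)          # back[j]: start index of the last group ending at j
    best[0] = 0.0
    for j in range(1, n + 1):
        for length, p_len in LENGTH_PRIOR.items():
            i = j - length
            if i < 0 or best[i] == -math.inf:
                continue
            score = best[i] + math.log(p_len)
            # Syllables inside the group must not end a group.
            for k in range(i, j - 1):
                score += math.log(1.0 - _clamp(boundary_probs[k]))
            # The group's final syllable must end it (forced at utterance end).
            if j < n:
                score += math.log(_clamp(boundary_probs[j - 1]))
            if score > best[j]:
                best[j], back[j] = score, i
    # Walk the back-pointers to recover the groups in order.
    groups, j = [], n
    while j > 0:
        groups.append((back[j], j))
        j = back[j]
    return groups[::-1]


if __name__ == "__main__":
    # Five syllables; a likely boundary after the second one.
    print(segment([0.1, 0.9, 0.2, 0.1, 0.8]))
```

In a full system, each returned group would then be passed to an intonation predictor that assigns an F0 shape to the group as a whole; that step is left out here because the abstract does not specify it.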