Modeling Accent Groups for Improved Prosody in TTS
From HLT@INESC-ID
Date
- 15:00, Friday, February 8th, 2013
- Room 336
Speaker
Abstract
In this talk, I will present an ‘Accent Group’ based intonation model for statistical parametric speech synthesis. We propose an approach that automatically models the phonetic realizations of fundamental frequency (F0) contours as a sequence of intonational events, each anchored to a group of syllables (an Accent Group). We train a speaker-specific accent grouping model using a stochastic context-free grammar (SCFG) and contextual decision trees over syllables. This model is then used to ‘parse’ unseen text into its constituent accent groups, and appropriate intonation is predicted over each group. The model's performance is evaluated both objectively and subjectively on several prosodically diverse tasks: read speech, news broadcasts, and audiobooks.