Corpus-based Speech Synthesis for Any Voice

From HLT@INESC-ID

Revision as of 21:22, 5 July 2007

Have you ever thought of having a famous person talking to you at your PC?

Corpus-based speech synthesis is suitable for this purpose, since it uses large amounts of recorded speech from a single speaker. The most natural synthetic signals are usually produced when the user-supplied text gives rise to utterances that are very similar to the recorded ones, but an appropriate choice of context-defining features can make it possible to render equally natural signals even for completely "out-of-domain" sentences. In addition, the availability of large multimedia repositories (movies, broadcast news, etc.) makes it possible to collect the large single-speaker speech databases needed to build such TTS voices.
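
To make the role of context-defining features more concrete, here is a minimal sketch of corpus-based unit selection in Python. It is an illustration under stated assumptions, not the method used in this project: the `Unit` class, the phone-level units, the two context features (left and right neighbouring phone), and the 0/1 cost functions are all hypothetical simplifications. Each target phone is matched against recorded units of the same phone; a target cost penalises context mismatches, a join cost penalises units that were not adjacent in the recordings, and a Viterbi-style search picks the cheapest sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One recorded phone plus its context features (hypothetical layout)."""
    utt: int     # index of the recorded utterance
    pos: int     # position of the phone inside that utterance
    phone: str
    left: str    # phone to the left ("#" at an utterance boundary)
    right: str   # phone to the right ("#" at an utterance boundary)


def contexts(phones):
    """Turn a phone sequence into (phone, left, right) specifications."""
    padded = ["#"] + list(phones) + ["#"]
    return [(padded[i], padded[i - 1], padded[i + 1])
            for i in range(1, len(padded) - 1)]


def build_corpus(utterances):
    """Index every phone of every recorded utterance as a candidate unit."""
    return [Unit(u, i, ph, left, right)
            for u, phones in enumerate(utterances)
            for i, (ph, left, right) in enumerate(contexts(phones))]


def target_cost(unit, spec):
    """Assumed cost: number of context features that do not match."""
    _, left, right = spec
    return (unit.left != left) + (unit.right != right)


def join_cost(a, b):
    """Assumed cost: free if the units were adjacent in the recording."""
    return 0.0 if (a.utt == b.utt and b.pos == a.pos + 1) else 1.0


def select_units(phones, corpus):
    """Viterbi-style search for the cheapest unit sequence."""
    best = None  # list of (accumulated cost, path) for the previous phone
    for spec in contexts(phones):
        candidates = [u for u in corpus if u.phone == spec[0]]
        if not candidates:
            raise ValueError(f"no recorded unit for phone {spec[0]!r}")
        layer = []
        for u in candidates:
            tc = target_cost(u, spec)
            if best is None:
                layer.append((tc, [u]))
            else:
                cost, path = min(
                    ((c + join_cost(p[-1], u), p) for c, p in best),
                    key=lambda item: item[0],
                )
                layer.append((cost + tc, path + [u]))
        best = layer
    return min(best, key=lambda item: item[0])


# Toy recordings, written as phone sequences.
corpus = build_corpus([["h", "e", "l", "o"], ["l", "o", "w"]])

in_domain_cost, in_domain_path = select_units(["l", "o", "w"], corpus)
out_domain_cost, out_domain_path = select_units(["h", "o"], corpus)
```

In this toy setup the in-domain request is served entirely by consecutive units from one recording, while the out-of-domain request has to combine units with mismatched contexts and therefore gets a higher total cost. Richer context features and larger single-speaker databases give the search more candidates that fit such unseen sentences.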