Corpus-based Speech Synthesis for Any Voice
From HLT@INESC-ID
Latest revision as of 14:17, 10 July 2007
Have you ever thought of having a famous person talk to you from your PC?
Corpus-based speech synthesis is suitable for this purpose, since it uses large amounts of recorded speech from a single speaker. The most natural synthetic signals are usually produced when the user-supplied text gives rise to utterances that are very similar to the recorded ones, but an appropriate choice of context-defining features can make it possible to render equally natural signals even for completely "out-of-domain" sentences. Moreover, large multimedia repositories (movies, broadcast news, etc.) make it possible to collect the large single-speaker speech databases needed to build such TTS voices.
The Demo
- Go to the demo page: https://www.l2f.inesc-id.pt/~spaulo/voices/tts_demo
Contact
- Sérgio Paulo