Corpus-based Speech Synthesis for Any Voice

From HLT@INESC-ID

== The Demo ==


<php>
$gL2F_voices_location = "/afs/l2f/home/spaulo/web_pages/demos/tts_demo";
function da_nova_page($oURL, $text, $voices) {
  global $gL2F_voices_location;
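  $fpN = fopen("$gL2F_voices_location/voices/$voices", "r");
  if ($fpN === false) {
    // Without the voice list the form cannot be built; log the problem and stop.
    trigger_error("Cannot open voice list for '$voices'", E_USER_WARNING);
    return;
  }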
  $VoiceNames      = array();
  $VoiceNamesIndex = array();
  $dimVoiceIndex  = 0;
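  // Read until fgets() reports end of file; testing feof() first would process one bogus line.
  while (($buffer = fgets($fpN, 4096)) !== false) {
    // split() was removed in PHP 7; explode() does the same job for a literal separator.
    // array_pad() keeps list() safe on lines that contain no space.
    list($voiceN, $voiceDesc) = array_pad(explode(" ", $buffer), 2, '');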
    $newVoiceDesc=preg_replace(array(0 => '/\n/', 1 => '/_/'),
                              array(0 => '',    1 => ' '),
                              $voiceDesc);
    if($newVoiceDesc!='') {
      $VoiceNamesIndex[$dimVoiceIndex]=$voiceN;
      $VoiceNames[$voiceN]=$newVoiceDesc;
      $dimVoiceIndex++;
    }
  }
  fclose($fpN);
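  //----------------------------------------------------
  // Escape every value echoed into the HTML; the charset matches the form's accept-charset.
  $esc = function ($s) { return htmlspecialchars($s, ENT_QUOTES, 'ISO-8859-1'); };
  echo "<H2>DIXI TTS</H2>";
  echo "<form method='post' name='intro' accept-charset='ISO-8859-1'>";
  // "Seleccione uma voz" = "Select a voice"; the currently selected voice is listed first.
  echo "Seleccione uma voz: <select name='voice'>";
  $current = isset($VoiceNames[$voices]) ? $VoiceNames[$voices] : $voices;
  echo "<option value='" . $esc($voices) . "'>" . $esc($current) . "</option>";
  for ($i = 0; $i < $dimVoiceIndex; $i++) {
    if ($VoiceNamesIndex[$i] != $voices) {
      echo "<option value='" . $esc($VoiceNamesIndex[$i]) . "'>";
      echo $esc($VoiceNames[$VoiceNamesIndex[$i]]) . "</option>";
    }
  }
  echo "</select>";
  // "Escreva aqui o texto a sintetizar" = "Write the text to synthesize here"
  echo "<br/><br/>Escreva aqui o texto a sintetizar<br/>";
  echo "<textarea name='parasinteste' rows='10' cols='80'>" . $esc($text) . "</textarea>";
  echo "<br/>";
  // "Sintetiza" = "Synthesize"; "Ouvir novamente" = "Listen again"
  echo "<input type='submit' value='Sintetiza'/>";
  echo "<a href='" . $esc($oURL) . "'>Ouvir novamente</a>";
  //----------------------------------------------------
  echo "</form>";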
}
</php>


== Contact ==

Revision as of 14:17, 10 July 2007

Have you ever thought of having a famous person talk to you from your PC?

Corpus-based speech synthesis is well suited to this purpose, since it builds a voice from a large amount of recorded speech from a single speaker. The most natural synthetic speech is usually produced when the user-supplied text gives rise to utterances that are very similar to the recorded ones, but an appropriate choice of context-defining features can make it possible to render equally natural signals even for completely "out-of-domain" sentences. In addition, the availability of large multimedia repositories (movies, broadcast news, etc.) allows us to collect the large single-speaker speech databases needed to build such TTS voices.

The Demo

Contact