EmoVoice: Transformation of Speech Emotions

From HLT@INESC-ID

Revision as of 21:26, 5 July 2007 by David

Utterance

You can either select an utterance from our speech database or upload your own. An uploaded file must be a WAV file (".wav" extension). The files available for selection were taken from the ARCTIC database. The computations for these files have already been performed, so their results are returned faster than those for an uploaded speech file.

Speech Parameters Computations

In this section you can obtain text files containing the pitchmarks, the pitch contour, or the Waves transcription for a given speech file. The residual signal can also be computed from the speech signal and downloaded.

Pitchmarks

Pitchmarks correspond to the instants of glottal closure in a laryngograph waveform (see Figure 1). We use the pitchmark detector from the Entropic (ESPS) tools. Our speech transformation techniques are pitch-synchronous, so they depend on the robustness of the pitch marking algorithm. The file with the computed pitchmarks has two columns. The first contains the time instants of the pitchmarks; the second has the same length and is filled with the character "1", meaning that the pitchmarks correspond to voiced regions only. You can manually correct the pitchmarks in the downloaded file and upload the corrected file in sections 2 and 3 for the speech transformations. To correct the pitchmarks, compute the residual signal and open it together with the pitchmark transcription in suitable software such as WaveSurfer.
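The two-column layout described above can be read and written back with a few lines of Python when correcting pitchmarks by hand. This is only a minimal sketch: it assumes whitespace-separated columns and times in seconds, neither of which is stated on this page, and the function names `parse_pitchmarks` and `format_pitchmarks` are hypothetical.

```python
from typing import List, Tuple


def parse_pitchmarks(text: str) -> List[Tuple[float, str]]:
    """Parse two-column pitchmark text into (time, label) pairs.

    Assumption: columns are separated by whitespace. Lines whose first
    field is not a number (for example a header) are skipped.
    """
    marks = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or incomplete line
        try:
            time = float(fields[0])
        except ValueError:
            continue  # not a data line
        marks.append((time, fields[1]))
    return marks


def format_pitchmarks(marks: List[Tuple[float, str]]) -> str:
    """Write (time, label) pairs back as two whitespace-separated columns."""
    return "".join(f"{time:.6f} {label}\n" for time, label in marks)


# Made-up example data, not taken from the database.
sample = "0.105000 1\n0.113200 1\n0.121500 1\n"
marks = parse_pitchmarks(sample)
marks.sort(key=lambda mark: mark[0])  # keep time order after manual edits
corrected = format_pitchmarks(marks)
```

Sorting before writing is a small safeguard in case a manually inserted pitchmark was appended at the end of the file.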

Pitch contour

The pitch contour is computed from the pitchmarks. F0 values are estimated from the time interval between successive pitchmarks in the voiced regions (F0 is the reciprocal of that interval). Thus, the number of F0 points is equal to the number of voiced pitchmarks. In the output file the first column contains the time instants and the second column contains the corresponding F0 values. You can modify the computed pitch contour and use it in section 2 as the target pitch contour to transform the pitch of the speech signal. Some tools can open a speech file together with its pitch contour, which makes the contour easy to modify. For example, you can use WaveSurfer or Praat.
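As an illustration of the relation between pitchmarks and F0, the sketch below takes each F0 value as the reciprocal of the interval between neighbouring pitchmarks. It is not the code used by EmoVoice. The `max_period` threshold used to detect the boundary of a voiced region, and the choice of which neighbour supplies the interval, are assumptions made here so that every voiced pitchmark gets one F0 value.

```python
from typing import List, Tuple


def f0_from_pitchmarks(
    times: List[float], max_period: float = 0.02
) -> List[Tuple[float, float]]:
    """Return (time, F0 in Hz) pairs, one per usable pitchmark.

    Assumptions: times are in seconds and sorted; a gap longer than
    max_period (hypothetical default, 20 ms or 50 Hz) separates two
    voiced regions and is not treated as a pitch period.
    """
    contour = []
    count = len(times)
    for i, t in enumerate(times):
        prev_gap = t - times[i - 1] if i > 0 else None
        next_gap = times[i + 1] - t if i + 1 < count else None
        if prev_gap is not None and 0 < prev_gap <= max_period:
            period = prev_gap  # interval to the previous pitchmark
        elif next_gap is not None and 0 < next_gap <= max_period:
            period = next_gap  # first mark of a region: look ahead
        else:
            continue  # isolated mark, no usable period
        contour.append((t, 1.0 / period))
    return contour


# Made-up pitchmarks: two voiced regions separated by a long gap.
example = f0_from_pitchmarks([0.100, 0.110, 0.120, 0.300, 0.308])
```

With these made-up times the first region has a period of about 10 ms (roughly 100 Hz) and the second about 8 ms (roughly 125 Hz).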

Waves transcription

The speech transcription is in the Waves format. The first column contains the pitchmark instants, while the second column contains the voicing classification labels (see Table 1). Pitchmarks and the voiced/unvoiced classification were obtained with the Entropic (ESPS) tools. For the silence classification we used the zero-crossing count and the energy as speech features. To open the WAV file together with the pitchmark transcription you can use, for example, WaveSurfer.
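The two features named for the silence classification can be computed per frame as sketched below. This page does not say how the features are combined or which thresholds are used, so the decision rule in `is_silence`, the frame length, and the threshold values are all hypothetical and only meant to show the idea.

```python
from typing import List, Sequence, Tuple


def frame_features(
    samples: Sequence[float], frame_len: int
) -> List[Tuple[float, int]]:
    """Return (mean energy, zero-crossing count) for each full frame.

    Frames do not overlap; trailing samples that do not fill a frame
    are ignored.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        features.append((energy, crossings))
    return features


def is_silence(
    energy: float, crossings: int, energy_thresh: float, crossing_thresh: int
) -> bool:
    """Hypothetical rule: silence means low energy and few zero crossings."""
    return energy < energy_thresh and crossings < crossing_thresh


# Made-up signal: 100 zero samples followed by 100 alternating samples.
signal = [0.0] * 100 + [0.5 if i % 2 == 0 else -0.5 for i in range(100)]
feats = frame_features(signal, frame_len=100)
labels = [is_silence(e, z, energy_thresh=1e-4, crossing_thresh=10) for e, z in feats]
```

In practice the thresholds would have to be tuned on real recordings, for example from the noise level at the start of an utterance.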