A Portuguese Text-to-Speech Synthesizer For Alternative and Augmentative Communication
INESC - Instituto de Engenharia de Sistemas e Computadores
CLUL - Centro de Linguística da Universidade de Lisboa


The results of the EDIFALA project demonstrated the importance of speech synthesis technology in the field of rehabilitation technologies. Although the system was primarily targeted at children with cerebral palsy, the research team has been contacted by disabled people and therapists who wanted to use it for other disabilities.

The main goal of this project is to develop our speech synthesis system further, improving its quality and making it usable in a wider range of augmentative and alternative communication (AAC) systems.

The DIXI system is a text-to-speech synthesizer for European Portuguese developed through the cooperation between the Speech Processing Group of INESC and the Phonetic and Phonology Group of CLUL. The current version is a synthesis-by-rule formant synthesizer that uses Klatt's model and a multi-linear linguistic rule model.

This project includes five major tasks: the restructuring of the system architecture, the addition of a concatenative-based waveform synthesizer, the development of the prosody module, the inclusion of a standardized application programming interface and the assessment of the resulting system.

The restructuring of the system architecture will adopt the philosophy of the Festival system: a multi-lingual framework for text-to-speech synthesizers that provides a common interface to all the system modules. In this way, a module can be added, replaced or shared easily and efficiently.
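
The idea of a common module interface can be illustrated with a minimal Python sketch. It is not the DIXI or Festival code: the names `Utterance`, `Pipeline`, `tokenize`, `formant_backend` and `concatenative_backend` are hypothetical, and the modules are stubs that only record what they would do.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Utterance:
    """Hypothetical shared structure that every module reads and enriches."""
    text: str
    features: Dict[str, object] = field(default_factory=dict)


# Assumption: a module is any callable taking and returning an Utterance.
Module = Callable[[Utterance], Utterance]


class Pipeline:
    """Runs named modules in order; any module can be swapped by name."""

    def __init__(self) -> None:
        self._order: List[str] = []
        self._modules: Dict[str, Module] = {}

    def add(self, name: str, module: Module) -> None:
        if name in self._modules:
            raise ValueError(f"module {name!r} already registered")
        self._order.append(name)
        self._modules[name] = module

    def replace(self, name: str, module: Module) -> None:
        if name not in self._modules:
            raise KeyError(name)
        self._modules[name] = module  # position in the pipeline is kept

    def run(self, text: str) -> Utterance:
        utt = Utterance(text)
        for name in self._order:
            utt = self._modules[name](utt)
        return utt


def tokenize(utt: Utterance) -> Utterance:
    # Stub text analysis: lowercase and split on whitespace.
    utt.features["tokens"] = utt.text.lower().split()
    return utt


def formant_backend(utt: Utterance) -> Utterance:
    # Stub waveform module: only records which back end was selected.
    utt.features["backend"] = "formant"
    return utt


def concatenative_backend(utt: Utterance) -> Utterance:
    utt.features["backend"] = "concatenative"
    return utt


pipeline = Pipeline()
pipeline.add("tokenizer", tokenize)
pipeline.add("waveform", formant_backend)
first = pipeline.run("Bom dia")

# Swap the waveform module without touching the rest of the system.
pipeline.replace("waveform", concatenative_backend)
second = pipeline.run("Bom dia")
```

Because every module shares the same signature, replacing the formant back end with a concatenative one is a single call, which is the property the restructuring aims for.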

The addition of the concatenative-based waveform synthesizer module has two goals: on the one hand, it will allow us to study the specific problems of this synthesis method for European Portuguese; on the other hand, it will make it easier to add the different voices required by most accessibility aid applications.
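
As a rough illustration of what waveform concatenation involves, the sketch below joins pre-recorded units with a short linear crossfade at each boundary. It is a toy example under stated assumptions, not the method planned for DIXI: the function name `crossfade_concat`, the unit inventory, and the sample values are all hypothetical, and real systems also have to deal with pitch and duration mismatches at the joins.

```python
from typing import Dict, List, Sequence


def crossfade_concat(units: Sequence[Sequence[float]], overlap: int) -> List[float]:
    """Concatenate sample sequences, blending `overlap` samples at each join."""
    out: List[float] = []
    for unit in units:
        samples = list(unit)
        # Never overlap more samples than either side actually has.
        n = min(overlap, len(out), len(samples))
        for i in range(n):
            w = (i + 1) / (n + 1)          # weight of the incoming unit
            idx = len(out) - n + i         # position in the tail of the output
            out[idx] = (1.0 - w) * out[idx] + w * samples[i]
        out.extend(samples[n:])            # append the non-overlapping part
    return out


# Hypothetical inventory: one entry per voice, mapping unit names to samples.
inventory: Dict[str, Dict[str, List[float]]] = {
    "voice_a": {"b-o": [1.0, 1.0, 1.0, 1.0], "o-m": [0.0, 0.0, 0.0, 0.0]},
}

voice = inventory["voice_a"]
wave = crossfade_concat([voice["b-o"], voice["o-m"]], overlap=2)
```

In a layout like this, adding a new voice amounts to adding another inventory entry recorded from a different speaker, while the concatenation code stays the same.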

The current version of the DIXI system uses a very limited set of intonation patterns. In this project we will extend the prosody modules to produce more complex patterns. A part-of-speech tagger will also be added to the system to allow more detailed phrase breaking.
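
To show how part-of-speech tags can inform phrase breaking, here is a deliberately simple heuristic: start a new phrase before a conjunction once the current phrase has reached a minimum length. The rule, the tag names, the function `phrase_breaks` and the hand-tagged example are all assumptions made for illustration; they do not describe the tagger or the rules planned for DIXI.

```python
from typing import FrozenSet, List, Sequence, Tuple


def phrase_breaks(
    tagged: Sequence[Tuple[str, str]],
    min_len: int = 3,
    break_tags: FrozenSet[str] = frozenset({"CONJ"}),
) -> List[List[str]]:
    """Split (word, tag) pairs into phrases using a toy tag-based rule."""
    phrases: List[List[str]] = []
    current: List[str] = []
    for word, tag in tagged:
        # Break before a conjunction, but only if the phrase is long enough.
        if tag in break_tags and len(current) >= min_len:
            phrases.append(current)
            current = []
        current.append(word)
    if current:
        phrases.append(current)
    return phrases


# Hand-tagged input; a real system would obtain the tags from the tagger.
sentence = [
    ("o", "DET"), ("menino", "NOUN"), ("chegou", "VERB"),
    ("e", "CONJ"),
    ("a", "DET"), ("mãe", "NOUN"), ("sorriu", "VERB"),
]
phrases = phrase_breaks(sentence)
```

Each resulting phrase could then be assigned its own intonation pattern by the prosody module.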

One of the major difficulties in integrating text-to-speech systems with AAC applications has been the lack of a standardized programming interface, mainly for software-only synthesizers. This has recently changed, and there are now industry standards for speech application programming interfaces. The development of such an interface for the European Portuguese synthesizer will allow it to be used by a wide range of applications, notably screen readers.

The final task of the project is the evaluation of the system, which will include two sub-tasks: the assessment of the synthesized speech and the evaluation of the use of the system in AAC solutions. In the first part, a panel of human listeners will perform a formal evaluation of the intelligibility and naturalness of the output speech. The second part will focus on the use of the system in an AAC solution for a speech-impaired person and in another solution for a blind person. In this evaluation phase the system will be installed on portable computers that the evaluators will use in their daily lives.

Luís Caldas de Oliveira