ECESS (European Center of Excellence on Speech Synthesis)


The main goals of ECESS are:

  • Achieve the critical mass needed to substantially advance speech synthesis technology
  • Integrate basic research know-how related to speech synthesis
  • Attract public and private funding

Achievement of Critical Mass

Many public and private institutions are active in the field of speech synthesis, but most of them lack the critical mass to build and maintain a TTS system that contains all state-of-the-art advances. Most institutions focus their R&D on aspects relevant to one 'focus module', which is integrated into their own TTS development system. That system also contains 'out-of-focus' modules that do not reflect the state of the art. These 'out-of-focus' modules may dominate the final speech quality of the whole TTS system, so that advances in R&D on the 'focus module' can hardly be tracked. Developing speech synthesis systems also requires language resources (voices, lexica, corpora), which have to be annotated with tools, e.g. for automatic pitch marking, segmentation into phonetic units and APF¹ extraction. Due to restricted resources, these language resources and tools may not have the required quality. Finally, useful functionalities of speech synthesis modules, such as voice conversion, are not a research focus at every institution. As a result, progress in R&D for speech synthesis technology is slow.

To overcome this problem, ECESS is building an infrastructure that allows each institution to work on a specific research task and to benefit from the activities of the other institutions. ECESS thus acts as a large 'virtual' institute that has the critical mass to accelerate progress in speech synthesis. The basic elements of the ECESS infrastructure are:

  • Common system architecture based on well defined modules and interfaces,
  • Common set of specified language resources,
  • Common set of tools,
  • Common set of evaluation criteria defining the quality of modules, language resources and tools.
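
The idea of a common architecture with well-defined modules and interfaces can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Utterance` container, the `Module` interface, the module names and the toy duration rule are hypothetical and are not taken from any ECESS specification. The sketch only shows how one institution could replace its 'focus module' while reusing the remaining modules unchanged.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Sequence


@dataclass
class Utterance:
    """Data passed from module to module (hypothetical structure)."""
    text: str
    annotations: Dict[str, list] = field(default_factory=dict)


class Module(Protocol):
    """Hypothetical common interface that every module implements."""
    name: str

    def process(self, utt: Utterance) -> Utterance: ...


class BaselineTextProcessor:
    name = "text_processing"

    def process(self, utt: Utterance) -> Utterance:
        # Toy tokenizer: lower-case the text and split on whitespace.
        utt.annotations["tokens"] = utt.text.lower().split()
        return utt


class BaselineProsody:
    name = "prosody"

    def process(self, utt: Utterance) -> Utterance:
        # Toy rule: 80 ms per character of each token.
        tokens = utt.annotations["tokens"]
        utt.annotations["durations_ms"] = [80 * len(t) for t in tokens]
        return utt


class FinalLengtheningProsody:
    """A 'focus module' that one partner might plug in (toy example)."""
    name = "prosody"

    def process(self, utt: Utterance) -> Utterance:
        tokens = utt.annotations["tokens"]
        durations = [80 * len(t) for t in tokens]
        if durations:
            # Lengthen the last token by 50 percent.
            durations[-1] = 120 * len(tokens[-1])
        utt.annotations["durations_ms"] = durations
        return utt


def replace_module(modules: Sequence[Module], new: Module) -> List[Module]:
    """Return a new pipeline in which the module with the same name is swapped."""
    return [new if m.name == new.name else m for m in modules]


def run_pipeline(modules: Sequence[Module], text: str) -> Utterance:
    """Pass one utterance through all modules in order."""
    utt = Utterance(text)
    for module in modules:
        utt = module.process(utt)
    return utt


baseline = [BaselineTextProcessor(), BaselineProsody()]
custom = replace_module(baseline, FinalLengtheningProsody())
print(run_pipeline(baseline, "Hello world").annotations["durations_ms"])
print(run_pipeline(custom, "Hello world").annotations["durations_ms"])
```

Because both prosody modules share one interface, their outputs can be compared on the same input with all other modules held fixed, which is the kind of comparison that common evaluation criteria would rely on.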

The infrastructure is supported by a suitable organisation described below.

Integration of Basic Research

Basic research in the fields of articulation and perception is needed to advance speech synthesis technology. Current TTS systems use little know-how from these fields. The reason is that basic research and TTS system development are mostly carried out at different institutions with different focuses. Given the progress achieved on both sides, it is time to exploit the synergy between them. ECESS will build research clusters around speech synthesis tasks, in which suitable results of basic research will be integrated and the directions of basic research will be influenced to meet the needs of speech synthesis technology.

Attracting Funding

In order to strengthen its cultural heritage, each country should promote R&D investment in excellent speech synthesis systems for its language(s), for use in public R&D and in commercial applications. ECESS will provide the infrastructure to build such systems, which should attract national funding for building language-specific systems with the required language resources. Private funding could be attracted by ECESS partners whenever they license their modules or language resources to third parties for research or commercial use.

¹Adaptive Personality Features (APFs) describe a person's expressiveness, as induced by the communicative intention, the current speaker state, the environmental conditions and the relationship with his/her interlocutor(s).