Difference between revisions of "PoSTPort (POrting Speech Technologies to other varieties of Portuguese)"

From HLT@INESC-ID


Revision as of 12:05, 23 November 2007


Sponsored by: FCT (PTDC/PLP/72404/2006)
Start: January 2008
End: December 2010

Team

Project Leader: Isabel Trancoso


Undergraduate Students:

Summary

The goal of this project is to port spoken language technologies originally developed for European Portuguese to other varieties of Portuguese, namely those spoken in South American and African countries. The two main spoken language technologies that will be investigated are text-to-speech and speech-to-text conversion, commonly known as synthesis and recognition. Instead of porting complete systems, we shall concentrate on the linguistically relevant modules that are likely to be most affected. The main task of this proposal, the third, is therefore structured in terms of modules, which will be ported and evaluated as independently as possible. Prior to this main work, however, the project will involve two preparatory tasks. In the first, we shall build the pilot corpus necessary for studying the main linguistic differences and for porting to the new varieties. In the second, we shall characterize the main differences between the studied varieties, starting with the orthographic and syntactic levels and ending with the phonetic and phonological levels. As an outcome of this second task, we shall attempt to define a phonetic alphabet that encompasses all varieties of Portuguese. The fourth task concerns the automatic identification of spoken varieties of Portuguese. This is an important goal in itself and is also relevant as a pre-processing stage for switching among recognition systems developed for specific varieties. We plan to implement this identification module by exploring different types of cues, such as phonotactic, acoustic and prosodic ones. The four tasks of the proposal combine linguistic, signal processing and natural language processing knowledge, thus reflecting the importance of an interdisciplinary team.

Workplan

  • T1 - Pilot Corpus collection
  • T2 - Characterization of main differences
  • T3 - Porting spoken language modules
  • T4 - Automatic identification of spoken varieties
  • T5 - Management

Corpus

Publications

Demos