Resources

From HLT@INESC-ID

We have been active in the creation of linguistic resources for European Portuguese and for other languages. Our cooperation with CLUL has been of paramount importance for some of the resources listed here.

Speech Corpora

  • ALERT - Broadcast news
  • BDFALA - Speech analysis / synthesis
  • BD-PÚBLICO - Large vocabulary, speaker-independent, continuous speech
  • CORAL - Spoken dialogues (map task)
  • EUROM.1 - Multi-Lingual speech corpus for phonetic comparison
  • IPSOM - Aligned spoken books
  • LECTRA - Classroom lectures
  • POSTPORT - European, Brazilian and African varieties of Portuguese
  • SPEECHDAT - Multi-purpose telephone speech database
  • VoxCeleb-PT - Annotated corpus of European Portuguese celebrities

Text Corpora

Pronunciation Lexica

The following pronunciation lexica use the SAMPA phonetic alphabet. See the SAMPA table for European Portuguese and some comments about its design.

  • ONOMASTICA (Proper names in 11 European languages, in cooperation with TLP - Telefones de Lisboa e Porto): ~100,000 names of people, streets, towns and companies
  • PF (Português Fundamental): ~26,000 citation forms

See Also