Resources

From HLT@INESC-ID

Revision as of 10:58, 26 December 2023

L²F has been particularly active in creating linguistic resources for European Portuguese, and its cooperation with CLUL has been of paramount importance in this work. The resources are listed in reverse chronological order. The corresponding webpages are in Portuguese.

Speech Corpora

  • POSTPORT - European, Brazilian and African varieties of Portuguese
  • LECTRA - Classroom lectures
  • IPSOM - Aligned spoken books
  • ALERT - Broadcast news
  • CORAL - Spoken dialogues (map task)
  • BD-PÚBLICO - Large-vocabulary, speaker-independent, continuous speech
  • SPEECHDAT - Multi-purpose telephone speech database
  • BDFALA - Speech analysis / synthesis
  • EUROM.1 - Multilingual speech corpus for phonetic comparison
  • VoxCeleb-PT - Annotated corpus of European Portuguese celebrities

Text Corpora

Lexica

The following pronunciation lexica are available, in addition to those included in the documentation of the corpora above. The pronunciation lexica developed by L²F used the SAMPA phonetic alphabet; see the SAMPA table for European Portuguese and some comments about its design.

  • ONOMASTICA (Proper names in 11 European languages, in cooperation with TLP - Telefones de Lisboa e Porto): ~100,000 names of people, streets, towns and companies
  • PF (Português Fundamental): ~26,000 citation forms

See Also

Newspapers

Language Resource Centers

  • Linguateca (Distributed language resource center for Portuguese)
  • ELRA (European Language Resources Association)
  • LDC (Linguistic Data Consortium)

Dictionaries