Resources
From HLT@INESC-ID
Revision as of 07:31, 24 June 2008
L²F has been particularly active in creating linguistic resources for European Portuguese, and the cooperation with CLUL has been of paramount importance in this activity. The resources are listed in reverse chronological order. The corresponding webpages are in Portuguese.
Corpora
Speech
- LECTRA - Classroom lectures
- IPSOM - Aligned spoken books
- ALERT - Broadcast news
- CORAL - Spoken dialogues (map task)
- BD-PÚBLICO - Large vocabulary, speaker-independent, continuous speech
- SPEECHDAT - Multi-purpose telephone speech database
- BDFALA - Speech analysis / synthesis
- EUROM.1 - Multilingual speech corpus for phonetic comparison
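Translation
- Golden collection of parallel multi-language word alignments (http://www.l2f.inesc-id.pt/resources/translation/golden_collection.zip)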
Lexica
Pronunciation lexica (in addition to those included in the documentation of the corpora above):
- ONOMASTICA (Proper names in 11 European languages, in cooperation with TLP - Telefones de Lisboa e Porto): ~100,000 names of people, streets, towns and companies
- PF (Português Fundamental): ~26,000 citation forms
The pronunciation lexica developed by L²F use the SAMPA phonetic alphabet. See the SAMPA table for European Portuguese and some comments about its design.
See Also
Newspapers
- List of Newspapers on the Internet, produced by Isabel Trancoso and maintained jointly with IMS Stuttgart.
Language Resource Centers
- Linguateca (Distributed language resource center for Portuguese)
- ELRA (European Language Resources Association)
- LDC (Linguistic Data Consortium)