== Corpora ==

=== Speech ===

* POSTPORT - European, Brazilian and African varieties of Portuguese
* [[LECTRA Corpus|LECTRA]] - Classroom lectures
* [[IPSOM Pilot Corpus|IPSOM]] - Aligned spoken books
* [[ALERT Corpus|ALERT]] - Broadcast news
* [[CORAL Corpus|CORAL]] - Spoken dialogues (map task)
* [[BD-PÚBLICO Corpus|BD-PÚBLICO]] - Large-vocabulary, speaker-independent, continuous speech
* [[SPEECHDAT Corpus|SPEECHDAT]] - Multi-purpose telephone speech database
* [[BDFALA Corpus|BDFALA]] - Speech analysis / synthesis
* [[EUROM.1 Corpus|EUROM.1]] - Multilingual speech corpus for phonetic comparison

=== Bilingual Corpus ===

* [[Word_Alignments|Golden collection of parallel multi-language word alignments]] - Manually annotated word alignments between six European languages, taken from the Europarl common test set <br>(more information on the [[Speech-to-speech Translation]] information page)
== Lexica ==
Line 28: | Line 36: | ||
=== Language Resource Centers ===
* [http://www.linguateca.pt Linguateca] (Distributed language resource center for Portuguese)
* [http://www.elra.info ELRA] (European Language Resources Association)
* [http://morph.ldc.upenn.edu/ LDC] (Linguistic Data Consortium)
L²F has been particularly active in creating linguistic resources for European Portuguese, and its cooperation with CLUL has been of paramount importance in this work. The resources are listed in reverse chronological order. The corresponding webpages are in Portuguese.
Pronunciation lexica (in addition to those included in the corpus documentation above):
The pronunciation lexica developed by L²F use the SAMPA phonetic alphabet. See the SAMPA table for European Portuguese and some comments about its design.