Resources: Difference between revisions

From HLT@INESC-ID

(40 intermediate revisions by 5 users not shown)
L²F has been particularly active in the creation of linguistic resources for European Portuguese. The cooperation with CLUL has been of paramount importance in this activity. The resources are listed in inverse chronological order. The corresponding webpages are in Portuguese.  
{{TOCright}}
* [IPSOM Pilot Corpus|IPSOM] (aligned spoken books)
L²F has been particularly active in the creation of linguistic resources for European Portuguese. The cooperation with CLUL has been of paramount importance in this activity. The resources listed are in inverse chronological order. The corresponding webpages are in Portuguese.  
* [http://www.l2f.inesc-id.pt/resources/alert/alert.html ALERT]
* [http://www.l2f.inesc-id.pt/projects/coral/coral1_en.html CORAL]
* [http://www.l2f.inesc-id.pt/projects/bdpub/bdpublico_en.html BD-PÚBLICO]
* [http://www.l2f.inesc-id.pt/resources/spdat/speechdat_en.html SPEECHDAT]
* [http://www.l2f.inesc-id.pt/projects/bdfala/bdfala_en.html BDFALA]
* [[EUROM.1 Corpus|EUROM.1]]


== Corpora ==
=== Speech ===
* POSTPORT - European, Brazilian and African varieties of Portuguese
* [[LECTRA Corpus|LECTRA]] - Classroom lectures
* [[IPSOM Pilot Corpus|IPSOM]] - Aligned spoken books
* [[ALERT Corpus|ALERT]] - Broadcast news
* [[CORAL Corpus|CORAL]] - Spoken dialogues (map task)
* [[BD-PÚBLICO Corpus|BD-PÚBLICO]] - Large vocabulary, speaker-independent, continuous speech
* [[SPEECHDAT Corpus|SPEECHDAT]] - Multi-purpose telephone speech database
* [[BDFALA Corpus|BDFALA]] - Speech analysis / synthesis
* [[EUROM.1 Corpus|EUROM.1]] - Multilingual speech corpus for phonetic comparison
=== Bilingual Corpus ===
* [[Word_Alignments|Golden collection of parallel multi-language word alignments]] - Manually annotated word alignments between six European languages taken from the Europarl common test set <br>(more information on the [[Speech-to-speech Translation]] page)
== Lexica ==
Pronunciation lexica (besides the ones included in the above corpora documentation):  
* '''ONOMASTICA''' (Proper names of 11 European languages, in cooperation with TLP - Telefones de Lisboa e Porto): ~100,000 names of people, streets, towns and companies
* '''PF''' (Português Fundamental): ~26,000 citation forms


The pronunciation lexica developed in L²F use the SAMPA phonetic alphabet. See the [[SAMPA Table for European Portuguese|SAMPA table for European Portuguese]] and some comments about its design.  
The pronunciation lexica developed by L²F use the SAMPA phonetic alphabet. See the [[SAMPA Table for European Portuguese|SAMPA table for European Portuguese]] and some comments about its design.


See also:
== See Also ==


* [[Resource Links]]
=== Newspapers ===
* [http://www.ims.uni-stuttgart.de/info/Newspapers.html List of Newspapers on the Internet] produced by [[Isabel Trancoso]] and maintained jointly with IMS Stuttgart.
=== Language Resource Centers ===
* [http://www.linguateca.pt Linguateca] (Distributed language resource center for Portuguese)
* [http://www.icp.grenet.fr/ELRA/home.html ELRA] (European Language Resources Association)
* [http://www.elra.info ELRA] (European Language Resources Association)
* [http://morph.ldc.upenn.edu/ LDC] (Linguistic Data Consortium)
=== Dictionaries ===
* [http://crnvmc.cern.ch/FIND/DICTIONARY? English/Technical Dictionary]
* [gopher://uts.mcc.ac.uk/77/gopherservices/enquire.english American English Dictionary]
* [http://nova.sti.nasa.gov/nasa-thesaurus.html NASA Thesaurus]


If you have time, surf the [[Links|links]] ...
 
[[category:Resources]]

Revision as of 09:57, 12 October 2018

L²F has been particularly active in the creation of linguistic resources for European Portuguese. The cooperation with CLUL has been of paramount importance in this activity. The resources are listed in reverse chronological order. The corresponding webpages are in Portuguese.

Corpora

Speech

  • POSTPORT - European, Brazilian and African varieties of Portuguese
  • LECTRA - Classroom lectures
  • IPSOM - Aligned spoken books
  • ALERT - Broadcast news
  • CORAL - Spoken dialogues (map task)
  • BD-PÚBLICO - Large vocabulary, speaker-independent, continuous speech
  • SPEECHDAT - Multi-purpose telephone speech database
  • BDFALA - Speech analysis / synthesis
  • EUROM.1 - Multilingual speech corpus for phonetic comparison

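Bilingual Corpus

  • Golden collection of parallel multi-language word alignments - Manually annotated word alignments between six European languages taken from the Europarl common test set (more information on the Speech-to-speech Translation page)

Lexica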

Pronunciation lexica (besides the ones included in the above corpora documentation):

  • ONOMASTICA (Proper names of 11 European languages, in cooperation with TLP - Telefones de Lisboa e Porto): ~100,000 names of people, streets, towns and companies
  • PF (Português Fundamental): ~26,000 citation forms

The pronunciation lexica developed by L²F use the SAMPA phonetic alphabet. See the SAMPA table for European Portuguese and some comments about its design.

See Also

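Newspapers

  • List of Newspapers on the Internet produced by Isabel Trancoso and maintained jointly with IMS Stuttgart.

Language Resource Centers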

  • Linguateca (Distributed language resource center for Portuguese)
  • ELRA (European Language Resources Association)
  • LDC (Linguistic Data Consortium)

Dictionaries