Resources: Difference between revisions

From HLT@INESC-ID



* '''[http://www.l2f.inesc-id.pt/resources/translation/golden_collection.zip Golden collection of parallel multi-language word alignments]'''
**Guidelines followed to produce the manual word alignments over six different language pairs (all combinations between Portuguese, English, French and Spanish). [http://www.l2f.inesc-id.pt/resources/translation/graca_et_al-TR38-2008-guidelines.pdf PDF]
**Guidelines followed to produce the manual word alignments over six different language pairs (all combinations between Portuguese, English, French and Spanish) ([http://www.inesc-id.pt/pt/indicadores/Ficheiros/4734.pdf PDF]).
**João de Almeida Varelas Graça, Joana Paulo Pardal, Luísa Coheur, Diamantino António Caseiro, [http://www.inesc-id.pt/pt/indicadores/Ficheiros/4735.pdf Building a golden collection of parallel Multi-Language Word Alignment], In The 6th International Conference on Language Resources and Evaluation, LREC 2008, May 2008



Revision as of 18:00, 24 June 2008

L²F has been particularly active in creating linguistic resources for European Portuguese, and cooperation with CLUL has been of paramount importance in this work. The resources are listed in reverse chronological order. The corresponding webpages are in Portuguese.

Corpora

Speech

  • LECTRA - Classroom lectures
  • IPSOM - Aligned spoken books
  • ALERT - Broadcast news
  • CORAL - Spoken dialogues (map task)
  • BD-PÚBLICO - Large vocabulary, speaker-independent, continuous speech
  • SPEECHDAT - Multi-purpose telephone speech database
  • BDFALA - Speech analysis / synthesis
  • EUROM.1 - Multi-Lingual speech corpus for phonetic comparison

Translation

Lexica

Pronunciation lexica (in addition to those included in the documentation of the corpora above):

  • ONOMASTICA (Proper names of 11 European languages, in cooperation with TLP - Telefones de Lisboa e Porto): ~100,000 names of people, streets, towns and companies
  • PF (Português Fundamental): ~26,000 citation forms

The pronunciation lexica developed by L²F use the SAMPA phonetic alphabet. See the SAMPA table for European Portuguese and some comments about its design.

See Also

Newspapers

Language Resource Centers

  • Linguateca (Distributed language resource center for Portuguese)
  • ELRA (European Language Resources Association)
  • LDC (Linguistic Data Consortium)

Dictionaries