From HLT@INESC-ID

Revision as of 10:34, 17 February 2006

Integrated Tools and Ontologies

Presentation by Joana Paulo.

Information available at http://l2f.l2f.inesc-id.pt/ (intranet)

  • Integrated tools:
    • ATA
    • JaVaLi!
    • DID
    • SAF
  • 3rd-party tools:
    • Intex
  • Ontologies:
    • OntoWine (wine domain ontology)
    • OntoChef (cooking domain ontology)

Lexicons

Presentation by Ricardo Daniel Ribeiro.

Information available at http://lrdb.l2f.inesc-id.pt/ (intranet)

  • PAROLE/SIMPLE: 20k root forms + inflection paradigms (morphology + syntax + semantics)
  • LUSOlex: 65k root forms (morphology + grammatical category)
  • BRASILex: 68k root forms (morphology + grammatical category)
  • Integration of LUSOlex + EPLexIC: ~8-10x EPLexIC phonetic forms
  • DicPro: 6.2k anthroponyms
  • SMorph: 26k root forms (morphology + inflection paradigm)
  • EPLexIC: 80k word forms (morphology + pronunciation); under construction
  • ONOMASTICA: 85k proper names (people, streets, cities, companies); 11 languages and cross-lingual information; pronunciation
  • Broadcast News: 64k entries (pronunciation)

Corpora

Presentation by Paula Vaz.

Information available at http://corpora.l2f.inesc-id.pt/ (intranet)

  • CETENFolha: 24 Mwords (newspaper corpus)
  • CETEMPúblico: 180 Mwords (newspaper corpus)
  • CHInf: 100 children's stories (books)
  • Newspapers: 10 daily newspapers; ~600 Mwords
  • PAROLE: ~20 Mwords