From HLT@INESC-ID
Integrated Tools and Ontologies
Presentation by Joana Paulo.
Information available at http://l2f.l2f.inesc-id.pt/ (intranet)
- Integrated tools:
  - ATA
  - JaVaLi!
  - DID
  - SAF
  - 3rd Party
    - Intex
- Ontologies:
  - OntoWine (wine domain ontology)
  - OntoChef (cooking domain ontology)
Lexicons
Presentation by Ricardo Daniel Ribeiro.
Information available at http://lrdb.l2f.inesc-id.pt/ (intranet)
- PAROLE/SIMPLE: 20k root forms + inflection paradigms (morphology + syntax + semantics)
- LUSOlex: 65k root forms (morphology + grammatical category)
- BRASILex: 68k root forms (morphology + grammatical category)
- LUSOlex + EPLexIC integration: ~8-10x EPLexIC phonetic forms
- DicPro: 6.2k anthroponyms
- SMorph: 26k root forms (morphology + inflection paradigm)
- EPLexIC: 80k word forms (morphology + pronunciation); under construction
- ONOMASTICA: 85k proper names (people, streets, cities, companies); 11 languages and cross-lingual information; pronunciation
- Broadcast News: 64k entries (pronunciation)
Corpora
Presentation by Paula Vaz.
Information available at http://corpora.l2f.inesc-id.pt/ (intranet)
- CETENFolha: 24 Mwords (newspaper corpus)
- CETEMPúblico: 180 Mwords (newspaper corpus)
- CHInf: 100 children's stories (books)
- Newspapers: 10 daily newspapers; ~600 Mwords
- PAROLE: ~20 Mwords