: Difference between revisions
From HLT@INESC-ID
Revision as of 10:34, 17 February 2006
Integrated Tools and Ontologies
Presentation by Joana Paulo.
Information available at http://l2f.l2f.inesc-id.pt/ (intranet)
- Integrated tools:
  - ATA
  - JaVaLi!
  - DID
  - SAF
- 3rd Party
  - Intex
- Ontologies:
  - OntoWine (wine domain ontology)
  - OntoChef (cooking domain ontology)
Lexicons
Presentation by Ricardo Daniel Ribeiro.
Information available at http://lrdb.l2f.inesc-id.pt/ (intranet)
- PAROLE/SIMPLE: 20k root forms + inflection paradigms (morphology + syntax + semantics)
- LUSOlex: 65k root forms (morphology + grammatical category)
- BRASILex: 68k root forms (morphology + grammatical category)
- Integration of LUSOlex + EPLexIC: ~8-10x EPLexIC phonetic forms
- DicPro: 6.2k anthroponyms
- SMorph: 26k root forms (morphology + inflection paradigm)
- EPLexIC: 80k word forms (morphology + pronunciation); under construction
- ONOMASTICA: 85k proper names (people, streets, cities, companies); 11 languages and cross-lingual information; pronunciation
- Broadcast News: 64k entries (pronunciation)
Corpora
Presentation by Paula Vaz.
Information available at http://corpora.l2f.inesc-id.pt/ (intranet)
- CETENFolha: 24 Mwords (newspaper corpus)
- CETEMPúblico: 180 Mwords (newspaper corpus)
- CHInf: 100 children's stories (books)
- Newspapers: 10 daily newspapers; ~600 Mwords
- PAROLE: ~20 Mwords