From HLT@INESC-ID
Revision as of 10:34, 17 February 2006
Integrated Tools and Ontologies
Presentation by Joana Paulo.
Information available at http://l2f.l2f.inesc-id.pt/ (intranet)
- Integrated tools:
  - ATA
  - JaVaLi!
  - DID
  - SAF
  - 3rd Party:
    - Intex
- Ontologies:
  - OntoWine (wine domain ontology)
  - OntoChef (cooking domain ontology)
Lexicons
Presentation by Ricardo Daniel Ribeiro.
Information available at http://lrdb.l2f.inesc-id.pt/ (intranet)
- PAROLE/SIMPLE: 20k root forms + inflection paradigms (morphology + syntax + semantics)
- LUSOlex: 65k root forms (morphology + grammatical category)
- BRASILex: 68k root forms (morphology + grammatical category)
- Integration of LUSOlex + EPLexIC: approximately 8-10 times the number of EPLexIC phonetic forms
- DicPro: 6.2k anthroponyms
- SMorph: 26k root forms (morphology + inflection paradigm)
- EPLexIC: 80k word forms (morphology + pronunciation); under construction
- ONOMASTICA: 85k proper names (people, streets, cities, companies); 11 languages and cross-lingual information; pronunciation
- Broadcast News: 64k entries (pronunciation)
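The "root forms + inflection paradigms" organisation listed for PAROLE/SIMPLE and SMorph can be illustrated with a minimal sketch. It assumes a simple, hypothetical encoding in which each paradigm is a list of rules that strip an ending from the root form and append another; the rule set, the tag names, and the `inflect` function are illustrative only and do not reflect the actual file formats of these lexicons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    strip: str  # ending removed from the root form ("" = remove nothing)
    add: str    # ending appended in its place
    tags: str   # hypothetical morphological tags for the resulting form


# Hypothetical paradigm for regular Portuguese nouns/adjectives ending in -o
PARADIGM_O = [
    Rule("", "", "masc.sg"),
    Rule("", "s", "masc.pl"),
    Rule("o", "a", "fem.sg"),
    Rule("o", "as", "fem.pl"),
]


def inflect(root: str, paradigm: list[Rule]) -> list[tuple[str, str]]:
    """Expand a root form into (word form, tags) pairs using its paradigm."""
    forms = []
    for rule in paradigm:
        if rule.strip and not root.endswith(rule.strip):
            raise ValueError(f"{root!r} does not end in {rule.strip!r}")
        # Remove the ending named by the rule, then add the new one
        stem = root[: len(root) - len(rule.strip)]
        forms.append((stem + rule.add, rule.tags))
    return forms


# Hypothetical lexicon: each root form points at one inflection paradigm
LEXICON = {"gato": PARADIGM_O, "novo": PARADIGM_O}

if __name__ == "__main__":
    for root, paradigm in LEXICON.items():
        for form, tags in inflect(root, paradigm):
            print(form, tags)
```

Storing one paradigm per class of words, rather than every inflected form, is what lets a lexicon of roughly 20k root forms cover a much larger set of word forms.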
Corpora
Presentation by Paula Vaz.
Information available at http://corpora.l2f.inesc-id.pt/ (intranet)
- CETENFolha: 24 Mwords (newspaper corpus)
- CETEMPúblico: 180 Mwords (newspaper corpus)
- CHInf: 100 children's stories (books)
- Newspapers: 10 daily newspapers; ~600 Mwords
- PAROLE: ~20 Mwords