Difference between revisions

From HLT@INESC-ID

Line 1: Line 1:
== Integrated Tools and Ontologies ==
== Integrated Tools and Ontologies ==


Presentation by [[Joana Paulo]].
* 09:30 - Presentation by [[Joana Paulo]].


Information available at http://l2f.l2f.inesc-id.pt/ (intranet)
Information available at http://l2f.l2f.inesc-id.pt/ (intranet)
Line 20: Line 20:
== Lexicons ==
== Lexicons ==


Presentation by [[Ricardo Daniel Ribeiro]].
* 09:50 - Presentation by [[Ricardo Daniel Ribeiro]].


Information available at http://lrdb.l2f.inesc-id.pt/ (intranet)
Information available at http://lrdb.l2f.inesc-id.pt/ (intranet)
Line 36: Line 36:
== Corpora ==
== Corpora ==


Presentation by [[Paula Vaz]].
* 10:10 - Presentation by [[Paula Cristina Vaz]].


Information available at http://corpora.l2f.inesc-id.pt/ (intranet)
Information available at http://corpora.l2f.inesc-id.pt/ (intranet)
Line 48: Line 48:
== Coffee Break and Welcome Reception ==
== Coffee Break and Welcome Reception ==


* Welcome reception by General Carlos Carvalho dos Reis.
* 10:30 - Welcome reception by General Carlos Carvalho dos Reis.


== Spoken Language Corpora ==
== Spoken Language Corpora ==


Presentation by [[Rui Amaral]].
* 11:00 - Presentation by [[Rui Amaral]].


Information available at http://corpora.l2f.inesc-id.pt/ (intranet)<br/>
Information available at http://corpora.l2f.inesc-id.pt/ (intranet)<br/>
Line 58: Line 58:


* EUROM.1
* EUROM.1
* BDFALA: newspapers and TV debates
* SPEECHDAT
* CORAL
* ALERT-ASR
* ALERT-TD: TV broadcast news
* IPSOM: six spoken books (read by professionals); discussion regarding publication and distribution rights
* LECTRA: classroom lectures (pilot corpus); two semesters (under construction)
* PAPOUS: corpus of children's stories performed by António Rito Silva (falsetto voice)

Revision as of 11:27, 17 February 2006

Integrated Tools and Ontologies

Information available at http://l2f.l2f.inesc-id.pt/ (intranet)

  • Integrated tools:
    • ATA
    • JaVaLi!
    • DID
    • SAF
  • 3rd party:
    • Intex
  • Ontologies:
    • OntoWine (wine domain ontology)
    • OntoChef (cooking domain ontology)

Lexicons

Information available at http://lrdb.l2f.inesc-id.pt/ (intranet)

  • PAROLE/SIMPLE: 20k root forms + inflection paradigms (morphology + syntax + semantics)
  • LUSOlex: 65k root forms (morphology + gramcat)
  • BRASILex: 68k root forms (morphology + gramcat)
  • LUSOlex + EPLexIC integration: ~8-10x EPLexIC phonetic forms
  • DicPro: 6.2k anthroponyms
  • SMorph: 26k root forms (morphology + inflection paradigm)
  • EPLexIC: 80k word forms (morphology + pronunciation); under construction
  • ONOMASTICA: 85k proper names (people, streets, cities, companies); 11 languages and cross-lingual information; pronunciation
  • Broadcast News: 64k entries (pronunciation)

Corpora

Information available at http://corpora.l2f.inesc-id.pt/ (intranet)

  • CETENFolha: 24 Mwords (newspaper corpus)
  • CETEMPúblico: 180 Mwords (newspaper corpus)
  • CHInf: 100 children's stories (books)
  • Newspapers: 10 daily newspapers; ~600 Mwords
  • PAROLE: ~20 Mwords

Coffee Break and Welcome Reception

  • 10:30 - Welcome reception by General Carlos Carvalho dos Reis.

Spoken Language Corpora

Information available at http://corpora.l2f.inesc-id.pt/ (intranet)
List of presented corpora:

  • EUROM.1
  • BDFALA: newspapers and TV debates
  • SPEECHDAT
  • CORAL
  • ALERT-ASR
  • ALERT-TD: TV broadcast news
  • IPSOM: six spoken books (read by professionals); discussion regarding publication and distribution rights
  • LECTRA: classroom lectures (pilot corpus); two semesters (under construction)
  • PAPOUS: corpus of children's stories performed by António Rito Silva (falsetto voice)