From HLT@INESC-ID

Academia Militar

Integrated Tools and Ontologies

Information available at http://l2f.l2f.inesc-id.pt/ (intranet)

  • Integrated tools: ATA; JaVaLi!; DID; SAF; Intex (3rd party)
  • Ontologies: OntoWine (wine domain ontology); OntoChef (cooking domain ontology)

Lexicons

Information available at http://lrdb.l2f.inesc-id.pt/ (intranet)

  • PAROLE/SIMPLE: 20k root forms + inflection paradigms (morphology + syntax + semantics)
  • LUSOlex: 65k root forms (morphology + gramcat)
  • BRASILex: 68k root forms (morphology + gramcat)
  • Integration of LUSOlex + EPLexIC: ~8-10x EPLexIC phonetic forms
  • DicPro: 6.2k anthroponyms
  • SMorph: 26k root forms (morphology + inflection paradigm)
  • EPLexIC: 80k word forms (morphology + pronunciation); under construction
  • ONOMASTICA: 85k proper names (people, streets, cities, companies); 11 languages and cross-lingual information; pronunciation
  • Broadcast News: 64k entries (pronunciation)

Corpora

Information available at http://corpora.l2f.inesc-id.pt/ (intranet)

  • CETENFolha: 24 Mwords (newspaper corpus)
  • CETEMPúblico: 180 Mwords (newspaper corpus)
  • CHInf: 100 children's stories (books)
  • PAROLE: ~20 Mwords
  • Newspapers: 10 daily newspapers; ~600 Mwords

Coffee Break and Welcome Reception

  • 10:30 - Welcome reception by General Carlos Carvalho dos Reis.

Spoken Language Corpora

Information available at http://corpora.l2f.inesc-id.pt/ (intranet)
List of presented corpora:

  • EUROM.1
  • BDFALA: newspapers and TV debates
  • SPEECHDAT
  • CORAL
  • ALERT-ASR
  • ALERT-TD: TV broadcast news
  • IPSOM: six spoken books (read by professionals); discussion regarding publication and distribution rights
  • LECTRA: classroom lectures (pilot corpus); two semesters (under construction)
  • PAPOUS: corpus of children's stories performed by António Rito Silva (falsetto voice)

Simple Text Processing Tools

Information available at http://l2f.l2f.inesc-id.pt/ (intranet)

  • Morphological analysis
    • SMorph - POS tagger, tokenizer, generator
    • Palavroso - POS tagger, tokenizer, generator
    • Amorfo/XA - POS tagger, tokenizer, simultaneous multi-lingual analysis; error-correction (spelling correction)
  • Morphological generation
    • Monge - general form generator; language- and tag-independent; uses LRDB (under development; usable)
    • Gover - verb generator (~10k manually corrected verbs)
  • Morpho-syntax processing
    • PAsMo - rule-based rewriter
    • MARv - morpho-syntactic disambiguation
  • Syntactic analysis
    • SuSAna - Surface syntax analyzer
    • ParVO - syntactic analyzer (Earley algorithm; variable unification; O(n³))
  • Syntax-Semantics interface
    • Algas - arrowing construction
    • AsDeCopas -
  • Other tools
    • text2syl - syllabification
    • num2ext - text normalizer
    • YAH - (yet another) hyphenator (rule-based); MS Office compatible
    • Correcto - spell checker; MS Office compatible
    • leia - grapheme-2-phone converter (normalizer)
  • General purpose
    • FSTK lib - finite-state transducer toolkit
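
ParVO is listed above as using the Earley algorithm with variable unification. As a rough illustration of the chart-parsing idea only, here is a minimal Earley recognizer sketch. It is not ParVO's code: the function name, the toy grammar, and the example sentences are all hypothetical, unification is left out, and the sketch assumes a grammar with no empty right-hand sides.

```python
from typing import NamedTuple


class Item(NamedTuple):
    lhs: str      # left-hand side of the rule
    rhs: tuple    # right-hand side symbols
    dot: int      # how many rhs symbols have been recognized so far
    start: int    # input position where this rule application began


def earley_recognize(grammar, start_symbol, tokens):
    """Return True if tokens can be derived from start_symbol.

    grammar maps each nonterminal to a list of right-hand sides (tuples).
    Any symbol that is not a key of grammar is treated as a terminal.
    Assumption: no rule has an empty right-hand side.
    """
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start_symbol]:
        chart[0].add(Item(start_symbol, rhs, 0, 0))

    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            item = agenda.pop()
            if item.dot < len(item.rhs):
                symbol = item.rhs[item.dot]
                if symbol in grammar:
                    # Predict: expand the nonterminal expected next.
                    for rhs in grammar[symbol]:
                        new = Item(symbol, rhs, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
                elif i < n and tokens[i] == symbol:
                    # Scan: the expected terminal matches the next token.
                    chart[i + 1].add(item._replace(dot=item.dot + 1))
            else:
                # Complete: advance every item that was waiting for item.lhs.
                for parent in list(chart[item.start]):
                    if (parent.dot < len(parent.rhs)
                            and parent.rhs[parent.dot] == item.lhs):
                        new = parent._replace(dot=parent.dot + 1)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)

    return any(
        it.lhs == start_symbol and it.dot == len(it.rhs) and it.start == 0
        for it in chart[n]
    )


# Hypothetical toy grammar, for illustration only.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("N",)],
    "VP": [("V", "NP"), ("V",)],
    "Det": [("o",), ("a",)],
    "N": [("gato",), ("sopa",)],
    "V": [("come",), ("dorme",)],
}

accepted = earley_recognize(GRAMMAR, "S", "o gato come a sopa".split())
rejected = earley_recognize(GRAMMAR, "S", "gato o come".split())
```

Each chart position holds at most O(n) distinct items per rule, and completion may scan an earlier chart position for each of them, which is where the cubic worst-case bound for general context-free grammars comes from.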

Speech Synthesis Tools

Information available at http://l2f.l2f.inesc-id.pt/ (intranet)

  • EmoVoice - transformation of speech-based emotions
  • L2F_MuLA - multi-level speech aligner and annotator
  • L2F_PhoneAlign - phonetic aligner
  • dixi-tok2wrd - normalizer

Speech Recognition Tools

Information available at http://l2f.l2f.inesc-id.pt/ (intranet)

  • AUDIMUS - ASR with well-documented API; MS Office integration; multi-platform
    • AUDIMUS.linux - frozen; discontinued and no longer supported; usable
    • AUDIMUS.cvs - development version; usable

Brainstorming Presentation

  • 12:20 - Presentation by Luís Caldas de Oliveira
  • Ideas for new projects: 4 to 5 surviving ideas, to be detailed and discussed in the afternoon session

Lunch Break

  • 12:30 - Lunch at Paço da Rainha
  • 14:30 - Visit to the Academia Buildings: museum, library, council chamber, chapel

Brainstorming
