(→Integrated Tools and Ontologies) |
|||
Line 11: | Line 11: | ||
Information available at http://l2f.l2f.inesc-id.pt/ (intranet) | Information available at http://l2f.l2f.inesc-id.pt/ (intranet) | ||
− | * Integrated tools: ATA; JaVaLi!; DID; SAF; Intex (3<sup>rd</sup> party) | + | * Integrated tools: '''ATA'''; '''JaVaLi!'''; '''DID'''; '''SAF'''; '''Intex''' (3<sup>rd</sup> party) |
− | * Ontologies: OntoWine (wine domain ontology); OntoChef (cooking domain ontology) | + | * Ontologies: '''OntoWine''' (wine domain ontology); '''OntoChef''' (cooking domain ontology) |
== Lexicons == | == Lexicons == | ||
Line 20: | Line 20: | ||
Information available at http://lrdb.l2f.inesc-id.pt/ (intranet) | Information available at http://lrdb.l2f.inesc-id.pt/ (intranet) | ||
− | * PAROLE/SIMPLE: 20k root forms + inflection paradigms (morphology + syntax + semantics) | + | * '''PAROLE'''/'''SIMPLE''': 20k root forms + inflection paradigms (morphology + syntax + semantics) |
− | * LUSOlex: 65k root forms (morphology + gramcat) | + | * '''LUSOlex''': 65k root forms (morphology + gramcat) |
− | * BRASILex: 68k root forms (morphology + gramcat) | + | * '''BRASILex''': 68k root forms (morphology + gramcat) |
* Integration of LUSOlex + EPLexIC: ~8-10x EPLexIC phonetic forms | * Integration of LUSOlex + EPLexIC: ~8-10x EPLexIC phonetic forms | ||
− | * DicPro: 6.2k anthroponyms | + | * '''DicPro''': 6.2k anthroponyms |
− | * SMorph: 26k root forms (morphology + inflection paradigm) | + | * '''SMorph''': 26k root forms (morphology + inflection paradigm) |
− | * EPLexIC: 80k word forms (morphology + pronunciation); under construction | + | * '''EPLexIC''': 80k word forms (morphology + pronunciation); under construction |
− | * ONOMASTICA: 85k proper names (people, streets, cities, companies); 11 languages and cross-lingual information; pronunciation | + | * '''ONOMASTICA''': 85k proper names (people, streets, cities, companies); 11 languages and cross-lingual information; pronunciation |
− | * Broadcast News: 64k entries (pronunciation) | + | * '''Broadcast News''': 64k entries (pronunciation) |
== Corpora == | == Corpora == | ||
Line 36: | Line 36: | ||
Information available at http://corpora.l2f.inesc-id.pt/ (intranet) | Information available at http://corpora.l2f.inesc-id.pt/ (intranet) | ||
− | * CETENFolha: 24 Mwords (newspaper corpus) | + | * '''CETENFolha''': 24 Mwords (newspaper corpus) |
− | * CETEMPúblico: 180 Mwords (newspaper corpus) | + | * '''CETEMPúblico''': 180 Mwords (newspaper corpus) |
− | * CHInf: 100 children's stories (books) | + | * '''CHInf''': 100 children's stories (books) |
+ | * '''PAROLE''': ~20 Mwords | ||
* Newspapers: 10 daily newspapers; ~600 Mwords | * Newspapers: 10 daily newspapers; ~600 Mwords | ||
− | |||
== Coffee Break and Welcome Reception == | == Coffee Break and Welcome Reception == | ||
Line 53: | Line 53: | ||
List of presented corpora: | List of presented corpora: | ||
− | * EUROM.1 | + | * '''EUROM.1''' |
− | * BDFALA: newspapers and TV debates | + | * '''BDFALA''': newspapers and TV debates |
− | * SPEECHDAT | + | * '''SPEECHDAT''' |
− | * CORAL | + | * '''CORAL''' |
− | * ALERT-ASR | + | * '''ALERT-ASR''' |
− | * ALERT-TD: TV broadcast news | + | * '''ALERT-TD''': TV broadcast news |
− | * IPSOM: six spoken books (read by professionals); discussion regarding publication and distribution rights | + | * '''IPSOM''': six spoken books (read by professionals); discussion regarding publication and distribution rights |
− | * LECTRA: classroom lectures (pilot corpus); two semesters (under construction) | + | * '''LECTRA''': classroom lectures (pilot corpus); two semesters (under construction) |
− | * PAPOUS: corpus of children's stories performed by António Rito Silva (falsetto voice) | + | * '''PAPOUS''': corpus of children's stories performed by António Rito Silva (falsetto voice) |
== Simple Text Processing Tools == | == Simple Text Processing Tools == | ||
Line 70: | Line 70: | ||
* Morphological analysis | * Morphological analysis | ||
− | ** SMorph - POS tagger, tokenizer, generator | + | ** '''SMorph''' - POS tagger, tokenizer, generator |
− | ** Palavroso - POS tagger, tokenizer, generator | + | ** '''Palavroso''' - POS tagger, tokenizer, generator |
− | ** Amorfo/XA - POS tagger, tokenizer, simultaneous multi-lingual analysis; error-correction (spelling correction) | + | ** '''Amorfo'''/'''XA''' - POS tagger, tokenizer, simultaneous multi-lingual analysis; error-correction (spelling correction) |
* Morphological generation | * Morphological generation | ||
− | ** Monge - general form generator; language- and tag-independent; uses LRDB (under development; usable) | + | ** '''Monge''' - general form generator; language- and tag-independent; uses LRDB (under development; usable) |
− | ** Gover - verb generator (~10k manually corrected verbs) | + | ** '''Gover''' - verb generator (~10k manually corrected verbs) |
* Morpho-syntax processing | * Morpho-syntax processing | ||
− | ** PAsMo - rule-based rewriter | + | ** '''PAsMo''' - rule-based rewriter |
− | ** MARv - morpho-syntactic disambiguation | + | ** '''MARv''' - morpho-syntactic disambiguation |
* Syntactic analysis | * Syntactic analysis | ||
− | ** SuSAna - | + | ** '''SuSAna''' - Surface syntax analyzer |
− | ** ParVO - syntactic analyzer (Earley algorithm; variable unification; O(n³)) | + | ** '''ParVO''' - syntactic analyzer (Earley algorithm; variable unification; O(n³)) |
* Syntax-Semantics interface | * Syntax-Semantics interface | ||
− | ** Algas - arrowing construction | + | ** '''Algas''' - arrowing construction |
− | ** AsDeCopas - | + | ** '''AsDeCopas''' - |
* Other tools | * Other tools | ||
− | ** text2syl - syllabification | + | ** '''text2syl''' - syllabification |
− | ** num2ext - text normalizer | + | ** '''num2ext''' - text normalizer |
− | ** YAH - (yet another) hyphenator (rule-based); MS Office compatible | + | ** '''YAH''' - (yet another) hyphenator (rule-based); MS Office compatible |
− | ** Correcto - spell checker; MS Office compatible | + | ** '''Correcto''' - spell checker; MS Office compatible |
− | ** leia - grapheme-2-phone converter (normalizer) | + | ** '''leia''' - grapheme-2-phone converter (normalizer) |
* General purpose | * General purpose | ||
− | ** FSTK lib - finite-state transducer toolkit | + | ** '''FSTK lib''' - finite-state transducer toolkit |
== Speech Synthesis Tools == | == Speech Synthesis Tools == | ||
Line 106: | Line 106: | ||
Information available at http://l2f.l2f.inesc-id.pt/ (intranet) | Information available at http://l2f.l2f.inesc-id.pt/ (intranet) | ||
− | * EmoVoice - transformation of speech-based emotions | + | * '''EmoVoice''' - transformation of speech-based emotions |
− | * L2F_MuLA - multi-level speech aligner and annotator | + | * '''L2F_MuLA''' - multi-level speech aligner and annotator |
− | * L2F_PhoneAlign - phonetic aligner | + | * '''L2F_PhoneAlign''' - phonetic aligner |
− | * dixi-tok2wrd - normalizer | + | * '''dixi-tok2wrd''' - normalizer |
== Speech Recognition Tools == | == Speech Recognition Tools == | ||
Line 117: | Line 117: | ||
Information available at http://l2f.l2f.inesc-id.pt/ (intranet) | Information available at http://l2f.l2f.inesc-id.pt/ (intranet) | ||
− | * AUDIMUS - ASR with well-documented API; AUDIMUS.linux (frozen; discontinued; no longer supported; usable); AUDIMUS.cvs (usable; development version); MS Office integration; multi-platform | + | * '''AUDIMUS''' - ASR with well-documented API; AUDIMUS.linux (frozen; discontinued; no longer supported; usable); AUDIMUS.cvs (usable; development version); MS Office integration; multi-platform |
== Brainstorming Presentation == | == Brainstorming Presentation == |
[Image: Academia Militar]