: Difference between revisions

Revision as of 15:47, 17 February 2006


Academia Militar

Integrated Tools and Ontologies

09:30 - Presentation by Joana Paulo.

Information available at http://l2f.l2f.inesc-id.pt/ (intranet)

Integrated tools: ATA; JaVaLi!; DID; SAF; Intex (3rd party)
Ontologies: OntoWine (wine domain ontology); OntoChef (cooking domain ontology)

Lexicons

09:50 - Presentation by Ricardo Daniel Ribeiro.

Information available at http://lrdb.l2f.inesc-id.pt/ (intranet)

PAROLE/SIMPLE: 20k root forms + inflection paradigms (morphology + syntax + semantics)
LUSOlex: 65k root forms (morphology + gramcat)
BRASILex: 68k root forms (morphology + gramcat)
Integration of LUSOlex + EPLexIC: ~8-10x EPLexIC phonetic forms
DicPro: 6.2k anthroponyms
SMorph: 26k root forms (morphology + inflection paradigm)
EPLexIC: 80k word forms (morphology + pronunciation); in construction
ONOMASTICA: 85k proper names (people, streets, cities, companies); 11 languages and cross-lingual information; pronunciation
Broadcast News: 64k entries (pronunciation)

Corpora

10:10 - Presentation by Paula Cristina Vaz.

Information available at http://corpora.l2f.inesc-id.pt/ (intranet)

CETENFolha: 24 Mwords (newspaper corpus)
CETEMPúblico: 180 Mwords (newspaper corpus)
CHInf: 100 children's stories (books)
PAROLE: ~20 Mwords
Newspapers: 10 daily newspapers; ~600 Mwords

Coffee Break and Welcome Reception

10:30 - Welcome reception by General Carlos Carvalho dos Reis.

Spoken Language Corpora

11:00 - Presentation by Rui Amaral.

Information available at http://corpora.l2f.inesc-id.pt/ (intranet)
List of presented corpora:

EUROM.1
BDFALA: newspapers and TV debates
SPEECHDAT
CORAL
ALERT-ASR
ALERT-TD: TV broadcast news
IPSOM: six spoken books (read by professionals); discussion regarding publication and distribution rights
LECTRA: classroom lectures (pilot corpus); two semesters (under construction)
PAPOUS: corpus of children's stories performed by António Rito Silva (falsetto voice)

Simple Text Processing Tools

11:20 - Presentation by Fernando Batista.

Information available at http://l2f.l2f.inesc-id.pt/ (intranet)

Morphological analysis
- SMorph - POS tagger, tokenizer, generator
- Palavroso - POS tagger, tokenizer, generator
- Amorfo/XA - POS tagger, tokenizer, simultaneous multi-lingual analysis; error-correction (spelling correction)

Morphological generation
- Monge - general form generator; language- and tag-independent; uses LRDB (under development; usable)
- Gover - verb generator (~10k manually corrected verbs)

Morpho-syntax processing
- PAsMo - rule-based rewriter
- MARv - morpho-syntactic disambiguation

Syntactic analysis
- SuSAna - Surface syntax analyzer
- ParVO - syntactic analyzer (Earley algorithm; variable unification; O(n³))

Syntax-Semantics interface
- Algas - arrowing construction
- AsDeCopas -

Other tools
- text2syl - syllabification
- num2ext - text normalizer
- YAH - (yet another) hyphenator (rule-based); MS Office compatible
- Correcto - spell checker; MS Office compatible
- leia - grapheme-2-phone converter (normalizer)

General purpose
- FSTK lib - finite-state transducer toolkit
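
The list above describes ParVO as a syntactic analyzer based on the Earley algorithm, with O(n³) cost. As a rough illustration of that technique only, here is a minimal Earley recognizer sketch in Python. It is not ParVO's code: the names `Item`, `earley_recognize` and `GRAMMAR`, the toy grammar, and the use of POS tags as input tokens are all assumptions made for this example. The variable unification mentioned for ParVO is left out, and empty rules are not supported.

```python
from collections import namedtuple

# An Earley item: rule "lhs -> rhs", a dot position inside rhs,
# and the input position where recognition of this rule started.
Item = namedtuple("Item", "lhs rhs dot start")

# Hypothetical toy grammar over POS tags (not taken from ParVO).
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("det", "noun"), ("noun",)],
    "VP": [("verb", "NP"), ("verb",)],
}

def earley_recognize(grammar, start_symbol, tokens):
    """Return True if `tokens` can be derived from `start_symbol`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples).
    Symbols that are not keys of `grammar` are treated as terminals.
    Empty right-hand sides are not supported in this sketch.
    """
    n = len(tokens)
    chart = [[] for _ in range(n + 1)]    # chart[i]: items ending at position i
    seen = [set() for _ in range(n + 1)]  # duplicate check for each chart entry

    def add(i, item):
        if item not in seen[i]:
            seen[i].add(item)
            chart[i].append(item)

    for rhs in grammar[start_symbol]:
        add(0, Item(start_symbol, rhs, 0, 0))

    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):          # chart[i] may grow while it is scanned
            item = chart[i][j]
            j += 1
            if item.dot < len(item.rhs):
                sym = item.rhs[item.dot]
                if sym in grammar:
                    # Predictor: expand the nonterminal after the dot.
                    for rhs in grammar[sym]:
                        add(i, Item(sym, rhs, 0, i))
                elif i < n and tokens[i] == sym:
                    # Scanner: consume a matching terminal.
                    add(i + 1, item._replace(dot=item.dot + 1))
            else:
                # Completer: advance every item that was waiting for item.lhs.
                for parent in list(chart[item.start]):
                    if (parent.dot < len(parent.rhs)
                            and parent.rhs[parent.dot] == item.lhs):
                        add(i, parent._replace(dot=parent.dot + 1))

    return any(it.lhs == start_symbol and it.start == 0
               and it.dot == len(it.rhs) for it in chart[n])
```

For example, `earley_recognize(GRAMMAR, "S", ["det", "noun", "verb"])` accepts the tag sequence, while `["det", "verb"]` is rejected. The cubic bound comes from the completer: there are n+1 chart entries, each holds O(n) items for a fixed grammar, and completing one item may scan another O(n) items.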

Speech Synthesis Tools

11:40 - Presentation by Sérgio Paulo.

Information available at http://l2f.l2f.inesc-id.pt/ (intranet)

EmoVoice - transformation of speech-based emotions
L2F_MuLA - multi-level speech aligner and annotator
L2F_PhoneAlign - phonetic aligner
dixi-tok2wrd - normalizer

Speech Recognition Tools

12:00 - Presentation by Hugo Meinedo.

Information available at http://l2f.l2f.inesc-id.pt/ (intranet)

AUDIMUS - ASR with well-documented API; AUDIMUS.linux (frozen; discontinued; no longer supported; usable); AUDIMUS.cvs (usable; development version); MS Office integration; multi-platform

Brainstorming Presentation

12:20 - Presentation by Luís Caldas de Oliveira

Ideas for new projects: 4~5 surviving ideas to be detailed/discussed in the afternoon session

Lunch Break

12:30 - Lunch at Paço da Rainha
14:30 - Visit to the Academia Buildings: museum, library, council chamber, chapel

Brainstorming

15:00 - Moderated by Luís Caldas de Oliveira



From HLT@INESC-ID
