From HLT@INESC-ID
Latest revision as of 19:33, 7 July 2006
[Image: Academia Militar]
L²F Day 2006 took place at the Military Academy in Lisbon.
Integrated Tools and Ontologies
- 09:30 - Presentation by Joana Paulo.
Information available at http://l2f.l2f.inesc-id.pt/ (intranet)
- Integrated tools: ATA; JaVaLi!; DID; SAF; Intex (3rd party)
- Ontologies: OntoWine (wine domain ontology); OntoChef (cooking domain ontology)
Lexicons
- 09:50 - Presentation by Ricardo Daniel Ribeiro.
Information available at http://lrdb.l2f.inesc-id.pt/ (intranet)
- PAROLE/SIMPLE: 20k root forms + inflection paradigms (morphology + syntax + semantics)
- LUSOlex: 65k root forms (morphology + gramcat)
- BRASILex: 68k root forms (morphology + gramcat)
- Integration of LUSOlex + EPLexIC: ~8-10x EPLexIC phonetic forms
- DicPro: 6.2k anthroponyms
- SMorph: 26k root forms (morphology + inflection paradigm)
- EPLexIC: 80k word forms (morphology + pronunciation); under construction
- ONOMASTICA: 85k proper names (people, streets, cities, companies); 11 languages and cross-lingual information; pronunciation
- Broadcast News: 64k entries (pronunciation)
Corpora
- 10:10 - Presentation by Paula Cristina Vaz.
Information available at http://corpora.l2f.inesc-id.pt/ (intranet)
- CETENFolha: 24 Mwords (newspaper corpus)
- CETEMPúblico: 180 Mwords (newspaper corpus)
- CHInf: 100 children's stories (books)
- PAROLE: ~20 Mwords
- Newspapers: 10 daily newspapers; ~600 Mwords
Coffee Break and Welcome Reception
- 10:30 - Welcome reception by General Carlos Carvalho dos Reis.
Spoken Language Corpora
- 11:00 - Presentation by Rui Amaral.
Information available at http://corpora.l2f.inesc-id.pt/ (intranet)
List of presented corpora:
- EUROM.1
- BDFALA: newspapers and TV debates
- SPEECHDAT
- CORAL
- ALERT-ASR
- ALERT-TD: TV broadcast news
- IPSOM: six spoken books (read by professionals); discussion regarding publication and distribution rights
- LECTRA: classroom lectures (pilot corpus); two semesters (under construction)
- PAPOUS: corpus of children's stories performed by António Rito Silva (falsetto voice)
Simple Text Processing Tools
- 11:20 - Presentation by Fernando Batista.
Information available at http://l2f.l2f.inesc-id.pt/ (intranet)
- Morphological analysis
  - SMorph - POS tagger, tokenizer, generator
  - Palavroso - POS tagger, tokenizer, generator
  - Amorfo/XA - POS tagger, tokenizer, simultaneous multi-lingual analysis; error-correction (spelling correction)
- Morphological generation
  - Monge - general form generator; language- and tag-independent; uses LRDB (under development; usable)
  - Gover - verb generator (~10k manually corrected verbs)
- Morpho-syntax processing
  - PAsMo - rule-based rewriter
  - MARv - morpho-syntactic disambiguation
- Syntactic analysis
  - SuSAna - surface syntax analyzer
  - ParVO - syntactic analyzer (Earley algorithm; variable unification; O(n³))
- Syntax-Semantics interface
  - Algas - arrowing construction
  - AsDeCopas -
- Other tools
  - text2syl - syllabification
  - num2ext - text normalizer
  - YAH - (yet another) hyphenator (rule-based); MS Office compatible
  - Correcto - spell checker; MS Office compatible
  - leia - grapheme-to-phone converter (normalizer)
- General purpose
  - FSTK lib - finite-state transducer toolkit
Speech Synthesis Tools
- 11:40 - Presentation by Sérgio Paulo.
Information available at http://l2f.l2f.inesc-id.pt/ (intranet)
- EmoVoice - transformation of speech-based emotions
- L2F_MuLA - multi-level speech aligner and annotator
- L2F_PhoneAlign - phonetic aligner
- dixi-tok2wrd - normalizer
Speech Recognition Tools
- 12:00 - Presentation by Hugo Meinedo.
Information available at http://l2f.l2f.inesc-id.pt/ (intranet)
- AUDIMUS - ASR with well-documented API; AUDIMUS.linux (frozen; discontinued; no longer supported; usable); AUDIMUS.cvs (usable; development version); MS Office integration; multi-platform
Brainstorming Presentation
- 12:20 - Presentation by Luís Caldas de Oliveira
- Ideas for new projects: 4-5 surviving ideas to be detailed and discussed in the afternoon session
Lunch Break
- 12:30 - Lunch at Paço da Rainha
- 14:30 - Visit to the Academia Buildings: museum, library, council chamber, chapel
Brainstorming
- 15:00 - Moderated by Luís Caldas de Oliveira
Coffee Break
- 16:45
Analysis
- 17:00 - Moderated by Luís Caldas de Oliveira
Final Remarks
- 18:15 - Opportunities for new people: scholarships; post-graduate studies.