From HLT@INESC-ID
Latest revision as of 19:33, 7 July 2006
 
L²F Day 2006 took place at the Military Academy in Lisbon.
Integrated Tools and Ontologies
- 09:30 - Presentation by Joana Paulo.
 
Information available at http://l2f.l2f.inesc-id.pt/ (intranet)
- Integrated tools: ATA; JaVaLi!; DID; SAF; Intex (3rd party)
- Ontologies: OntoWine (wine domain ontology); OntoChef (cooking domain ontology)
 
Lexicons
- 09:50 - Presentation by Ricardo Daniel Ribeiro.
 
Information available at http://lrdb.l2f.inesc-id.pt/ (intranet)
- PAROLE/SIMPLE: 20k root forms + inflection paradigms (morphology + syntax + semantics)
- LUSOlex: 65k root forms (morphology + grammatical category)
- BRASILex: 68k root forms (morphology + grammatical category)
- Integration of LUSOlex + EPLexIC: ~8-10x EPLexIC phonetic forms
- DicPro: 6.2k anthroponyms
- SMorph: 26k root forms (morphology + inflection paradigm)
- EPLexIC: 80k word forms (morphology + pronunciation); under construction
- ONOMASTICA: 85k proper names (people, streets, cities, companies); 11 languages and cross-lingual information; pronunciation
- Broadcast News: 64k entries (pronunciation)
 
Corpora
- 10:10 - Presentation by Paula Cristina Vaz.
 
Information available at http://corpora.l2f.inesc-id.pt/ (intranet)
- CETENFolha: 24 Mwords (newspaper corpus)
- CETEMPúblico: 180 Mwords (newspaper corpus)
- CHInf: 100 children's stories (books)
- PAROLE: ~20 Mwords
- Newspapers: 10 daily newspapers; ~600 Mwords
 
Coffee Break and Welcome Reception
- 10:30 - Welcome reception by General Carlos Carvalho dos Reis.
 
Spoken Language Corpora
- 11:00 - Presentation by Rui Amaral.
 
Information available at http://corpora.l2f.inesc-id.pt/ (intranet)
List of presented corpora:
- EUROM.1
- BDFALA: newspapers and TV debates
- SPEECHDAT
- CORAL
- ALERT-ASR
- ALERT-TD: TV broadcast news
- IPSOM: six spoken books (read by professionals); discussion regarding publication and distribution rights
- LECTRA: classroom lectures (pilot corpus); two semesters (under construction)
- PAPOUS: corpus of children's stories performed by António Rito Silva (falsetto voice)
 
Simple Text Processing Tools
- 11:20 - Presentation by Fernando Batista.
 
Information available at http://l2f.l2f.inesc-id.pt/ (intranet)
- Morphological analysis
  - SMorph - POS tagger, tokenizer, generator
  - Palavroso - POS tagger, tokenizer, generator
  - Amorfo/XA - POS tagger, tokenizer, simultaneous multilingual analysis; error correction (spelling correction)
- Morphological generation
  - Monge - general form generator; language- and tag-independent; uses LRDB (under development; usable)
  - Gover - verb generator (~10k manually corrected verbs)
- Morpho-syntactic processing
  - PAsMo - rule-based rewriter
  - MARv - morpho-syntactic disambiguation
- Syntactic analysis
  - SuSAna - surface syntax analyzer
  - ParVO - syntactic analyzer (Earley algorithm; variable unification; O(n³))
- Syntax-semantics interface
  - Algas - arrowing construction
  - AsDeCopas
- Other tools
  - text2syl - syllabification
  - num2ext - text normalizer
  - YAH - (yet another) hyphenator (rule-based); MS Office compatible
  - Correcto - spell checker; MS Office compatible
  - leia - grapheme-to-phone converter (normalizer)
- General purpose
  - FSTK lib - finite-state transducer toolkit
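ParVO is listed above only as an Earley-based syntactic analyzer with variable unification and O(n³) complexity. For readers unfamiliar with the algorithm, what follows is a minimal textbook Earley recognizer in Python. It is not ParVO's implementation: it omits unification, assumes a grammar with no empty (epsilon) productions, and only answers whether the input is derivable. The function name `earley_recognize` and the toy grammar `TOY_GRAMMAR` are hypothetical, chosen for illustration.

```python
from typing import Dict, List, Set, Tuple

# Grammar: nonterminal -> list of right-hand sides (tuples of symbols).
# Any symbol that is not a key is treated as a terminal.
Grammar = Dict[str, List[Tuple[str, ...]]]
# Earley item: (lhs, rhs, dot position, origin chart index).
Item = Tuple[str, Tuple[str, ...], int, int]


def earley_recognize(grammar: Grammar, start: str, tokens: List[str]) -> bool:
    """Return True if `tokens` derives from `start`.

    Assumption: the grammar has no empty (epsilon) productions, so a
    completed item in chart[i] always has origin < i.
    """
    n = len(tokens)
    chart: List[Set[Item]] = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:
                    # Predict: add every production of the nonterminal after the dot.
                    for prod in grammar[sym]:
                        new = (sym, prod, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
                elif i < n and tokens[i] == sym:
                    # Scan: the terminal after the dot matches the next token.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance the items in chart[origin] that were waiting for lhs.
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)

    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])


# Hypothetical toy grammar; NP and VP are left-recursive on purpose.
TOY_GRAMMAR: Grammar = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("NP", "PP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
    "Det": [("a",), ("the",)],
    "N": [("cat",), ("dog",), ("park",)],
    "V": [("saw",)],
    "P": [("in",)],
}

if __name__ == "__main__":
    print(earley_recognize(TOY_GRAMMAR, "S", "the cat saw a dog in the park".split()))  # True
    print(earley_recognize(TOY_GRAMMAR, "S", "the cat saw".split()))  # False
```

Predict, scan and complete are the three standard Earley operations; because prediction never re-adds an item already in the chart, left-recursive rules such as `NP -> NP PP` terminate, and the general algorithm is cubic in the input length in the worst case.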
 
 
Speech Synthesis Tools
- 11:40 - Presentation by Sérgio Paulo.
 
Information available at http://l2f.l2f.inesc-id.pt/ (intranet)
- EmoVoice - emotion transformation in speech
- L2F_MuLA - multi-level speech aligner and annotator
- L2F_PhoneAlign - phonetic aligner
- dixi-tok2wrd - normalizer
 
Speech Recognition Tools
- 12:00 - Presentation by Hugo Meinedo.
 
Information available at http://l2f.l2f.inesc-id.pt/ (intranet)
- AUDIMUS - ASR system with a well-documented API; AUDIMUS.linux (frozen, discontinued and no longer supported, but still usable); AUDIMUS.cvs (development version; usable); MS Office integration; multi-platform
 
Brainstorming Presentation
- 12:20 - Presentation by Luís Caldas de Oliveira.
- Ideas for new projects: four to five surviving ideas to be detailed and discussed in the afternoon session
 
Lunch Break
- 12:30 - Lunch at Paço da Rainha
- 14:30 - Visit to the Academy buildings: museum, library, council chamber, chapel
 
Brainstorming
- 15:00 - Moderated by Luís Caldas de Oliveira
 
Coffee Break
- 16:45
 
Analysis
- 17:00 - Moderated by Luís Caldas de Oliveira
 
Final Remarks
- 18:15 - Opportunities for new people: scholarships; post-graduate studies.
 
