: Difference between revisions

Latest revision as of 19:33, 7 July 2006


Academia Militar

L²F Day 2006 took place at the Military Academy in Lisbon.

Integrated Tools and Ontologies

09:30 - Presentation by Joana Paulo.

Information available at http://l2f.l2f.inesc-id.pt/ (intranet)

Integrated tools: ATA; JaVaLi!; DID; SAF; Intex (3rd party)
Ontologies: OntoWine (wine domain ontology); OntoChef (cooking domain ontology)

Lexicons

09:50 - Presentation by Ricardo Daniel Ribeiro.

Information available at http://lrdb.l2f.inesc-id.pt/ (intranet)

PAROLE/SIMPLE: 20k root forms + inflection paradigms (morphology + syntax + semantics)
LUSOlex: 65k root forms (morphology + grammatical category)
BRASILex: 68k root forms (morphology + grammatical category)
Integration of LUSOlex + EPLexIC: ~8-10x EPLexIC phonetic forms
DicPro: 6.2k anthroponyms
SMorph: 26k root forms (morphology + inflection paradigm)
EPLexIC: 80k word forms (morphology + pronunciation); under construction
ONOMASTICA: 85k proper names (people, streets, cities, companies); 11 languages and cross-lingual information; pronunciation
Broadcast News: 64k entries (pronunciation)

Corpora

10:10 - Presentation by Paula Cristina Vaz.

Information available at http://corpora.l2f.inesc-id.pt/ (intranet)

CETENFolha: 24 Mwords (newspaper corpus)
CETEMPúblico: 180 Mwords (newspaper corpus)
CHInf: 100 children's stories (books)
PAROLE: ~20 Mwords
Newspapers: 10 daily newspapers; ~600 Mwords

Coffee Break and Welcome Reception

10:30 - Welcome reception by General Carlos Carvalho dos Reis.

Spoken Language Corpora

11:00 - Presentation by Rui Amaral.

Information available at http://corpora.l2f.inesc-id.pt/ (intranet)
List of presented corpora:

EUROM.1
BDFALA: newspapers and TV debates
SPEECHDAT
CORAL
ALERT-ASR
ALERT-TD: TV broadcast news
IPSOM: six spoken books (read by professionals); discussion regarding publication and distribution rights
LECTRA: classroom lectures (pilot corpus); two semesters (under construction)
PAPOUS: corpus of children's stories performed by António Rito Silva (falsetto voice)

Simple Text Processing Tools

11:20 - Presentation by Fernando Batista.

Information available at http://l2f.l2f.inesc-id.pt/ (intranet)

Morphological analysis
- SMorph - POS tagger, tokenizer, generator
- Palavroso - POS tagger, tokenizer, generator
- Amorfo/XA - POS tagger, tokenizer, simultaneous multilingual analysis; error correction (spelling correction)

Morphological generation
- Monge - general form generator; language- and tag-independent; uses LRDB (under development; usable)
- Gover - verb generator (~10k manually corrected verbs)

Morpho-syntax processing
- PAsMo - rule-based rewriter
- MARv - morpho-syntactic disambiguation

Syntactic analysis
- SuSAna - Surface syntax analyzer
- ParVO - syntactic analyzer (Earley algorithm; variable unification; O(n³))

Syntax-Semantics interface
- Algas - arrowing construction
- AsDeCopas -

Other tools
- text2syl - syllabification
- num2ext - text normalizer
- YAH - (yet another) hyphenator (rule-based); MS Office compatible
- Correcto - spell checker; MS Office compatible
- leia - grapheme-to-phone converter (normalizer)

General purpose
- FSTK lib - finite-state transducer toolkit

Speech Synthesis Tools

11:40 - Presentation by Sérgio Paulo.

Information available at http://l2f.l2f.inesc-id.pt/ (intranet)

EmoVoice - transformation of emotions in speech
L2F_MuLA - multi-level speech aligner and annotator
L2F_PhoneAlign - phonetic aligner
dixi-tok2wrd - normalizer

Speech Recognition Tools

12:00 - Presentation by Hugo Meinedo.

Information available at http://l2f.l2f.inesc-id.pt/ (intranet)

AUDIMUS - ASR system with a well-documented API; AUDIMUS.linux (frozen; discontinued; no longer supported; usable); AUDIMUS.cvs (usable; development version); MS Office integration; multi-platform

Brainstorming Presentation

12:20 - Presentation by Luís Caldas de Oliveira
Ideas for new projects: 4-5 surviving ideas to be detailed and discussed in the afternoon session

Lunch Break

12:30 - Lunch at Paço da Rainha
14:30 - Visit to the Academia buildings: museum, library, council chamber, chapel

Brainstorming

15:00 - Moderated by Luís Caldas de Oliveira

Coffee Break

16:45

Analysis

17:00 - Moderated by Luís Caldas de Oliveira

Final Remarks

18:15 - Opportunities for new people: scholarships; post-graduate studies.


From HLT@INESC-ID