L²F Day 2006

From HLT@INESC-ID

Latest revision as of 19:33, 7 July 2006

Academia Militar

L²F Day 2006 took place at the Military Academy in Lisbon.

Integrated Tools and Ontologies

  • 09:30 - Presentation by Joana Paulo.

Information available at http://l2f.l2f.inesc-id.pt/ (intranet)

  • Integrated tools: ATA; JaVaLi!; DID; SAF; Intex (3rd party)
  • Ontologies: OntoWine (wine domain ontology); OntoChef (cooking domain ontology)

Lexicons

  • 09:50 - Presentation by Ricardo Daniel Ribeiro.

Information available at http://lrdb.l2f.inesc-id.pt/ (intranet)

  • PAROLE/SIMPLE: 20k root forms + inflection paradigms (morphology + syntax + semantics)
  • LUSOlex: 65k root forms (morphology + gramcat)
  • BRASILex: 68k root forms (morphology + gramcat)
  • Integration of LUSOlex + EPLexIC: ~8-10x EPLexIC phonetic forms
  • DicPro: 6.2k anthroponyms
  • SMorph: 26k root forms (morphology + inflection paradigm)
  • EPLexIC: 80k word forms (morphology + pronunciation); in construction
  • ONOMASTICA: 85k proper names (people, streets, cities, companies); 11 languages and cross-lingual information; pronunciation
  • Broadcast News: 64k entries (pronunciation)

Corpora

  • 10:10 - Presentation by Paula Cristina Vaz.

Information available at http://corpora.l2f.inesc-id.pt/ (intranet)

  • CETENFolha: 24 Mwords (newspaper corpus)
  • CETEMPúblico: 180 Mwords (newspaper corpus)
  • CHInf: 100 children's stories (books)
  • PAROLE: ~20 Mwords
  • Newspapers: 10 daily newspapers; ~600 Mwords

Coffee Break and Welcome Reception

  • 10:30 - Welcome reception by General Carlos Carvalho dos Reis.

Spoken Language Corpora

  • 11:00 - Presentation by Rui Amaral.

Information available at http://corpora.l2f.inesc-id.pt/ (intranet)
List of presented corpora:

  • EUROM.1
  • BDFALA: newspapers and TV debates
  • SPEECHDAT
  • CORAL
  • ALERT-ASR
  • ALERT-TD: TV broadcast news
  • IPSOM: six spoken books (read by professionals); discussion regarding publication and distribution rights
  • LECTRA: classroom lectures (pilot corpus); two semesters (under construction)
  • PAPOUS: corpus of children's stories performed by António Rito Silva (falsetto voice)

Simple Text Processing Tools

  • 11:20 - Presentation by Fernando Batista.

Information available at http://l2f.l2f.inesc-id.pt/ (intranet)

  • Morphological analysis
    • SMorph - POS tagger, tokenizer, generator
    • Palavroso - POS tagger, tokenizer, generator
    • Amorfo/XA - POS tagger, tokenizer, simultaneous multi-lingual analysis; error-correction (spelling correction)
  • Morphological generation
    • Monge - general form generator; language- and tag-independent; uses LRDB (under development; usable)
    • Gover - verb generator (~10k manually corrected verbs)
  • Morpho-syntax processing
    • PAsMo - rule-based rewriter
    • MARv - morpho-syntactic disambiguation
  • Syntactic analysis
    • SuSAna - Surface syntax analyzer
    • ParVO - syntactic analyzer (Earley algorithm; variable unification; O(n³))
  • Syntax-Semantics interface
    • Algas - arrowing construction
    • AsDeCopas -
  • Other tools
    • text2syl - syllabification
    • num2ext - text normalizer
    • YAH - (yet another) hyphenator (rule-based); MS Office compatible
    • Correcto - spell checker; MS Office compatible
    • leia - grapheme-to-phone converter (normalizer)
  • General purpose
    • FSTK lib - finite-state transducer toolkit

Speech Synthesis Tools

  • 11:40 - Presentation by Sérgio Paulo.

Information available at http://l2f.l2f.inesc-id.pt/ (intranet)

  • EmoVoice - transformation of emotions in speech
  • L2F_MuLA - multi-level speech aligner and annotator
  • L2F_PhoneAlign - phonetic aligner
  • dixi-tok2wrd - normalizer

Speech Recognition Tools

  • 12:00 - Presentation by Hugo Meinedo.

Information available at http://l2f.l2f.inesc-id.pt/ (intranet)

  • AUDIMUS - ASR with well-documented API; AUDIMUS.linux (frozen; discontinued; no longer supported; usable); AUDIMUS.cvs (usable; development version); MS Office integration; multi-platform

Brainstorming Presentation

  • 12:20 - Presentation by Luís Caldas de Oliveira
  • Ideas for new projects: 4-5 surviving ideas to be detailed/discussed in the afternoon session

Lunch Break

  • 12:30 - Lunch at Paço da Rainha
  • 14:30 - Visit to the Academy buildings: museum, library, council chamber, chapel

Brainstorming

  • 15:00 - Moderated by Luís Caldas de Oliveira

Coffee Break

  • 16:45

Analysis

  • 17:00 - Moderated by Luís Caldas de Oliveira

Final Remarks

  • 18:15 - Opportunities for new people: scholarships; post-graduate studies.