About Us

From HLT@INESC-ID

Revision as of 17:17, 20 March 2006 by David

Mission

L2F was formed in 2001. According to our initial mission statement, we wanted to bring together several research groups which could potentially add relevant contributions to the area of computational processing of spoken language for European Portuguese. We are currently pursuing different lines of activity towards the goal of creating technology to bridge the gap between natural spoken language and the underlying semantic information.

Over the past two years, the group has treated two lines of activity as top priorities, given their relevance and their interdisciplinary nature in integrating several core technologies: spoken dialogue system platforms and semantic processing of multimedia content.

A related topic is computer-enhanced human-to-human communication, an area that encompasses, for instance, the automatic transcription of meetings, lectures and courtroom proceedings, with the tough challenge of providing rich transcriptions of spontaneous speech.

Another emerging line of activity is "speech-to-speech" translation, which is itself the area that draws on the most core technologies in the group. Our long-term plan is to incorporate morphology, syntax and semantics into statistical machine translation.

We also plan to continue extending our technology to other varieties of Portuguese, and to maintain our involvement in two other areas: e-learning and e-inclusion, in particular the development of alternative and augmentative communication tools for people with special needs.

Overview

INESC - Instituto de Engenharia de Sistemas e Computadores (Institute for Systems and Computer Engineering) is a private non-profit institution, dedicated to research, development and teaching in advanced technological areas. It was created in 1980 to become an interface between the Telecommunications and Information Technology sectors and the Portuguese University system, and is owned by both. In January 2000, a major restructuring of INESC took place, creating a holding company with several affiliates, among which INESC ID Lisbon (Research & Development in Lisbon).

The Spoken Language Systems Lab (L2F - Laboratório de sistemas de Língua Falada) was created in January 2001, as part of this center in Lisbon, in order to bring together several research groups which could potentially add relevant contributions to the area of computational processing of spoken language for European Portuguese: the former speech processing group of INESC, some speech researchers originally from the neural networks and natural language processing groups of INESC, and the language engineering group of CSTC.

L2F is actively involved in many areas of spoken language systems, namely recognition, synthesis and coding. Its work has been internationally recognized through close cooperation with other research centers in Europe and in the U.S., and a significant involvement in European Projects (15).

At a national level, L2F has also been actively involved in several research projects. Some important landmarks in this activity were the development of the first Text-to-Speech synthesizer built from scratch for European Portuguese (DIXI) in 1991, and of the first version of our Large Vocabulary Continuous Speech Recognition system (AUDIMUS) for our language in 1997. Of paramount importance to this activity is the formal cooperation agreement with the Center of Linguistics of the University of Lisbon (CLUL), established in the early 1990s.

A major outcome of this interdisciplinary cooperation was the creation of spoken linguistic resources for European Portuguese: corpora EUROM.1, BDFALA, SPEECHDAT, BD-PUBLICO and CORAL, and corresponding pronunciation lexica, plus the ONOMASTICA proper names lexicon.

By bringing together researchers from the area of natural language processing, the lab acquired expertise in areas such as natural language database interfaces, natural language generation, and alternative syntactic and semantic processing paradigms, which are also extremely relevant to the development of spoken language systems.

Altogether, the newly formed lab includes around 20 researchers, most of them either Professors or graduate students at the neighbouring engineering university (Instituto Superior Tecnico).