About Us: Difference between revisions

From HLT@INESC-ID

(24 intermediate revisions by 2 users not shown)
== Mission ==
[[Image:inescid-2020.png|right|300px]]


L2F was formed in 2001. According to our initial mission statement, we wanted to bring together several research groups which could potentially add relevant contributions to the area of computational processing of spoken language for European Portuguese. We are currently pursuing different lines of activity towards the goal of '''creating technology to bridge the gap between natural spoken language and the underlying semantic information'''.
[https://www.inesc-id.pt INESC-ID] is an R&D institute dedicated to advanced research and development in the fields of Information Technologies, Electronics, Communications, and Energy. It is a non-profit institution, privately owned by [https://www.tecnico.ulisboa.pt IST] and [http://www.inesc.pt/ INESC], officially declared of public interest. It was created in 2000, as a result of a reorganization of INESC, an institution which also played a pioneering role at the national level both in research and in the creation of SMEs. In 2004, INESC-ID was recognized by the Portuguese Government as an "Associated Laboratory".


Over the past two years, the group has treated two lines of activity as top priorities, given their relevance and their interdisciplinary nature in integrating several core technologies: '''[[Multimodal Dialogue Systems|spoken dialogue systems platforms]]''' and '''[[Semantic Processing of Multimedia Contents|semantic processing of multimedia contents]]'''.
INESC-ID incorporates a body of highly qualified researchers, including more than 70 researchers holding a PhD and numerous post-graduate students. The majority of the PhD researchers are university professors. This body of researchers enables INESC-ID to act in the different phases of the R&D process. The intense activity developed by INESC-ID has resulted in a large number of scientific papers published in specialized journals and international conferences, industrial prototypes, and computer systems based on state-of-the-art technologies, as well as a number of patents and awards.


A related topic is computer-enhanced human-to-human communication, an area that encompasses, for instance, the automatic transcription of meetings, lectures, and courtroom proceedings, with the tough challenge of providing rich transcriptions for spontaneous speech.
INESC-ID operates in the following scientific areas: AI for People & Society, Automated Reasoning & Software Reliability, Communication Networks, Distributed, Parallel & Secure Systems, Green Energy & Smart Converters, Graphics & Interaction, Human Language Technologies, High Performance Computing, Architectures & Systems, Information & Decision Support Systems, Nano-Electronic Circuits & Systems, and Sustainable Power Systems.


Another emerging line of activity is "speech-to-speech" translation, which is itself the area that draws on the most core technologies in the group. Our long-term plan is to incorporate morphology, syntax, and semantics into statistical machine translation.
The Human Language Technology laboratory, as a congregation of people working on human language technology and related matters, is the natural evolution of the Spoken Language Systems Laboratory (L²F), a laboratory created in 2001 to bring together several research groups of INESC and independent researchers to contribute to the area of computational processing of spoken language for European Portuguese. The long-term goal of L²F was to bridge the gap between natural spoken language and the underlying semantic information.


We also plan to continue our work of extending our technology to other varieties of Portuguese and our involvement in two other areas: e-learning and e-inclusion, namely in the development of alternative and augmentative communication tools for people with special needs.
The HLT laboratory includes researchers/faculty from various universities, as well as graduate and post-graduate researchers. Their backgrounds range from Electrical Engineering to Computer Science and Linguistics. This strongly interdisciplinary group is actively involved in many areas of language research and development, both spoken and written, including, without limitation, speech recognition, speech synthesis, speech coding, speech understanding, audio indexing, multimodal dialogue systems, language and dialect identification, speech-to-speech machine translation, automatic summarization, natural language database interfaces, natural language generation, text simplification, named entity recognition, and question answering, among others and in no particular order.
<!--
The expertise in these core technologies is integrated into 3 main research strands: semantic processing of multimedia contents, spoken/multimodal dialogue systems and speech-to-speech translation. The original focus on European Portuguese has now been broadened to encompass all varieties of Portuguese. Two application areas also deserve our special attention: e-learning and e-inclusion, namely in the development of alternative and augmentative communication tools for people with special needs.


== Overview ==
Some important landmarks are the development of the first Text-to-Speech synthesizer built from scratch for European Portuguese (DIXI) in 1991, and of the first version of our Large Vocabulary Continuous Speech Recognition system (AUDIMUS) for our language in 1997.


[http://www.inesc.pt/ INESC - Instituto de Engenharia de Sistemas e Computadores] (Institute for Systems and Computer Engineering) is a private non-profit institution, dedicated to research, development and teaching in advanced technological areas. It was created in 1980 to become an interface between the Telecommunications and Information Technology sectors and the Portuguese University system, and is owned by both. In January 2000, a major restructuring of INESC took place, creating a holding company with several affiliates, among which [http://www.inesc-id.pt/ INESC ID Lisbon] (Research & Development in Lisbon).  
L²F is an institutional member of [http://www.isca-speech.org/ ISCA] (International Speech Communication Association). Its work has been internationally recognized through close cooperation with other research centers in Europe and in the United States, as well as through the recent organization of the conference [http://www.interspeech2005.org INTERSPEECH'2005].


The Spoken Language Systems Lab (L2F - Laboratório de sistemas de Língua Falada) was created in January 2001, as part of this center in Lisbon, in order to bring together several research groups which could potentially add relevant contributions to the area of computational processing of spoken language for European Portuguese: the former speech processing group of INESC, some speech researchers originally from the neural networks and natural language processing groups of INESC, and the language engineering group of CSTC.
L²F has actively cooperated with the national industry: Vodafone, Portugal Telecom, Microsoft Portugal, Rádio-Televisão Portuguesa (RTP), Porto Editora, Texto Editora.
 
-->
L2F is actively involved in many areas of spoken language systems, namely recognition, synthesis and coding. Its work has been internationally recognized through close cooperation with other research centers in Europe and in the U.S., and a significant involvement in European Projects (15).
 
At a national level, L2F has also been actively involved in several research projects. Some important landmarks in this activity were the development of the first Text-to-Speech synthesizer built from scratch for European Portuguese (DIXI) in 1991, and of the first version of our Large Vocabulary Continuous Speech Recognition system (AUDIMUS) for our language in 1997. Of paramount importance to this activity is the formal cooperation agreement with the Center of Linguistics of the University of Lisbon (CLUL), established in the early 1990s.
 
A major outcome of this interdisciplinary cooperation was the creation of spoken linguistic resources for European Portuguese: corpora EUROM.1, BDFALA, SPEECHDAT, BD-PUBLICO and CORAL, and corresponding pronunciation lexica, plus the ONOMASTICA proper names lexicon.
 
By bringing together researchers from the area of natural language processing, the lab acquired expertise in areas such as natural language database interfaces, natural language generation, alternative syntactic and semantic processing paradigms, etc., which are also extremely relevant to the development of spoken language systems.
 
Altogether, the newly formed lab includes around 20 researchers, most of them either Professors or graduate students at the neighbouring engineering university (Instituto Superior Técnico).

Revision as of 08:39, 11 February 2020

INESC-ID is an R&D institute dedicated to advanced research and development in the fields of Information Technologies, Electronics, Communications, and Energy. It is a non-profit institution, privately owned by IST and INESC, officially declared of public interest. It was created in 2000, as a result of a reorganization of INESC, an institution which also played a pioneering role at the national level both in research and in the creation of SMEs. In 2004, INESC-ID was recognized by the Portuguese Government as an "Associated Laboratory".

INESC-ID incorporates a body of highly qualified researchers, including more than 70 researchers holding a PhD and numerous post-graduate students. The majority of the PhD researchers are university professors. This body of researchers enables INESC-ID to act in the different phases of the R&D process. The intense activity developed by INESC-ID has resulted in a large number of scientific papers published in specialized journals and international conferences, industrial prototypes, and computer systems based on state-of-the-art technologies, as well as a number of patents and awards.

INESC-ID operates in the following scientific areas: AI for People & Society, Automated Reasoning & Software Reliability, Communication Networks, Distributed, Parallel & Secure Systems, Green Energy & Smart Converters, Graphics & Interaction, Human Language Technologies, High Performance Computing, Architectures & Systems, Information & Decision Support Systems, Nano-Electronic Circuits & Systems, and Sustainable Power Systems.

The Human Language Technology laboratory, as a congregation of people working on human language technology and related matters, is the natural evolution of the Spoken Language Systems Laboratory (L²F), a laboratory created in 2001 to bring together several research groups of INESC and independent researchers to contribute to the area of computational processing of spoken language for European Portuguese. The long-term goal of L²F was to bridge the gap between natural spoken language and the underlying semantic information.

The HLT laboratory includes researchers/faculty from various universities, as well as graduate and post-graduate researchers. Their backgrounds range from Electrical Engineering to Computer Science and Linguistics. This strongly interdisciplinary group is actively involved in many areas of language research and development, both spoken and written, including, without limitation, speech recognition, speech synthesis, speech coding, speech understanding, audio indexing, multimodal dialogue systems, language and dialect identification, speech-to-speech machine translation, automatic summarization, natural language database interfaces, natural language generation, text simplification, named entity recognition, and question answering, among others and in no particular order.