About Us: Difference between revisions
From HLT@INESC-ID
Latest revision as of 18:25, 15 February 2024

INESC-ID is an R&D institute dedicated to advanced research and development in the fields of Information Technologies, Electronics, Communications, and Energy. It is a non-profit institution, privately owned by IST and INESC, officially declared of public interest. It was created in 2000, as a result of a reorganization of INESC, an institution which also played a pioneering role at the national level both in research and in the creation of SMEs. In 2004, INESC-ID was recognized by the Portuguese Government as an "Associated Laboratory".
INESC-ID incorporates a body of highly qualified researchers, including more than 70 researchers holding a PhD and numerous post-graduate students. The majority of the PhD researchers are university professors. This body of researchers enables INESC-ID to act in the different phases of the R&D process. The intense activity developed by INESC-ID has resulted in a large number of scientific papers published in specialized journals and international conferences, industrial prototypes, and computer systems based on state-of-the-art technologies, as well as a number of patents and awards.
INESC-ID operates in the following scientific areas: AI for People & Society; Automated Reasoning & Software Reliability; Communication Networks; Distributed, Parallel & Secure Systems; Green Energy & Smart Converters; Graphics & Interaction; Human Language Technology; High Performance Computing, Architectures & Systems; Information & Decision Support Systems; Nano-Electronic Circuits & Systems; and Sustainable Power Systems.
The Human Language Technology (HLT) group is the successor of the Spoken Language Systems Laboratory (L²F), which was created in 2001. L²F brought together several INESC groups as well as independent researchers, aiming to advance the field of computational processing of spoken language, particularly for European Portuguese. The long-term goal of L²F was to bridge the gap between natural spoken language and the underlying semantic information. The HLT group has expanded on the L²F mission, taking on a broader set of activities related to spoken and written language technologies, and building bridges with other disciplines.
The HLT group currently includes researchers and faculty from various universities, as well as over 30 doctoral students and post-graduate researchers. The group is highly interdisciplinary, with researchers whose backgrounds range from Electrical Engineering to Computer Science and Linguistics. Its members are actively involved in research and development in many areas of spoken and written language processing, including (in no particular order) speech coding, speech recognition and synthesis, multimodal dialogue systems and natural language interfaces, text and audio indexing and retrieval, language and dialect identification, text simplification and paraphrasing, automatic summarization, question answering, named entity recognition and information extraction, and machine translation from text and speech, among other topics. The group’s involvement in these core technologies is reflected not only in its publications and open domain toolkits (e.g. STRING), but also in its successful participation in joint evaluation campaigns, such as the recent challenges on machine translation quality estimation and on multilingual automatic evaluation metrics for open-domain dialogue systems.
Combining speech and language technologies with other disciplines has been one of the highlights of the group's work, particularly on the use of speech as a health biomarker, privacy-preserving speech processing, geographical text analysis and retrieval, music information retrieval, and also multimodal work on vision and language problems such as image captioning or visual question answering.
HLT's work on speech as a health biomarker covers several speech-affecting diseases, including respiratory diseases (e.g. obstructive sleep apnea, COVID-19), mood disorders (e.g. depression, psychosis, bipolar disorder) and neurodegenerative diseases (e.g. Parkinson's and Alzheimer’s). Besides investing strongly in collecting corpora in collaboration with health professionals, the group has also processed in-the-wild corpora (from YouTube, or from repositories of testimonials by patients and caregivers). This allowed the group to address the limitations that must be overcome in telemedicine, and motivated it to include other non-invasive modalities (facial images, visual speech). Together with research groups from the Universities of Bremen and Augsburg, the group also published pioneering papers on “Silent Paralinguistics”, involving speech and EMG signals.
Our collaboration with clinical facilities started in the early 1990s. Highlights of that collaboration were the award-winning public domain tools Eugenio and Vithea. Eugenio was the first application of the group's speech synthesizer, combining it with n-gram word prediction, for children with cerebral palsy. Vithea, developed in the early 2010s, was a virtual therapist platform for aphasia patients, based on robust word spotting. It could be remotely monitored by therapists and could include personalized exercises. Versions were later developed for children with Autism Spectrum Disorder, for children with reading disabilities, and for monitoring cognitive functions in elderly subjects. The group has also been particularly active in Portuguese Sign Language translation, and has been involved in monitoring outbreak events for disease surveillance and in decision support systems for personalized medicine in intensive care units.
More recently, other awards were also received in this interdisciplinary area: Catarina Botelho received the Maria de Lourdes Pintasilgo Award for Young Alumna, and won the “3-Minute Thesis” Universidade de Lisboa competition for her work on speech as a health biomarker. Diogo Nunes also won the best clinical research paper award from the National Association for the Study of Pain (APED) for his work on chronic pain.
Another cross-disciplinary area is privacy-preserving speech processing, which is one of the highlights of the group's very strong cooperation with Carnegie Mellon University. This framework enables voice-based services to access, analyze, and interpret recordings of speech without revealing the biometric features that enable the identification of the speaker (or metadata such as age, gender, probable nationality, emotional state, health status, etc.). The team focuses on two paradigms: cryptographic processing, and privacy-oriented speech manipulation, either using adversarial methods or removing and manipulating private attributes from speaker embeddings. The team has also been very active in raising public awareness of the privacy issues posed by speech recordings processed by remote servers, effectively treating speech as PII (Personally Identifiable Information).
The group has also developed activities related to geographical text analysis, in some cases involving cross-disciplinary collaborations with researchers in fields such as the digital humanities, the computational social sciences, or the geographical information sciences. Specific topics include the extraction of geographical information from textual sources, or information retrieval considering geographical information needs. More recently, these activities have also expanded to the use of multimodal information as input, e.g. by considering tasks such as question answering over remote sensing imagery.
Regarding multimodality specifically, and noting that most previous vision and language research focuses on the English language, the group has pursued the development of methods capable of working across languages and cultures. Research within this topic has also tackled issues related to computational efficiency, aiming at smaller models that retain the effectiveness of much larger counterparts, e.g. through the use of retrieval augmentation as an approach to offload some of the training budget (which also connects to some of the group’s activities in the area of information retrieval). The group has strong international collaborations in this cross-disciplinary area of activity, including the co-supervision of PhD students on topics such as image captioning, vision-and-language models for remote sensing data, and image captioning evaluation.
Our strong interdisciplinary profile is also evident in the very active collaboration in the fields of law (namely with the Portuguese Supreme Court of Justice) and psychology, with an emphasis on cyberbullying and hate speech. The collaboration with the Portuguese Supreme Court of Justice focused on the development of tools that increase the open availability and accessibility of its rulings through the use of natural language processing. Tools for anonymization, information retrieval, optical character recognition, and segmentation and summarization have been developed, with the anonymization tool currently being used nationally by all courts in Portugal. The work on cyberbullying and hate speech covers several perspectives: their characterization, including the diachronic aspect (namely, how they were affected by the COVID-19 pandemic), automatic detection, and prevention strategies.
The group also collaborates with the Portuguese Ministry of Education in an EU project (iRead4Skills) which aims to support the training of adults with low literacy levels, focusing on the development of intelligent systems for text complexity assessment in different languages. These systems can be used by trainers to create or adapt texts to the appropriate level of complexity for their individual students.
Besides its involvement in many national and EU projects (including leading a COST Action, Multi3Generation - Multitask, Multilingual, Multimodal Language Generation), the group is heavily involved in two recent nationwide efforts on AI: Accelerat.ai and the Center for Responsible AI (CRAI). These two large-scale projects, with total funding for HLT of approximately €2.7 million for 2023-2025, respectively aim at changing the landscape of conversational AI technologies and at establishing Portugal as an international reference in the use of responsible AI through its four pillars: fairness, privacy, explainability and sustainability.
The group was heavily involved in a technology transfer project with nine companies in 2006-2008. One of the main achievements was the deployment of an online, fully automatic broadcast news subtitling system for Portuguese TV, live since March 2008. This laid the groundwork for launching a spin-off company, VoiceInteraction, which now has offices in Brazil, New York and Singapore, and produces subtitles for over 180 TV stations.
HLT also cooperates very closely with Unbabel, a translation company combining AI and post-editing, launched by a former PhD student of the group in 2013. The bilateral protocol between Unbabel and INESC-ID was established in 2015 for technology transfer purposes, and initially targeted scalable Linguistic Quality Assurance processes for crowdsourcing (over 50k editors, evaluators, annotators and in-house teams). Nowadays, the collaboration focuses on product innovation based on Machine Translation, (Automatic) Post-editing and Generative AI for several tasks (e.g. localisation, automatic post-editing, cultural transcreation) and domains, namely high-risk domains (life sciences), customer support and websites, amongst others. It also extends to many other areas, such as the evaluation of neural machine translation, which is the topic of the PhD thesis of Ricardo Rei, winner of INESC-ID’s PhD student award.
The group is also very involved in training the next generation of skilled industry players through technology transfer within Master’s and Doctoral theses with industry (Baidu, Defined.ai, ELSA, Unbabel, VoiceInteraction). Another highlight in terms of training is the group’s participation in a European training network (TAPAS) which brings together many top researchers in speech technologies for health applications. The group has also invested strongly in training the future leaders in AI, particularly through the Lisbon Machine Learning Summer School, which has been running since 2011 in cooperation with other research institutes and companies. The list of speakers at this school includes many of the leading machine learning experts in the field, and the number of applications received each year exceeds 500.
Last but not least, the international recognition of the group is also reflected in the many leadership roles played by HLT members in scientific associations and major conferences in our field: International Speech Communication Association (ISCA - former president, Chair of Interspeech 2005), International / European Association for Machine Translation (IAMT/EAMT - current president, local organization of EAMT 2020), Institute of Electrical and Electronics Engineers (IEEE - Chair of Fellow Committee, Technical Chair of ICASSP 2025), and Association for Computational Linguistics (local organization of EMNLP 2015).
Another highlight in terms of conference organization was the launch in 1993 of the first conference of the PROPOR (Computational Processing of the Portuguese Language) series, together with the Portuguese and Brazilian research communities. This biennial conference, now in its 16th edition, has alternated between the two countries ever since. In 2005 the group also co-founded the ISCA Iberian Languages Special Interest Group, and it organized IberSpeech 2016, the first conference of this series to be held outside Spain. We also chaired IPMU (International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems) in 2020, held remotely, and in 2024, held in person.