PT-STAR: speech translation

From HLT@INESC-ID


Date

  • 15:00, Monday, February 8th, 2010
  • Room 336

Speaker

  • Nuno Grazina, L2F

Abstract

Global communication and understanding play an increasingly vital role in shaping our economic and social environment. However, language differences are a major barrier to achieving true global understanding and knowledge sharing. Human translators are often unavailable or prohibitively expensive, and cannot deliver much-needed information in a timely and usable manner. Thus, a system capable of automatically performing translations with the same accuracy as a human being would be greatly beneficial in allowing true cross-lingual communication. The field of Spoken Language Translation aims at developing such systems, but despite large improvements in the last few years, it still fails to achieve its goals for the majority of languages and real scenarios. This work's objective is to build automatic translation systems for unlimited domains such as Broadcast News, lectures and presentations, for the specific case of the Portuguese-English language pair. Since the most popular Machine Translation paradigm, and the one that best suits unlimited domains, is based on statistical techniques, high-quality language resources are needed to enable the systems to produce accurate translations. For the Portuguese-English language pair, such resources are very scarce and difficult to obtain. Several approaches have also been proposed to improve translation results; these will be discussed and applied in this work. This document defines the context in which this work is developed, presents its objectives and related work, and describes how the objectives have been met so far and how they will be met in the future.
It contains a historical overview of the Spoken Language Translation field, focused primarily on the area of Machine Translation, which presents the evolution of the main technologies and resources involved in translating spoken language, followed by a review of several state-of-the-art techniques aimed at improving translation accuracy. This work has followed, and will continue to follow, some of these approaches and technologies.