Speech-to-speech Translation

Latest revision as of 17:59, 17 July 2008

**People**
João Graça
Luísa Coheur
Diamantino Caseiro
Joana Paulo Pardal

Speech-to-speech machine translation is one of the most strategically relevant areas for L2F. The state of the art in speech translation depends crucially on the state of the art of several core technologies: speech recognition, machine translation and text-to-speech synthesis (in particular voice morphing, which is needed to reproduce the source speaker’s characteristics in the target voice). The main limitations of current machine translation systems are the lack of semantic interpretation and world knowledge, as well as insufficient coverage of the large proportion of idiosyncratic linguistic phenomena in lexicon and syntax. The most promising approaches combine improved statistical methods with improved knowledge-driven methods in a variety of ways.

Research at L2F began with statistical speech-to-speech machine translation approaches based on weighted finite state transducers (WFSTs) [Picó 2005] [Caseiro 2006], aiming at a tight integration between recognition and translation. WFSTs are especially well suited for combining different types of approaches, whether statistical or knowledge-based. The combination may be advantageous for achieving two goals: (i) including morpho-syntactic linguistic knowledge in the statistical machine translation paradigm and (ii) tackling the data sparseness problem for speech translation. The work was carried out within the scope of a national project on “Weighted Finite State Transducers Applied to Spoken Language Processing”.
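
As a rough illustration of why WFSTs make this kind of combination straightforward, the sketch below composes three toy machines in the tropical semiring (weights add along a path, and the cheapest path wins): a source sentence, a word-by-word translation transducer, and a target-side model that could equally be statistical or hand-written. The `WFST` class, the Portuguese/English toy lexicon and all weights are invented for illustration. This is not the L2F system, and it leaves out epsilon transitions, reordering and the speech recognizer.

```python
import heapq
from collections import defaultdict, deque


class WFST:
    """Tiny epsilon-free weighted transducer over the tropical semiring."""

    def __init__(self, start, finals, arcs):
        self.start = start
        self.finals = dict(finals)      # state -> final weight
        self.arcs = defaultdict(list)   # state -> [(ilabel, olabel, weight, next)]
        for src, ilab, olab, w, dst in arcs:
            self.arcs[src].append((ilab, olab, w, dst))


def compose(a, b):
    """Match output labels of `a` with input labels of `b`; weights are added."""
    start = (a.start, b.start)
    finals, arcs = {}, []
    queue, seen = deque([start]), {start}
    while queue:
        qa, qb = queue.popleft()
        if qa in a.finals and qb in b.finals:
            finals[(qa, qb)] = a.finals[qa] + b.finals[qb]
        for ia, oa, wa, na in a.arcs.get(qa, []):
            for ib, ob, wb, nb in b.arcs.get(qb, []):
                if oa == ib:
                    nxt = (na, nb)
                    arcs.append(((qa, qb), ia, ob, wa + wb, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return WFST(start, finals, arcs)


_FINAL = object()  # sentinel used to account for final weights


def best_path(t):
    """Dijkstra search (assumes non-negative weights).

    Returns (cost, input labels, output labels), or None if nothing is accepted.
    """
    heap = [(0.0, 0, t.start, (), ())]
    tie = 1            # unique tie-breaker so states are never compared
    visited = set()
    while heap:
        cost, _, q, ins, outs = heapq.heappop(heap)
        if q is _FINAL:
            return cost, list(ins), list(outs)
        if q in visited:
            continue
        visited.add(q)
        if q in t.finals:
            heapq.heappush(heap, (cost + t.finals[q], tie, _FINAL, ins, outs))
            tie += 1
        for il, ol, w, nxt in t.arcs.get(q, []):
            if nxt not in visited:
                heapq.heappush(heap, (cost + w, tie, nxt, ins + (il,), outs + (ol,)))
                tie += 1
    return None


# Hypothetical toy machines (all labels and weights are made up).
sentence = WFST(0, {2: 0.0}, [(0, "a", "a", 0.0, 1), (1, "casa", "casa", 0.0, 2)])
translation = WFST(0, {0: 0.0}, [
    (0, "a", "the", 0.5, 0), (0, "a", "to", 1.0, 0),
    (0, "casa", "house", 0.75, 0), (0, "casa", "home", 0.5, 0),
])
target_model = WFST(0, {3: 0.0}, [
    (0, "the", "the", 0.25, 1), (0, "to", "to", 0.75, 2),
    (1, "house", "house", 0.25, 3), (1, "home", "home", 1.0, 3),
    (2, "house", "house", 1.5, 3), (2, "home", "home", 0.5, 3),
])

result = best_path(compose(compose(sentence, translation), target_model))
print(result)
```

Taken alone, the toy translation transducer prefers "home" for "casa"; composing it with the target-side model changes the best path to "the house". That is the sense in which composition lets independently built knowledge sources constrain one another.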

In 2007, L2F participated in the 4th International Workshop on Spoken Language Translation [Graça 2007], using a standard combination of phrase-based machine translation and translation reranking. The reranking step used some new features based on linguistic information, which showed promising results.
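
The general idea of reranking can be sketched as follows: the decoder produces an n-best list, each hypothesis carries a set of feature values, and a weighted sum of those features decides the final order. The feature names (`decoder`, `linguistic`), values and weights below are hypothetical placeholders; the text above does not say which linguistic features the IWSLT 2007 system used or how its weights were tuned.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hypothesis:
    text: str
    features: Dict[str, float]  # feature name -> log-domain score


def rerank(nbest: List[Hypothesis], weights: Dict[str, float]) -> List[Hypothesis]:
    """Sort hypotheses by a weighted sum of their features, best first."""
    def score(h: Hypothesis) -> float:
        # Features without a weight contribute nothing.
        return sum(weights.get(name, 0.0) * value for name, value in h.features.items())
    return sorted(nbest, key=score, reverse=True)


# Made-up 2-best list: the decoder alone prefers the first hypothesis.
nbest = [
    Hypothesis("the house white", {"decoder": -2.0, "linguistic": -4.0}),
    Hypothesis("the white house", {"decoder": -2.5, "linguistic": -1.0}),
]

baseline = rerank(nbest, {"decoder": 1.0})
reranked = rerank(nbest, {"decoder": 1.0, "linguistic": 0.5})
print(baseline[0].text)
print(reranked[0].text)
```

In this toy list the extra feature is enough to promote the second hypothesis; in practice the weights would be tuned on held-out data rather than set by hand.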

Research currently focuses on statistical machine translation of text, in particular on word alignments, since these are an important starting point for most state-of-the-art statistical machine translation systems. To this end, a new algorithm that achieves state-of-the-art results was developed in cooperation with the University of Pennsylvania [Graça 2007, Ganchev 2008]. In addition, a guideline for building manual alignments between different language pairs was proposed, along with gold alignments for six European language pairs [Graça 2008]. These alignments can be a valuable resource for both evaluating and tuning word alignment models.
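
One common way to use gold alignments for evaluation is to compute precision, recall and alignment error rate (AER) as defined by Och and Ney. The sketch below assumes the gold data distinguishes "sure" and "possible" links and that each link is a (source index, target index) pair; the function name and the toy data are illustrative and are not taken from the collection described above.

```python
from typing import Dict, Set, Tuple

Link = Tuple[int, int]  # (source word index, target word index)


def alignment_scores(predicted: Set[Link], sure: Set[Link], possible: Set[Link]) -> Dict[str, float]:
    """Precision, recall and AER of predicted links against gold links."""
    possible = possible | sure              # every sure link is also possible
    hit_sure = len(predicted & sure)
    hit_possible = len(predicted & possible)
    precision = hit_possible / len(predicted) if predicted else 0.0
    recall = hit_sure / len(sure) if sure else 0.0
    denom = len(predicted) + len(sure)
    aer = 1.0 - (hit_sure + hit_possible) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "aer": aer}


# Toy sentence pair: two sure links and one possible link in the gold standard.
gold_sure = {(0, 0), (1, 1)}
gold_possible = {(2, 1)}
predicted = {(0, 0), (2, 1), (2, 2), (3, 3)}

scores = alignment_scores(predicted, gold_sure, gold_possible)
print(scores)
```

Lower AER is better. For tuning, the same function could be called on a development set while varying a model parameter, keeping the setting with the lowest AER.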

In September 2008, the Machine Translation team will be joined by four Master's students.

Related Resources

Golden collection of parallel multi-language word alignments - Manually annotated word alignments between six European languages, taken from the Europarl common test set.

Related Software

Constrained Alignment Toolkit (CAT) - Word alignment toolkit produced in cooperation with the University of Pennsylvania. Please see the official web site: http://www.seas.upenn.edu/~strctlrn/CAT/CAT.html

Demos

A demonstration of tightly integrated speech-to-text translation is available (http://www.l2f.inesc-id.pt/projects/wfst/demo/trans-pc.smi). The translation module is implemented as a single WFST that is used as the language model in the speech recognizer. This architecture produces sentences in the target language directly from source-language speech.

A demonstration of large-vocabulary translation is also available (http://www.l2f.inesc-id.pt/projects/wfst/demo-bnews/2006_10_19-19_59_01_en.smi). The output of the WFST-based speech recognition module was translated using a WFST-based machine translation module trained on the European Parliament domain.

Recent demos:

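Broadcast News translation from Portuguese to Spanish and English: http://www.l2f.inesc-id.pt/projects/wfst/DEMO_pt_es_en/demo_pt_es_en.smi

Broadcast News translation from South American Spanish to Portuguese: http://www.l2f.inesc-id.pt/projects/wfst/ES_DEMO/chile/chile.smi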

Finished Projects

WFST - Weighted Finite State Transducers Applied to Spoken Language Processing (2004-2007)

Selected Publications

Kuzman Ganchev, João de Almeida Varelas Graça, Ben Taskar, Better Alignments = Better Translations?, In ACL-08: HLT, Association for Computational Linguistics, pages 986-993, June 2008

João de Almeida Varelas Graça, Joana Paulo Pardal, Luísa Coheur, Diamantino António Caseiro, Building a golden collection of parallel Multi-Language Word Alignment, In The 6th International Conference on Language Resources and Evaluation, LREC 2008, May 2008

João de Almeida Varelas Graça, Kuzman Ganchev, Ben Taskar, Expectation Maximization and Posterior Constraints, In Neural Information Processing Systems Conference (NIPS), December 2007

João de Almeida Varelas Graça, Diamantino António Caseiro, Luísa Coheur, The INESC-ID IWSLT07 SMT System, In Proceedings of IWSLT International Workshop on Spoken Language Translation, pages 125-130, October 2007

Diamantino António Caseiro, Isabel Trancoso, Weighted Finite-State Transducer Inference for Limited-Domain Speech-to-Speech Translation, In Computational Processing of the Portuguese Language: 7th International Workshop, PROPOR 2006, Springer, pages 60-68, May 2006

D. Picó, J. González, F. Casacuberta, Diamantino António Caseiro, Isabel Trancoso, Finite-state transducer inference for a speech-input Portuguese-to-English machine translation system, In Interspeech 2005, September 2005
