<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.hlt.inesc-id.pt/wiki/index.php?action=history&amp;feed=atom&amp;title=Speech_recognition_for_less-represented_languages</id>
	<title>Speech recognition for less-represented languages - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://www.hlt.inesc-id.pt/wiki/index.php?action=history&amp;feed=atom&amp;title=Speech_recognition_for_less-represented_languages"/>
	<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Speech_recognition_for_less-represented_languages&amp;action=history"/>
	<updated>2026-05-23T19:29:11Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=Speech_recognition_for_less-represented_languages&amp;diff=4733&amp;oldid=prev</id>
		<title>Joana at 15:02, 16 June 2008</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Speech_recognition_for_less-represented_languages&amp;diff=4733&amp;oldid=prev"/>
		<updated>2008-06-16T15:02:54Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 15:02, 16 June 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l4&quot;&gt;Line 4:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 4:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;|image=thomas.pellegrini.jpg&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;|image=thomas.pellegrini.jpg&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;|email=tuttitom@limsi.fr&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;|email=tuttitom@limsi.fr&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;|www=&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;|www=&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;http://www.limsi.fr/Individu/tuttitom&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;|bio=Thomas Pellegrini is currently a teaching assistant at Paris la Sorbonne.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;|bio=Thomas Pellegrini is currently a teaching assistant at Paris la Sorbonne.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;He just received a PhD in Computer Science from the University of&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;He just received a PhD in Computer Science from the University of&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Joana</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=Speech_recognition_for_less-represented_languages&amp;diff=4729&amp;oldid=prev</id>
		<title>Joana at 13:41, 15 June 2008</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Speech_recognition_for_less-represented_languages&amp;diff=4729&amp;oldid=prev"/>
		<updated>2008-06-15T13:41:13Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;__NOTOC__&lt;br /&gt;
{{speakerLargeBio|&lt;br /&gt;
|name=Thomas Pellegrini&lt;br /&gt;
|image=thomas.pellegrini.jpg&lt;br /&gt;
|email=tuttitom@limsi.fr&lt;br /&gt;
|www=&lt;br /&gt;
|bio=Thomas Pellegrini is currently a teaching assistant at Paris la Sorbonne.&lt;br /&gt;
He just received a PhD in Computer Science from the University of&lt;br /&gt;
Paris-Sud, in the Spoken Language Processing Group from [www.limsi.fr/TLP LIMSI-CNRS].}}&lt;br /&gt;
&lt;br /&gt;
== Date ==&lt;br /&gt;
* 14:00, Wednesday, June 18&amp;lt;sup&amp;gt;th&amp;lt;/sup&amp;gt;, 2008&lt;br /&gt;
* 3rd floor meeting room, INESC-ID&lt;br /&gt;
&lt;br /&gt;
== Speaker ==&lt;br /&gt;
&lt;br /&gt;
* Thomas Pellegrini, LIMSI-CNRS&lt;br /&gt;
&lt;br /&gt;
== Abstract ==&lt;br /&gt;
&lt;br /&gt;
The last decade has seen growing interest in developing speech and language technologies for a wider range of languages. State-of-the-art speech recognizers are typically trained on huge amounts of data, both transcribed speech and text. My thesis work focused on speech recognition for languages for which only small amounts of data are available: the &amp;quot;less-represented languages&amp;quot;. These languages are often poorly represented on the Web, which is the main source for collecting data. As a result, very high out-of-vocabulary (OOV) rates and poorly estimated language models are common for these languages.&lt;br /&gt;
In this presentation, I will briefly describe the difficulties of building new ASR systems with little data. I will then present our attempt to improve performance by using sub-word units in the recognition lexicon. We enhanced a data-driven word decompounding algorithm to address the increased phonetic confusability that word decompounding introduces. Experiments on two distinct languages, Amharic and Turkish, yielded small but significant improvements of around 5% relative in word error rate, with relative OOV reductions of 30% to 50%. The algorithm is largely language independent and requires minimal adaptation to be applied to other languages.&lt;br /&gt;
&lt;br /&gt;
[[category:Seminars]]&lt;br /&gt;
[[category:Seminars 2008]]&lt;br /&gt;
[[category:Invited Presentations]]&lt;/div&gt;</summary>
		<author><name>Joana</name></author>
	</entry>
</feed>