<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.hlt.inesc-id.pt/wiki/index.php?action=history&amp;feed=atom&amp;title=Controlling_Complexity_in_Part-of-Speech_Induction</id>
	<title>Controlling Complexity in Part-of-Speech Induction - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://www.hlt.inesc-id.pt/wiki/index.php?action=history&amp;feed=atom&amp;title=Controlling_Complexity_in_Part-of-Speech_Induction"/>
	<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Controlling_Complexity_in_Part-of-Speech_Induction&amp;action=history"/>
	<updated>2026-06-07T19:25:40Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=Controlling_Complexity_in_Part-of-Speech_Induction&amp;diff=5818&amp;oldid=prev</id>
		<title>Acbm at 13:56, 25 May 2010</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Controlling_Complexity_in_Part-of-Speech_Induction&amp;diff=5818&amp;oldid=prev"/>
		<updated>2010-05-25T13:56:23Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;__NOTOC__&lt;br /&gt;
{{infobox|name=João Graça&lt;br /&gt;
|username=javg&lt;br /&gt;
|contact=javg&lt;br /&gt;
|phone=+351-213-100-351&lt;br /&gt;
|fax=+351-213-145-843&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Date ==&lt;br /&gt;
&lt;br /&gt;
* 14:00, May 28th, 2010&lt;br /&gt;
* Room 4&lt;br /&gt;
&lt;br /&gt;
== Speaker ==&lt;br /&gt;
&lt;br /&gt;
* [[João Graça]]&lt;br /&gt;
&lt;br /&gt;
== Abstract ==&lt;br /&gt;
&lt;br /&gt;
We consider the problem of fully unsupervised learning of part-of-speech tags from unlabeled text, without assuming a word-tag dictionary. The standard Hidden Markov Model (HMM) fit via Expectation Maximization (EM) performs quite poorly, due in large part to the weakness of its inductive bias and its excessive model capacity.&lt;br /&gt;
&lt;br /&gt;
We address these problems by reducing the model's capacity via parametric and non-parametric constraints: eliminating parameters for rare words, adding morphological and orthographic features, and enforcing word-tag association sparsity. We propose a simple model and an efficient learning algorithm, neither of which is much more complex than training with standard EM.&lt;br /&gt;
&lt;br /&gt;
Our experiments on six languages (Bulgarian, Danish, English, Portuguese, Spanish, Turkish) achieve dramatic improvements over state-of-the-art results: an average absolute increase of 11% in aligned tagging accuracy.&lt;br /&gt;
&lt;br /&gt;
[[category:Seminars]]&lt;br /&gt;
[[category:Seminars 2010]]&lt;/div&gt;</summary>
		<author><name>Acbm</name></author>
	</entry>
</feed>