Acbm at 13:56, 25 May 2010


__NOTOC__
{{infobox|name=João Graça
|username=javg
|contact=javg
|phone=+351-213-100-351
|fax=+351-213-145-843
}}

== Date ==

* 14:00, May 28th, 2010
* Room 4

== Speaker ==

* [[João Graça]]

== Abstract ==

We consider the problem of fully unsupervised learning of part-of-speech tags from unlabeled text, without assuming a word-tag dictionary. The standard Hidden Markov Model (HMM) fit via Expectation Maximization (EM) performs quite poorly, due in large part to the weakness of its inductive bias and excessive model capacity.

We address these problems by reducing the model's capacity via parametric and non-parametric constraints: eliminating parameters for rare words, adding morphological and orthographic features, and enforcing word-tag association sparsity. We propose a simple model and an efficient learning algorithm, neither of which is much more complex than training with standard EM.
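As a rough illustration of what removing rare-word parameters can look like, the sketch below replaces every word seen fewer than a threshold number of times with a shared placeholder, so that all rare words share one set of emission parameters. This is an assumption made for illustration only: the abstract does not say how the speaker's model eliminates these parameters, and the names `collapse_rare_words`, `RARE` and `min_count` are hypothetical.

```python
from collections import Counter

# Hypothetical placeholder token shared by all rare words (not from the talk).
RARE = "<rare>"


def collapse_rare_words(sentences, min_count=2):
    """Replace words occurring fewer than min_count times with RARE.

    sentences: list of lists of word strings.
    Returns a new list of sentences; the input is not modified.
    """
    # Count every word across the whole corpus.
    counts = Counter(word for sentence in sentences for word in sentence)
    # Keep frequent words, map the rest to the shared placeholder.
    return [
        [word if counts[word] >= min_count else RARE for word in sentence]
        for sentence in sentences
    ]


corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
collapsed = collapse_rare_words(corpus, min_count=2)
vocabulary = {word for sentence in collapsed for word in sentence}
```

In this toy corpus only "the" occurs twice, so the vocabulary an HMM would need emission parameters for shrinks from five word types to two.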

Our experiments on six languages (Bulgarian, Danish, English, Portuguese, Spanish, and Turkish) achieve dramatic improvements over state-of-the-art results: an 11% average absolute increase in aligned tagging accuracy.
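Induced tags are anonymous cluster labels, so they have to be aligned with gold tags before accuracy can be computed. The abstract does not define the alignment used. One common convention, assumed here purely for illustration, is many-to-one mapping: each induced cluster is mapped to the gold tag it co-occurs with most often. The function name `many_to_one_accuracy` is hypothetical.

```python
from collections import Counter, defaultdict


def many_to_one_accuracy(induced, gold):
    """Score induced cluster labels against gold tags with a many-to-one mapping.

    induced: sequence of cluster ids, one per token.
    gold: sequence of gold tags, one per token, same length as induced.
    """
    if len(induced) != len(gold):
        raise ValueError("induced and gold must have the same length")
    if not gold:
        return 0.0
    # Count how often each cluster co-occurs with each gold tag.
    cooccurrence = defaultdict(Counter)
    for cluster, tag in zip(induced, gold):
        cooccurrence[cluster][tag] += 1
    # Map every cluster to its most frequent gold tag (several clusters may share a tag).
    mapping = {
        cluster: tags.most_common(1)[0][0]
        for cluster, tags in cooccurrence.items()
    }
    correct = sum(mapping[cluster] == tag for cluster, tag in zip(induced, gold))
    return correct / len(gold)


accuracy = many_to_one_accuracy([0, 0, 1, 1, 2], ["N", "N", "V", "N", "D"])
```

A one-to-one mapping, where each gold tag may be assigned to at most one cluster, is a stricter alternative; whichever alignment is used, accuracies are only comparable across systems that use the same one.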

[[category:Seminars]]
[[category:Seminars 2010]]

Controlling Complexity in Part-of-Speech Induction
