Language Dynamics and Capitalization using Maximum Entropy

Fernando Batista

Date

15:00, June 6, 2008
Room 336

Speaker

Fernando Batista

Abstract

This paper studies how written language varies over time and how that variation affects the capitalization task. A discriminative approach based on maximum entropy models is proposed to perform capitalization while taking language change into account. The proposed method makes it possible to train on large corpora. The evaluation is performed on newspaper corpora using different testing periods. The results reveal a strong relation between capitalization performance and the time elapsed between the training and testing data periods.

From HLT@INESC-ID
