500N-KPCrowd

500N-KPCrowd is a corpus of 500 news articles (50 stories for each of 10 selected categories), manually annotated with Key Phrases by 20 Amazon Mechanical Turk workers.

The news articles were retrieved from online news sources.

Statistics

Number of stories: 450 / 50 (Train / Test)
Average number of Amazon Mechanical Turk workers per news story: 20
Number of Topics: 10
Average Number of Key Phrases per news story: 40
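
Because each story carries annotations from about 20 workers, a consumer of the corpus usually has to decide how many workers must agree before a phrase counts as a Key Phrase. The sketch below shows one possible agreement-based filter. It is an illustration only: the in-memory data layout, the function names `phrase_votes` and `select_key_phrases`, the `min_share` threshold, and the toy annotations are all assumptions, not the corpus file format or the procedure used by the corpus authors.

```python
from collections import Counter


def phrase_votes(worker_annotations):
    """Count how many workers selected each phrase for one story.

    `worker_annotations` is a hypothetical layout: one list of phrases
    per worker. It is NOT the corpus' actual file format.
    """
    votes = Counter()
    for phrases in worker_annotations:
        # Normalize case and whitespace; the set ensures each phrase
        # is counted at most once per worker.
        votes.update({p.strip().lower() for p in phrases})
    return votes


def select_key_phrases(worker_annotations, min_share=0.5):
    """Keep phrases chosen by at least `min_share` of the workers.

    The 0.5 default is an arbitrary illustrative threshold.
    """
    votes = phrase_votes(worker_annotations)
    threshold = min_share * len(worker_annotations)
    return sorted(p for p, n in votes.items() if n >= threshold)


# Made-up annotations from four workers (the corpus averages 20 per story).
toy = [
    ["stock market", "interest rates"],
    ["Stock Market", "central bank"],
    ["stock market", "interest rates", "inflation"],
    ["interest rates"],
]

print(select_key_phrases(toy))  # ['interest rates', 'stock market']
```

Raising `min_share` yields fewer, higher-agreement phrases; lowering it keeps more of the roughly 40 phrases per story at the cost of more noise.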

Further Reading

Luis Marujo, Anatole Gershman, Jaime Carbonell, Robert Frederking, and João Paulo da Silva Neto. Supervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization. 8th International Conference on Language Resources and Evaluation (LREC 2012), May 2012, ELRA.
