500N-KPCrowd

From HLT@INESC-ID


Revision as of 18:32, 10 September 2013

The 500N-KPCrowd corpus consists of 500 news articles (50 stories for each of the 10 selected categories), manually annotated with key phrases by 20 Amazon Mechanical Turk workers.

The news articles were retrieved from online news sources.

Statistics

  • Number of stories: 450 / 50 (Train / Test)
  • Average number of Amazon Mechanical Turk workers per news story: 20
  • Number of topics: 10
  • Average number of key phrases per news story: 40


Further Reading

The corpus is free for non-commercial use.

Please contact Luis Marujo for other uses.

Please cite this paper in any publication that uses the data below:

Luis Marujo, Anatole Gershman, Jaime Carbonell, Robert Frederking, and João Paulo da Silva Neto. Supervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization. 8th International Conference on Language Resources and Evaluation (LREC 2012), May 2012, ELRA. (pdf, bibTeX)


Download

500N-KPCrowd