500N-KPCrowd
From HLT@INESC-ID
Revision as of 18:32, 10 September 2013
The 500N-KPCrowd corpus consists of 500 news articles (50 stories for each of the 10 selected categories), manually annotated with key phrases by 20 Amazon Mechanical Turk workers.
The news articles were retrieved from online news sources.
Statistics
- Number of stories: 450 / 50 (Train / Test)
- Average number of Amazon Mechanical Turk workers per news story: 20
- Number of topics: 10
- Average number of key phrases per news story: 40
Further Reading
The corpus is free for non-commercial use.
Please contact Luis Marujo for other uses.
If you publish work that uses this data, please cite the following paper:
Luis Marujo, Anatole Gershman, Jaime Carbonell, Robert Frederking, and João Paulo da Silva Neto. Supervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization. 8th International Conference on Language Resources and Evaluation (LREC 2012), May 2012, ELRA.