110-PT-BN-KP
From HLT@INESC-ID
Revision as of 17:59, 10 September 2013
The 110-PT-BN-KP corpus consists of 110 Portuguese Broadcast News (BN) stories annotated with key phrases by an expert. The stories were extracted from 8 BN programs in the European Portuguese ALERT corpus.
Statistics
Train / Test
- Number of stories: 100 / 10
- Number of words: 29,225 / 3,896
- Average Number of Key Phrases: 24 / 29
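The figures above also give a rough sense of story length in each split. The following is a minimal sketch that derives words per story and corpus totals from the listed numbers only; the names `STATS` and `words_per_story` are illustrative and not part of the corpus distribution.

```python
# Figures copied from the Statistics list (train / test).
# The dictionary layout and function names are illustrative assumptions.
STATS = {
    "train": {"stories": 100, "words": 29225},
    "test": {"stories": 10, "words": 3896},
}


def words_per_story(split: str) -> float:
    """Return the mean number of words per story for the given split."""
    entry = STATS[split]
    return entry["words"] / entry["stories"]


# Totals across both splits.
total_stories = sum(entry["stories"] for entry in STATS.values())
total_words = sum(entry["words"] for entry in STATS.values())

if __name__ == "__main__":
    for split in STATS:
        print(f"{split}: {words_per_story(split):.2f} words per story")
    print(f"total: {total_stories} stories, {total_words} words")
```

On these numbers, test stories are somewhat longer on average than training stories.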
Further Reading
The corpus is free for non-commercial use. Please contact Luis Marujo for other uses. If you publish work that uses this data, please cite the following paper:
Luis Marujo, Márcio Viveiros, João Paulo da Silva Neto, Keyphrase Cloud Generation of Broadcast News, In Proceedings of Interspeech 2011: 12th Annual Conference of the International Speech Communication Association, ISCA, Florence, Italy, August 2011.