110-PT-BN-KP

From HLT@INESC-ID

Latest revision as of 18:08, 10 September 2013

The 110-PT-BN-KP corpus consists of 110 Portuguese broadcast news stories annotated with key phrases by an expert.

The broadcast news (BN) stories were extracted from 8 BN programs in the European Portuguese ALERT corpus.

Statistics

Train / Test

  • Number of stories: 100 / 10
  • Number of words: 29,225 / 3,896
  • Average number of key phrases per story: 24 / 29
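As a quick sanity check on the split figures above, the short Python sketch below derives totals and per-story word averages from the reported train/test counts. The numbers are copied from the Statistics list; the names `STATS` and `derived_stats` are illustrative only and are not part of the corpus distribution.

```python
# Train / test figures as reported on this page (hypothetical container).
STATS = {
    "stories": (100, 10),
    "words": (29_225, 3_896),
}


def derived_stats(stats: dict) -> dict:
    """Derive totals and per-story averages from (train, test) pairs."""
    train_stories, test_stories = stats["stories"]
    train_words, test_words = stats["words"]
    total_stories = train_stories + test_stories
    total_words = train_words + test_words
    return {
        "total_stories": total_stories,            # should match the 110 stories
        "total_words": total_words,
        "test_fraction": test_stories / total_stories,
        "avg_words_train": train_words / train_stories,
        "avg_words_test": test_words / test_stories,
        "avg_words_overall": total_words / total_stories,
    }


if __name__ == "__main__":
    for name, value in derived_stats(STATS).items():
        # Print floats with two decimals, integers as-is.
        print(f"{name}: {value:.2f}" if isinstance(value, float) else f"{name}: {value}")
```

With these figures the train and test splits add up to the 110 stories stated above, and test stories are somewhat longer on average (about 390 words versus about 292 in training).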

Further Reading

The corpus is free for non-commercial use.

Please contact Luis Marujo for other uses.

If you publish work that uses this data, please cite the following paper:

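Luis Marujo, Márcio Viveiros, João Paulo da Silva Neto, "Keyphrase Cloud Generation of Broadcast News", in Proceedings of Interspeech 2011: 12th Annual Conference of the International Speech Communication Association, ISCA, Florence, Italy, August 2011. pdf: http://www.inesc-id.pt/pt/indicadores/Ficheiros/7588.pdf | bibtex: http://www.inesc-id.pt/intranet/publicacoes/bibtex.php?bibtex=7588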

Download

110-PT-BN-KP: http://www.l2f.inesc-id.pt/~ldsm/110-PT-BN-KP.zip