Fairy tale corpus

From HLT@INESC-ID

Revision as of 18:12, 22 January 2011

Fairy tale corpus semantically organized and tagged.

About the Corpus

This fairy tale corpus is divided into semantically related clusters. Clusters overlap, i.e., each tale can be allocated to more than one cluster.
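Overlapping clusters can be pictured as a mapping from each cluster to a set of tales, where the same tale may appear in several sets. The following is a minimal sketch under that assumption; the cluster and tale identifiers are hypothetical, and the actual file format of the cluster map in the archive is not described on this page.

```python
from collections import defaultdict


def clusters_by_tale(clusters: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert a cluster -> tales mapping into a tale -> clusters mapping."""
    index: dict[str, set[str]] = defaultdict(set)
    for cluster_id, tales in clusters.items():
        for tale in tales:
            index[tale].add(cluster_id)
    return dict(index)


# Hypothetical toy data: "tale_b" belongs to two clusters at once.
toy_clusters = {
    "cluster_1": {"tale_a", "tale_b"},
    "cluster_2": {"tale_b", "tale_c"},
}
tale_index = clusters_by_tale(toy_clusters)
```

Looking up `tale_index["tale_b"]` then gives every cluster the tale was allocated to, which is the overlap described above.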

Fairy tales are written for children, and their plots and language are simpler than those of tales written for adults. They are easy to read and understand: sentences are short and emotions are well defined. A fairy tale corpus can therefore be useful for emotion extraction, semantic role extraction, meaning extraction, recommendation, and text classification, among other tasks.

  • Number of stories: 453
  • Number of words: 908,174
  • Average words/story: 1,891
  • Shortest story: 75 words
  • Longest story: 17,694 words
  • Clusters: 365
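Figures of this kind can be recomputed from the story texts. The sketch below assumes that each story is available as a plain string and that a word is any whitespace-separated token; the page does not say how words were counted for the figures above, so results on the real corpus may differ.

```python
def corpus_stats(stories: dict[str, str]) -> dict[str, float]:
    """Compute story and word counts for a mapping of story name -> text.

    Assumption: words are whitespace-separated tokens.
    """
    counts = [len(text.split()) for text in stories.values()]
    return {
        "stories": len(counts),
        "words": sum(counts),
        "average": sum(counts) / len(counts),
        "shortest": min(counts),
        "longest": max(counts),
    }


# Hypothetical two-story toy corpus, not taken from the real data.
toy_stories = {"tale_a": "once upon a time", "tale_b": "the end"}
stats = corpus_stats(toy_stories)
```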

Using the Corpus

The corpus is free for non-commercial use. Please contact Paula Cristina Vaz for other uses.

If you use the corpus, please cite the following article:

  • Paula Vaz Lobo, David Martins de Matos, Fairy Tale Corpus Organization Using Latent Semantic Mapping and an Item-to-item Top-n Recommendation Algorithm, In Language Resources and Evaluation Conference - LREC 2010, European Language Resources Association (ELRA), Malta, May 2010

Downloads

Download the corpus: fairy-tales-corpus-map.tar.gz (http://www.l2f.inesc-id.pt/resources/recommendation/fairy-tales/fairy-tales-corpus-map.tar.gz)
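One possible way to fetch the archive and inspect it with the Python standard library is sketched below. The helper names are illustrative, the layout of files inside the archive is not documented on this page, and the URL may no longer be reachable.

```python
import tarfile
import urllib.request
from pathlib import Path

CORPUS_URL = (
    "http://www.l2f.inesc-id.pt/resources/recommendation/"
    "fairy-tales/fairy-tales-corpus-map.tar.gz"
)


def download(url: str, target: Path) -> Path:
    """Fetch url into target, skipping the transfer if the file already exists."""
    if not target.exists():
        urllib.request.urlretrieve(url, str(target))
    return target


def list_members(archive: Path) -> list[str]:
    """Return the names of the regular files in a gzip-compressed tar archive."""
    with tarfile.open(archive, "r:gz") as tar:
        return [member.name for member in tar.getmembers() if member.isfile()]


def main() -> None:
    """Download the corpus archive and print the first few file names."""
    archive = download(CORPUS_URL, Path("fairy-tales-corpus-map.tar.gz"))
    for name in list_members(archive)[:10]:
        print(name)
```

Calling `main()` downloads the archive into the current directory and lists its first ten files without unpacking anything, which is a cautious first step before extracting an archive whose contents are unknown.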