Fairy tale corpus

From HLT@INESC-ID

Revision as of 18:09, 22 January 2011

A fairy tale corpus, semantically organized and tagged.

This fairy tale corpus is divided into semantically related clusters. The clusters overlap, i.e., each tale can be allocated to more than one cluster.
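Because clusters overlap, it is often convenient to look up the clusters a given tale belongs to. The following is a minimal sketch of that lookup, assuming the cluster assignments have already been loaded into a dictionary; the function name, the cluster ids, and the tale names are hypothetical and do not reflect the actual layout of the archive.

```python
from collections import defaultdict


def tales_to_clusters(clusters):
    """Invert a cluster -> tales mapping into tale -> sorted list of cluster ids."""
    membership = defaultdict(set)
    for cluster_id, tales in clusters.items():
        for tale in tales:
            membership[tale].add(cluster_id)
    return {tale: sorted(ids) for tale, ids in membership.items()}


# Hypothetical toy data, not taken from the corpus.
example_clusters = {
    "c1": ["tale_a", "tale_b"],
    "c2": ["tale_b", "tale_c"],
}
example_membership = tales_to_clusters(example_clusters)
```

In this toy data, `tale_b` appears in both clusters, which is the kind of overlap described above.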

Fairy tales are written for children, and their plots and language are simpler than those of tales written for adults. They are also easy to read and understand: sentences are shorter and emotions are well defined. A fairy tale corpus can be useful for emotion extraction, semantic role extraction, meaning extraction, recommendation, and text classification, among other tasks.

  • Number of stories: 453
  • Number of words: 908,174
  • Average words/story: 1,891
  • Shortest story: 75 words
  • Longest story: 17,694 words
  • Clusters: 365
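Figures of this kind can be recomputed once the stories are loaded as plain text. The sketch below is illustrative only: it assumes the stories are held in a dictionary keyed by a hypothetical story name, and it counts words by splitting on whitespace, which may differ from the tokenization used to produce the numbers above.

```python
def corpus_stats(stories):
    """Compute basic size statistics for a mapping of story name -> text."""
    # Assumption: a "word" is any whitespace-delimited token.
    counts = [len(text.split()) for text in stories.values()]
    return {
        "stories": len(counts),
        "words": sum(counts),
        "average": sum(counts) / len(counts),
        "shortest": min(counts),
        "longest": max(counts),
    }


# Hypothetical toy data, not taken from the corpus.
example_stories = {
    "tale_a": "once upon a time",
    "tale_b": "the end",
}
example_stats = corpus_stats(example_stories)
```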

If you use the corpus, please cite the following article:

  • Paula Vaz Lobo, David Martins de Matos, Fairy Tale Corpus Organization Using Latent Semantic Mapping and an Item-to-item Top-n Recommendation Algorithm, In Language Resources and Evaluation Conference - LREC 2010, European Language Resources Association (ELRA), Malta, May 2010

Download the corpus: fairy-tales-corpus-map.tar.gz (http://www.l2f.inesc-id.pt/resources/recommendation/fairy-tales/fairy-tales-corpus-map.tar.gz)