A well-written text contains a mix of general statements and sentences that provide specific details. Yet no prior work in computational linguistics has addressed the task of predicting the level of specificity of a sentence. In this talk I will present the development and evaluation of an automatic classifier that identifies general and specific sentences in news articles. We show that it is feasible to use existing annotations of discourse relations as training data, and we validate the resulting classifier on sentences judged directly by multiple annotators. We also provide a task-based evaluation of our classifier on general and specific summaries written by people, demonstrating that the classifier's predictions distinguish between the two types of human-authored summaries.
We also analyze the levels of specific and general content in news documents and in their human and automatic summaries. While human abstracts contain a balanced mix of general and specific content, automatic summaries are overwhelmingly specific, and we find that too much specificity adversely affects summary quality.
The study of sentence specificity extends our prior work on text quality, which I will briefly review.
This is joint work with my student Annie Louis.
Note: This seminar will be held in English.