NLP-triggered, ontology-based KB enrichment strategies

From HLT@INESC-ID


Latest revision as of 14:34, 25 February 2013

Nuno Silva
Nuno Silva is a professor of informatics at ISEP and a researcher at GECAD-ISEP. His R&D interests concern information integration in heterogeneous environments, the Semantic Web, and ontology engineering.

Date

  • 15:00, Friday, March 1st, 2013
  • Room 020, INESC-ID

Speaker

  • Nuno Silva, ISEP – IPP

Abstract

Publicly available text-based documents (e.g. news, meeting transcripts) are a very important source of knowledge, especially for organizations. These documents refer to domain entities such as persons, places, professional positions, decisions, and actions.

Querying these documents (as opposed to browsing, searching and finding) is a highly relevant task for anyone, and particularly for professionals performing knowledge-intensive tasks. Querying the data in text-based documents, however, is not supported by common technology: the documents’ content must first be explicitly and formally captured as knowledge base (KB) facts. Automatic NLP processes are a common approach, but their relatively low precision and recall give rise to data quality problems. Furthermore, the facts present in the documents are often insufficient to answer complex queries, hence the need to enrich the captured facts with facts from third-party repositories (e.g. public Linked Open Data (LOD) repositories).
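To make the contrast concrete, once NLP has captured a document’s content as subject–predicate–object facts, querying becomes a matter of pattern matching over triples. The following is a minimal, self-contained sketch (the entities and property names are hypothetical, not from the talk):

```python
# KB facts captured from text, as (subject, predicate, object) triples.
facts = [
    ("Nuno_Silva", "holdsPosition", "Professor"),
    ("Nuno_Silva", "affiliatedWith", "ISEP"),
    ("ISEP", "locatedIn", "Porto"),
]

def query(facts, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in facts
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Who is affiliated with ISEP?
print(query(facts, p="affiliatedWith", o="ISEP"))
```

Real systems would store such triples in an RDF store and use SPARQL rather than in-memory lists, but the principle — answering structured queries that plain search over the original text cannot — is the same.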

While this description suggests an integration problem, addressing it involves more than integration alone, namely duplicate detection, object mapping, consistency checking, consistency resolution, and semantic, controlled data enrichment.

This talk describes a process for enriching the knowledge base with facts from LOD repositories. The process is triggered by the NLP parsing step and guided by the constraints of the knowledge base’s underlying, semantically rich ontology. The ontology’s constraints are interpreted and adopted as configuration data for the enrichment strategies, which are responsible for actually enriching the knowledge base (i.e. adding new instances and new properties to existing instances) according to that interpretation of the constraints.
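One way to read "constraints as configuration" is that a cardinality-style constraint on a property signals which facts are missing and must be fetched from a LOD source. The sketch below illustrates this idea under stated assumptions: the constraint table, the entities, and the dictionary standing in for a LOD endpoint are all hypothetical, not the talk’s actual implementation.

```python
# Stand-in for a public LOD endpoint (e.g. DBpedia): lookup by (entity, property).
lod_repository = {
    ("Nuno_Silva", "birthPlace"): "Portugal",
}

# Ontological constraints interpreted as configuration:
# for each class, the minimum cardinality required per property.
constraints = {"Person": {"birthPlace": 1}}

def enrich(kb, instance, cls):
    """Add missing required properties for `instance` from the LOD source."""
    for prop, min_card in constraints.get(cls, {}).items():
        have = [o for (s, p, o) in kb if s == instance and p == prop]
        if len(have) < min_card:          # constraint unsatisfied -> enrich
            value = lod_repository.get((instance, prop))
            if value is not None:
                kb.append((instance, prop, value))
    return kb

kb = [("Nuno_Silva", "rdf:type", "Person")]
enrich(kb, "Nuno_Silva", "Person")       # adds the missing birthPlace fact
```

The design choice worth noting is that the enrichment strategy itself is generic; which instances and properties it touches is driven entirely by the constraint table derived from the ontology, so changing the ontology reconfigures the enrichment without changing code.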