NLP-triggered, ontology-based KB enrichment strategies

From HLT@INESC-ID


Latest revision as of 14:34, 25 February 2013

Nuno Silva
Nuno Silva is a professor of informatics at ISEP and a researcher at GECAD-ISEP. His R&D interests concern information integration in heterogeneous environments, the Semantic Web, and ontology engineering.

Date

  • 15:00, Friday, March 1st, 2013
  • Room 020, INESC-ID

Speaker

  • Nuno Silva, ISEP – IPP

Abstract

Publicly available text-based documents (e.g. news, meeting transcripts) are a very important source of knowledge, especially for organizations. These documents refer to domain entities such as persons, places, professional positions, decisions, and actions.

Querying these documents (as opposed to browsing, searching and finding) is a highly relevant task for anyone, and particularly for professionals performing knowledge-intensive tasks. Querying the data in text-based documents, however, is not supported by common technology: the documents’ content must first be explicitly and formally captured as knowledge base (KB) facts. Automatic NLP processes are a common approach, but their relatively low precision and recall give rise to data quality problems. Furthermore, the facts present in the documents are often insufficient to answer complex queries, hence the need to enrich the captured facts with facts from third-party repositories (e.g. public Linked Open Data (LOD) repositories).
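To make the contrast concrete, once NLP has captured a document’s content as subject–predicate–object facts, querying becomes a matter of pattern matching over triples. The following is a minimal, self-contained sketch (the entities and property names are hypothetical, not from the talk):

```python
# KB facts captured from text, as (subject, predicate, object) triples.
facts = [
    ("Nuno_Silva", "holdsPosition", "Professor"),
    ("Nuno_Silva", "affiliatedWith", "ISEP"),
    ("ISEP", "locatedIn", "Porto"),
]

def query(facts, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in facts
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Who is affiliated with ISEP?
print(query(facts, p="affiliatedWith", o="ISEP"))
```

Real systems would store such triples in an RDF store and use SPARQL rather than in-memory lists, but the principle — answering structured queries that plain search over the original text cannot — is the same.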

While this description suggests an integration problem, addressing it involves more than integration alone, namely duplicate detection, object mapping, consistency checking, consistency resolution, and semantic, controlled data enrichment.

This talk describes a process for enriching the knowledge base with facts from LOD repositories. The process is triggered by the NLP parsing step and guided by the constraints of the knowledge base’s underlying, semantically rich ontology. The ontology’s constraints are interpreted and adopted as configuration data for the enrichment strategies, which are responsible for actually enriching the knowledge base (i.e. adding new instances and new properties to existing instances) according to that interpretation of the constraints.
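One way to read "constraints as configuration" is that a cardinality-style constraint on a property signals which facts are missing and must be fetched from a LOD source. The sketch below illustrates this idea under stated assumptions: the constraint table, the entities, and the dictionary standing in for a LOD endpoint are all hypothetical, not the talk’s actual implementation.

```python
# Stand-in for a public LOD endpoint (e.g. DBpedia): lookup by (entity, property).
lod_repository = {
    ("Nuno_Silva", "birthPlace"): "Portugal",
}

# Ontological constraints interpreted as configuration:
# for each class, the minimum cardinality required per property.
constraints = {"Person": {"birthPlace": 1}}

def enrich(kb, instance, cls):
    """Add missing required properties for `instance` from the LOD source."""
    for prop, min_card in constraints.get(cls, {}).items():
        have = [o for (s, p, o) in kb if s == instance and p == prop]
        if len(have) < min_card:          # constraint unsatisfied -> enrich
            value = lod_repository.get((instance, prop))
            if value is not None:
                kb.append((instance, prop, value))
    return kb

kb = [("Nuno_Silva", "rdf:type", "Person")]
enrich(kb, "Nuno_Silva", "Person")       # adds the missing birthPlace fact
```

The design choice worth noting is that the enrichment strategy itself is generic; which instances and properties it touches is driven entirely by the constraint table derived from the ontology, so changing the ontology reconfigures the enrichment without changing code.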