Publicly available text-based documents (e.g. news articles, meeting transcripts) are an important source of knowledge, especially for organizations. These documents refer to domain entities such as persons, places and professional positions, as well as to decisions and actions.
Querying these documents (rather than browsing, searching and reading them) is a relevant task for anyone, and particularly for professionals engaged in knowledge-intensive work. Common technology, however, does not support querying the data contained in text-based documents. For that, the documents' content has to be explicitly and formally captured as knowledge base (KB) facts. Automatic NLP processes are a common approach to this capture, but their relatively low precision and recall give rise to data quality problems. Moreover, the facts found in the documents are often insufficient to answer complex queries, hence the need to enrich the captured facts with facts from third-party repositories (e.g. public Linked Open Data (LOD) repositories).
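The contrast between querying and browsing can be sketched as follows, assuming document content has already been captured as subject-predicate-object facts; all entity and predicate names here are illustrative, not taken from the talk:

```python
# Hypothetical KB facts extracted from a meeting transcript
# (all names are made up for illustration).
facts = [
    ("MeetingMinutes_12", "mentions", "Maria Silva"),
    ("Maria Silva", "holdsPosition", "CFO"),
    ("MeetingMinutes_12", "approvedDecision", "Budget2024"),
]

def query(pattern, kb):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in kb
            if all(p is None or p == v for p, v in zip(pattern, t))]

# Who holds which position, according to the extracted facts?
print(query((None, "holdsPosition", None), facts))
# → [('Maria Silva', 'holdsPosition', 'CFO')]
```

A question like this cannot be answered over the raw text without reading the document; over explicit facts it is a one-line pattern match.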
While this description suggests an integration problem, addressing it involves more than that, namely duplicate detection, object mapping, consistency checking, consistency resolution, and semantic and controlled data enrichment.
This talk describes a process for enriching the knowledge base with facts from LOD repositories. The process is triggered by the NLP parsing step and guided by the constraints of the knowledge base's underlying, semantically rich ontology. These ontological constraints are interpreted and used as configuration data for the enrichment strategies, which are responsible for actually enriching the knowledge base (i.e. adding new instances and new properties to existing instances) according to that interpretation.
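The constraint-to-strategy flow can be sketched minimally as below. This is an assumption-laden illustration, not the talk's actual implementation: the class and property names, the in-memory stand-in for a LOD repository, and the single "fill required property" strategy are all invented for the example.

```python
# Stand-in for a LOD repository lookup (e.g. a DBpedia query result),
# hard-coded here so the sketch is self-contained.
MOCK_LOD = {
    "dbpedia:Lisbon": {"country": "Portugal", "population": 545000},
}

# Ontological constraints interpreted as configuration data:
# instances of Place must have their 'country' property filled in.
CONFIG = {
    "Place": {"required_properties": ["country"]},
}

def enrich(instance, kb):
    """Strategy: add missing required properties from the LOD source,
    then register the instance in the knowledge base."""
    cls = instance["type"]
    for prop in CONFIG.get(cls, {}).get("required_properties", []):
        if prop not in instance and instance["uri"] in MOCK_LOD:
            lod_facts = MOCK_LOD[instance["uri"]]
            if prop in lod_facts:
                instance[prop] = lod_facts[prop]  # new property for the instance
    kb.append(instance)  # new instance in the KB
    return instance

kb = []
place = {"uri": "dbpedia:Lisbon", "type": "Place"}  # as produced by NLP parsing
enrich(place, kb)
print(place["country"])  # → Portugal
```

The design point is that the strategy itself is generic: changing which properties get enriched requires only changing the configuration derived from the ontology's constraints, not the strategy code.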