The present invention lies in the field of data storage and the associated processing. Specifically, embodiments of the present invention relate to performing reconciliation processing of resources in a graph representation of a dataset. The reconciliation processing is intended to reconcile heterogeneity between semantically corresponding resources in the graph.
The enormous volume of graph data available creates potential for automated or semi-automated analysis that can not only reveal statistical trends but also discover hidden patterns and distil knowledge out of data. Formal semantics plays a key role in automating computation-intensive tasks. While there is a longstanding debate over how semantics are best captured, it is widely held that graphs and graph-like representations are the best instrument to emulate how humans perceive the world (as an ontology with entities and relationships among entities).
Graph databases, therefore, offer the advantage of a natural, semantic-network-based knowledge representation that can store large amounts of structured and unstructured data.
A graph database is a data representation which employs nodes to represent entities, and edges (or arcs) between nodes to represent relationships between those entities. Graph databases are used in a wide variety of different applications that can be generally grouped into two major categories. The first consists of complex knowledge-based systems that have large collections of concept descriptions (referred to as “knowledge-based applications”), such as intelligent decision support and self-learning. The second includes applications that involve performing graph analysis over transactional data (referred to as “transactional data applications”), such as social data and business intelligence.
At the heart of formalised graph databases is the Resource Description Framework, RDF, a simple graph-based data modelling language providing semantic mark-up of data. With RDF, data silos can begin to be pieced together and the current archipelagic data landscape transformed into a connected data graph upon which complicated data analytics and business intelligence applications can be built.
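As an illustrative sketch only, the idea of piecing data silos together can be shown by holding each RDF statement as a (subject, predicate, object) triple and merging triples from two sources into one graph. The identifiers (`ex:alice`, `ex:worksFor` and so on), the two silos and the helper `build_graph` are hypothetical and chosen purely for illustration; no particular RDF library or storage layout is implied.

```python
from collections import defaultdict

# Each statement is a (subject, predicate, object) triple.
# All identifiers below are hypothetical examples.
silo_a = [
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:acme", "ex:locatedIn", "ex:london"),
]
silo_b = [
    ("ex:bob", "ex:worksFor", "ex:acme"),
]


def build_graph(*triple_sets):
    """Merge triple sets into an adjacency map: subject -> [(predicate, object)]."""
    graph = defaultdict(list)
    for triples in triple_sets:
        for subject, predicate, obj in triples:
            graph[subject].append((predicate, obj))
    return graph


graph = build_graph(silo_a, silo_b)

# A query spanning both silos: every subject with a worksFor edge to ex:acme.
employees = sorted(
    subject
    for subject, edges in graph.items()
    if ("ex:worksFor", "ex:acme") in edges
)
print(employees)
```

The merge only connects the two silos because both happen to use the same reference, `ex:acme`, for the shared entity.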
Data sets are often highly heterogeneous and distributed. Because such data is decentralised, different data sources often use different references to indicate the same real-world object. A necessary and important step towards utilising available graph data effectively is to identify and reconcile multiple references for semantic consistency. Hereinafter, the term “reconciliation” is used to indicate the process of reconciling heterogeneity between resources (as nodes in a graph of data, for example, as the subject or object of RDF triples) by identifying and defining equivalence links among resources that correspond semantically to each other. It follows that “reconciliation processing” is the execution of algorithms and instructions by a processor in order to achieve reconciliation.
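By way of a minimal sketch, and not as the reconciliation processing of any embodiment, the following shows one simple way equivalence links could be identified and defined: the labels of resources are compared, and an `owl:sameAs` triple is emitted for each pair whose labels are sufficiently similar. The resource names, the use of label string similarity as the sole measure, the threshold of 0.9 and the function `reconcile` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Two hypothetical sources refer to the same city with different references.
triples = [
    ("dbA:London", "rdfs:label", "London"),
    ("dbB:london_uk", "rdfs:label", "london"),
    ("dbB:paris_fr", "rdfs:label", "Paris"),
]


def labels_by_resource(triples):
    """Map each subject resource to its rdfs:label value."""
    return {s: o for s, p, o in triples if p == "rdfs:label"}


def reconcile(triples, threshold=0.9):
    """Return owl:sameAs links for resource pairs with similar labels.

    Case-insensitive string similarity is an assumed, deliberately naive
    measure; a real system would likely combine several measures.
    """
    labels = labels_by_resource(triples)
    links = []
    for a, b in combinations(sorted(labels), 2):
        score = SequenceMatcher(None, labels[a].lower(), labels[b].lower()).ratio()
        if score >= threshold:
            links.append((a, "owl:sameAs", b))
    return links


links = reconcile(triples)
print(links)
```

Note that this sketch compares every pair of resources, so its cost grows quadratically with the number of resources.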
The significance of data reconciliation is evident. Linking heterogeneous data sets results in semantic variety in the data, and data reconciliation ensures data integrity when such data sets are linked. Meaningful analysis cannot be performed otherwise. Meanwhile, equivalences allow applications to align with each other. Communications among the applications can, therefore, be automated and delegated to computers.