A knowledge graph is a way of representing information about objects that captures important relationships between those objects. Knowledge graphs are used in digital information retrieval and organization systems to store and organize potentially vast amounts of information, such as that found on the Internet. To combine knowledge from multiple, heterogeneous sources into a unified, mineable knowledge graph, it is important to provide effective techniques for digital entity matching. Entity matching is the task of determining whether two entities in a data set refer to the same real-world object.
The design of computers for performing automated entity matching is challenging, as two digital entities referring to the same object may nevertheless contain different attribute sets across different knowledge graphs, e.g., due to differences in attribute selection, formatting inconsistencies, or inaccuracies. Furthermore, efficient computational techniques are needed to process the sheer volume of digital entities contained in large-scale knowledge graphs associated with different knowledge domains (e.g., history, science, or entertainment), such as those found on the Internet.
Existing entity matching techniques include digitally comparing the immediate attributes of two entities with each other, without necessarily utilizing further attributes associated with the entities' connections to other entities. In some cases, such a comparison may not utilize all the available information to obtain an accurate match. Furthermore, existing entity matching techniques are largely queue-based, wherein top candidate matches are entered and stored in a queue. Such techniques have significant hardware and memory requirements, and may not scale well to large knowledge graphs having millions or even billions of digital entities.
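For illustration only, the comparison of immediate attributes described above can be sketched as follows. The sketch assumes entities are held as simple attribute dictionaries and uses Jaccard similarity over normalized attribute-value pairs; the similarity measure, the function name `attribute_jaccard`, and the sample entities are assumptions chosen for the example and are not drawn from any particular existing system.

```python
from typing import Any, Dict, Set, Tuple


def attribute_jaccard(a: Dict[str, Any], b: Dict[str, Any]) -> float:
    """Jaccard similarity over normalized (attribute, value) pairs.

    Only the entities' own attributes are compared; attributes of any
    connected entities are ignored (hypothetical baseline for illustration).
    """
    def norm(entity: Dict[str, Any]) -> Set[Tuple[str, str]]:
        # Lowercase and trim names and values to absorb trivial formatting
        # differences; anything beyond that counts as a mismatch.
        return {(k.strip().lower(), str(v).strip().lower())
                for k, v in entity.items()}

    pairs_a, pairs_b = norm(a), norm(b)
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 0.0


# Two illustrative records for the same film, drawn from differently
# structured sources (sample data, not from the source document).
film_a = {"name": "Blade Runner", "year": 1982}
film_b = {"Name": "blade runner", "release_date": "1982-06-25"}

score = attribute_jaccard(film_a, film_b)
print(round(score, 3))
```

In this sketch the two records share only the normalized name pair, so the score stays low even though both describe the same film: the differing attribute selection (`year` versus `release_date`) is penalized, and nothing about the entities' connections to other entities is consulted.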
It would thus be desirable to provide techniques for digital entity matching that capture not only localized but also holistic characteristics of the knowledge graph entities, while implementing the required computations in a highly efficient manner.