Data is often stored in various tabular formats. Such data can relate to entities, such as people, places, things, concepts, etc., and to the relationships between entities. For example, a music database may store data on artists and albums, including which artist released a particular album and which label produced the album. One way to better understand the relationships between entities in a table is to store the data in graph format, in which entities are represented by nodes and relationships between entities are represented by edges between nodes. For example, in a data graph based on a tabular movie database, the nodes Tom Cruise and Mission Impossible may be linked by the edges "acted in" and/or "stars in". The basic unit of such a data graph can be a triple that includes two nodes, or entities, and an edge, or relationship. The triple is sometimes referred to as a subject-predicate-object triple, with one node acting as the subject, the second node acting as the object, and the relationship acting as the predicate. Of course, a triple may include additional information, such as metadata about the entities and/or the relationship, in addition to identifying the subject, predicate, and object.
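As an illustration only, the conversion from tabular rows to subject-predicate-object triples might be sketched as follows. The names `Triple`, `rows_to_triples`, and `build_graph`, the column names `actor` and `movie`, and the exact edge labels are assumptions made for this sketch; they are not taken from any particular system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One subject-predicate-object unit of a data graph (hypothetical type)."""
    subject: str
    predicate: str
    obj: str
    metadata: tuple = ()  # optional extra information about the triple


def rows_to_triples(rows):
    """Turn rows of an assumed actor/movie table into triples."""
    triples = []
    for row in rows:
        # Each row yields one edge in each direction between the two nodes.
        triples.append(Triple(row["actor"], "acted in", row["movie"]))
        triples.append(Triple(row["movie"], "stars", row["actor"]))
    return triples


def build_graph(triples):
    """Index triples by subject: node -> list of (edge label, neighbor node)."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.predicate, t.obj))
    return dict(graph)


# A one-row table standing in for a tabular movie database.
rows = [{"actor": "Tom Cruise", "movie": "Mission Impossible"}]
triples = rows_to_triples(rows)
graph = build_graph(triples)
```

Here `graph["Tom Cruise"]` holds the outgoing "acted in" edge to the Mission Impossible node, and `graph["Mission Impossible"]` holds the reverse "stars" edge.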
Data in a database or other data store may be used to generate a data graph. The data graph may assign each entity in the data graph a particular identifier that is unique to the dataset. Many such datasets may exist from different sources. But while the data graphs from disparate sources may each include some of the same entities, the source graphs cannot be searched together because each is in its own identifier space. In other words, the Tom Cruise entity in one data graph has a different identifier than the Tom Cruise entity in another data graph. Furthermore, each source dataset may be associated with restrictions on use, such as license terms or confidentiality restrictions, which may complicate the creation of a combined graph when the combined graph is available for public use. In addition, some source data graphs may come from untrusted or untested sources, which can potentially corrupt a combined graph.
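The identifier-space problem can be made concrete with a small sketch. Everything below is hypothetical: the identifiers (`a:001`, `b:9f3`, and so on), the second movie, and the helper names `objects_of` and `naive_union` are invented for illustration. The sketch assumes each source graph stores a map from identifier to entity name plus a list of identifier-based triples.

```python
# Two source graphs, each using its own identifier space (all values assumed).
source_a = {
    "entities": {"a:001": "Tom Cruise", "a:002": "Mission Impossible"},
    "triples": [("a:001", "acted in", "a:002")],
}
source_b = {
    "entities": {"b:9f3": "Tom Cruise", "b:77c": "Top Gun"},
    "triples": [("b:9f3", "acted in", "b:77c")],
}


def objects_of(graph, subject_id, predicate):
    """Return the object identifiers linked from subject_id by predicate."""
    return [o for s, p, o in graph["triples"] if s == subject_id and p == predicate]


def naive_union(*graphs):
    """Concatenate graphs without reconciling their identifiers."""
    merged = {"entities": {}, "triples": []}
    for g in graphs:
        merged["entities"].update(g["entities"])
        merged["triples"].extend(g["triples"])
    return merged


combined = naive_union(source_a, source_b)

# Searching by source A's identifier finds only source A's edge.
films_by_a_id = objects_of(combined, "a:001", "acted in")

# The same real-world entity still appears under two unrelated identifiers.
tom_cruise_ids = sorted(
    ident for ident, name in combined["entities"].items() if name == "Tom Cruise"
)
```

A naive union therefore leaves two separate Tom Cruise nodes, and a query that starts from either identifier sees only the edges contributed by that identifier's source.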