Large graph-based knowledge bases represent factual information about the world. For example, in a data graph, entities, such as people, places, things, and concepts, may be stored as nodes, and the edges between nodes may indicate relationships between the entities. The basic unit of such a data graph can be a triple that includes two nodes, or entities, and an edge. The triple is sometimes referred to as a subject-predicate-object triple, with one node acting as the subject, the second node acting as the object, and the relationship acting as the predicate. A triple may also include additional information, such as metadata about the entities and/or the relationship, in addition to identifying the subject, predicate, and object.
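As an illustration only, the subject-predicate-object structure described above might be sketched as follows. The names `Triple`, `DataGraph`, `has_child`, and `member_of`, as well as the sample entities, are hypothetical and chosen for this example; they are not drawn from any particular knowledge base or library.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Triple:
    """One subject-predicate-object fact, with optional metadata."""
    subject: str
    predicate: str
    obj: str
    metadata: dict = field(default_factory=dict)


class DataGraph:
    """A minimal in-memory data graph: entities are nodes, triples are edges."""

    def __init__(self):
        self.nodes = set()
        # Index triples by subject so outgoing edges are cheap to look up.
        self._by_subject = defaultdict(list)

    def add(self, subject, predicate, obj, **metadata):
        """Store a triple and register both of its entities as nodes."""
        triple = Triple(subject, predicate, obj, metadata)
        self.nodes.update((subject, obj))
        self._by_subject[subject].append(triple)
        return triple

    def objects(self, subject, predicate):
        """Return every object linked to `subject` by `predicate`."""
        return [
            t.obj
            for t in self._by_subject.get(subject, [])
            if t.predicate == predicate
        ]


# Hypothetical sample facts.
graph = DataGraph()
graph.add("Alice", "has_child", "Bob")
graph.add("Alice", "member_of", "The Examples", role="guitarist")
```

In this sketch, `graph.objects("Alice", "has_child")` looks up the fine-grained fact stored for one entity, and the keyword arguments passed to `add` stand in for the metadata a triple may carry.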
The number of nodes and edges in such a data graph can be large, and it may be difficult to understand entities at a higher level because the factual information represented by a triple is often fine-grained, for example representing marriage relationships, membership in a musical group, and other discrete facts. In many applications, however, it is more useful to assign entities to collections that represent more general facts about the entity. For example, it may be more useful to know that someone is a father or a guitarist in a band than to know the fine-grained details of who the person's child is or which album the guitarist played on. Collections are used extensively in search, data mining, ad targeting, recommendation systems, and similar applications. However, creating entity collections for graphs has been a manual process, which does not scale to large graphs.