Large data graphs store data and rules that describe knowledge about the data in a form that supports deductive reasoning. A data graph can be bipartite, meaning that its nodes can be divided into two disjoint sets such that every edge connects a node in one set to a node in the other. Bipartite graphs can be generated for use in web usage mining, which extracts implicit knowledge from data that captures user interactions with the web. For example, a bipartite graph may be generated with one set of nodes representing people or organizations and the other set representing actions, interests, etc., relevant to those people or organizations. For instance, a bipartite graph may have advertisers in one set of nodes and queries in the other, where an advertiser is connected to a query when the advertiser's ad has been shown in response to the query. The edges connecting the two sets can be weighted, or every edge may be counted equally. Bipartite graphs are commonly lopsided, with many more nodes in one set than in the other, and they may also be very large. For example, an advertising graph may have billions of queries but only millions of advertisers.
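The advertiser and query graph described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the sample edges, the use of impression counts as edge weights, and the function name build_bipartite are hypothetical and do not come from the source.

```python
from collections import defaultdict

# Hypothetical sample data: (advertiser, query, impressions).
# An edge exists when the advertiser's ad was shown for the query.
EDGES = [
    ("shoe_store", "running shoes", 3),
    ("shoe_store", "trail shoes", 1),
    ("sports_shop", "running shoes", 2),
    ("sports_shop", "tennis racket", 5),
]


def build_bipartite(edges):
    """Return adjacency maps for both sides of a weighted bipartite graph.

    Repeated (advertiser, query) pairs have their weights summed.
    """
    adv_to_query = defaultdict(dict)
    query_to_adv = defaultdict(dict)
    for advertiser, query, weight in edges:
        total = adv_to_query[advertiser].get(query, 0) + weight
        # Every edge joins the advertiser set to the query set,
        # never two nodes on the same side.
        adv_to_query[advertiser][query] = total
        query_to_adv[query][advertiser] = total
    return dict(adv_to_query), dict(query_to_adv)


adv_to_query, query_to_adv = build_bipartite(EDGES)
```

Keeping an adjacency map for each side lets a lookup start from either an advertiser or a query. To count every edge the same, as the text allows, each weight could simply be set to 1.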
Bipartite graphs can be used in data mining, for example to detect interesting patterns and connections between nodes of one type and nodes of another type, such as between users and other actors. Users or other actors may form one set of nodes, and items representing data collected from search records or application logs may form the second set. Actors in such a graph are connected to the items that describe their web usage patterns. Often the items can be put into categories that generalize the actions. Determining which actors are similar to other actors, e.g., computing a similarity ranking between one actor and another, is a valuable capability for web mining. Targeted similarity, that is, determining which actors are similar to other actors within a category or a subset of categories, can also be important, especially in advertising, social media, retail, and the like, where it allows services to be personalized more effectively. Similarity rankings can be used, for example, to identify a competing advertiser, to suggest related queries for an advertiser, or to find people with similar interests. Computing similarity rankings in a large graph is challenging, however, because of the sheer amount of data.
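A similarity ranking, with and without a category restriction, might be sketched as follows. The source does not name a similarity measure, so the Jaccard coefficient is used here purely as an assumed stand-in, and the sample actors, items, categories, and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_similar(actor, actor_items, item_category=None, categories=None):
    """Rank the other actors by Jaccard similarity to `actor`.

    When `categories` is given, only items whose category is in that
    set are compared, which is one way to model targeted similarity.
    """
    lookup = item_category or {}

    def restrict(items):
        if categories is None:
            return set(items)
        return {i for i in items if lookup.get(i) in categories}

    base = restrict(actor_items[actor])
    scores = [
        (other, jaccard(base, restrict(items)))
        for other, items in actor_items.items()
        if other != actor
    ]
    # Highest score first; ties are broken by actor name.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical actors, items, and item categories.
ACTOR_ITEMS = {
    "a": {"running shoes", "trail shoes", "tennis racket"},
    "b": {"running shoes", "trail shoes"},
    "c": {"tennis racket", "golf clubs"},
}
ITEM_CATEGORY = {
    "running shoes": "footwear",
    "trail shoes": "footwear",
    "tennis racket": "equipment",
    "golf clubs": "equipment",
}

overall = rank_similar("a", ACTOR_ITEMS)
targeted = rank_similar("a", ACTOR_ITEMS, ITEM_CATEGORY, {"equipment"})
```

In this toy data, actor "b" ranks first overall, while restricting the comparison to the "equipment" category puts "c" first. The sketch compares one actor against every other actor, so its cost grows with the number of actors and items; that is the kind of cost that makes similarity ranking hard on graphs with billions of nodes.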