The invention relates to the field of computerized systems and methods using graphs that include nodes, directed links, and link weights. In particular, the invention relates to applications where node proximity measurements are desired.
The invention is particularly useful in the field of graph mining. This field commonly applies to Internet applications such as recommendation systems and blog analysis. The invention also has application to neighborhood search, center-piece sub-graph discovery, and image captioning.
In Internet database applications, data may be stored in the form of a graph including nodes, links (also called edges), and link weights. This structure shows relationships between pieces of information. These relationships can reflect how users perceive data. For instance, it is commonly desired to present to users new information that might be related to information previously accessed or products previously purchased. The behavior of the current user and/or other users may be used to predict interest in new information. Predictions of such interest can be derived from proximity measurements on the underlying graph structure.
The graph may be embodied as a matrix data structure on a machine readable medium. Proximity may be measured using a random walk algorithm.
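By way of illustration only, the matrix embodiment and a random walk proximity measurement may be sketched as follows. The sketch assumes the commonly used random walk with restart formulation r = c W r + (1 - c) e, where W is a column-normalized weight matrix, e is an indicator vector for the query node, and 1 - c is the restart probability. The function names, the example edge list, and the value c = 0.9 are hypothetical and are not taken from any particular embodiment.

```python
def column_normalize(n, edges):
    """Build an n x n matrix W in which W[j][i] is the probability of
    stepping from node i to node j (each non-empty column sums to 1)."""
    W = [[0.0] * n for _ in range(n)]
    for src, dst, weight in edges:
        W[dst][src] += weight
    for i in range(n):
        total = sum(W[j][i] for j in range(n))
        if total > 0:
            for j in range(n):
                W[j][i] /= total
    return W


def random_walk_with_restart(W, start, c=0.9, tol=1e-10, max_iter=1000):
    """Iterate r <- c * W * r + (1 - c) * e until r stops changing.
    c is an assumed parameter; 1 - c is the restart probability."""
    n = len(W)
    e = [0.0] * n
    e[start] = 1.0
    r = e[:]
    for _ in range(max_iter):
        nxt = [c * sum(W[j][i] * r[i] for i in range(n)) + (1 - c) * e[j]
               for j in range(n)]
        diff = sum(abs(a - b) for a, b in zip(nxt, r))
        r = nxt
        if diff < tol:
            break
    return r


# Hypothetical 4-node graph given as (source, destination, link weight).
edges = [(0, 1, 1.0), (0, 2, 2.0), (1, 2, 1.0),
         (2, 0, 1.0), (2, 3, 1.0), (3, 0, 1.0)]
W = column_normalize(4, edges)
scores = random_walk_with_restart(W, start=0)
```

In this sketch, scores[j] serves as the proximity of node j with respect to query node 0. Iterating over the full matrix for every query is simple but may become costly for large graphs.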
A related work in this field is H. Tong, C. Faloutsos, and J.-Y. Pan, “Random Walk with Restart: Fast Solutions and Applications,” Knowledge and Information Systems: An International Journal (KAIS), 2008 (“RWR paper”). This paper is incorporated by reference, and relates to matrix representations of graphs and to the use of random walk with restart to measure proximity in such graphs. The paper proposes an improvement to the random walk algorithm, summarized in algorithm 3a shown in FIG. 13. Algorithm 3a includes a pre-compute stage and an online query stage. The pre-compute stage includes calculating a low rank approximation in accordance with algorithm 3b, shown in FIG. 13.
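The division into a pre-compute stage and an online query stage may be illustrated generically as follows. This sketch is not a reproduction of algorithm 3a or algorithm 3b. It assumes a truncated singular value decomposition as the low rank approximation and applies the Sherman-Morrison-Woodbury identity to the formulation r = (1 - c)(I - c W)^-1 e. The function names, the example matrix, and the parameters c and rank are hypothetical, and the NumPy library is assumed to be available.

```python
import numpy as np


def precompute(W, c=0.9, rank=2):
    """Offline stage: approximate W by U * diag(s) * Vt and invert a small
    rank x rank core matrix once. Assumes the kept singular values are nonzero."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank, :]
    core = np.linalg.inv(np.diag(1.0 / s) - c * (Vt @ U))
    return U, core, Vt


def query(U, core, Vt, start, c=0.9):
    """Online stage: only matrix-vector products with the stored factors."""
    e = np.zeros(U.shape[0])
    e[start] = 1.0
    return (1 - c) * (e + c * (U @ (core @ (Vt @ e))))


# Hypothetical column-normalized matrix; W[j, i] is the probability i -> j.
W = np.array([[0.0, 0.0, 0.5, 1.0],
              [1 / 3, 0.0, 0.0, 0.0],
              [2 / 3, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.0]])
c = 0.9

# With rank equal to the matrix size, the factored form matches a direct solve.
U, core, Vt = precompute(W, c=c, rank=4)
approx = query(U, core, Vt, start=0, c=c)

e0 = np.zeros(4)
e0[0] = 1.0
exact = (1 - c) * np.linalg.solve(np.eye(4) - c * W, e0)
```

Choosing a rank smaller than the matrix size reduces the stored factors and the per-query work, at the cost of an approximate proximity vector.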