Expanding a seed set of nodes into a larger community is a common procedure in link-based analysis. The problem starts from a small but cohesive seed set of nodes in a graph, such as web pages, which must be expanded to recover the enclosing node community, such as a web community or communities. Although the seed expansion problem has been addressed as an intermediate step in various graph-based analyses of the web, existing techniques appear to be inefficient and to provide less-than-optimal results.
Techniques proposed for seed set expansion include methods based on spectral embedding, maximum flow, or parametric flow, each applied on its own. However, each of these methods, used individually, appears to provide inadequate results. Spectral embedding methods, for example, yield an outer boundary that is approximate rather than exact. Maximum flow methods grow a large candidate set and then shrink it back to obtain a minimum cut, but they may shrink back too far and thus produce no expansion at all when the seed set is small. Parametric flow methods may produce quotient cuts whose expansion sets are unrelated to the seed set and merely happen to have low quotient scores.
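The shrink-back behavior of maximum flow methods can be illustrated with a small sketch. It assumes one common construction that is not taken from this document: a virtual source attached to every seed with infinite capacity, every non-seed node attached to a virtual sink with a hypothetical capacity `alpha`, and unit capacity on each undirected graph edge. The community is taken as the source side of a minimum cut, computed here with the Edmonds-Karp algorithm. The function name `min_cut_community` and the parameter `alpha` are illustrative only.

```python
from collections import deque

INF = float("inf")


def min_cut_community(edges, seeds, alpha):
    """Hypothetical max-flow seed expansion (a sketch, not any specific method).

    Assumed network: source -> each seed (infinite capacity), each non-seed
    node -> sink (capacity `alpha`), unit capacity on every undirected edge.
    Returns the source side of a minimum s-t cut, excluding virtual nodes.
    """
    seeds = set(seeds)
    nodes = {u for edge in edges for u in edge} | seeds
    src, snk = "_source", "_sink"
    cap = {}  # residual capacities: cap[u][v]

    def add(u, v, c):
        # Ensure both residual directions exist, then add capacity u -> v.
        cap.setdefault(u, {}).setdefault(v, 0)
        cap.setdefault(v, {}).setdefault(u, 0)
        cap[u][v] += c

    for u, v in edges:
        add(u, v, 1)  # an undirected edge carries unit capacity each way
        add(v, u, 1)
    for s in seeds:
        add(src, s, INF)
    for v in nodes - seeds:
        add(v, snk, alpha)

    def reachable_from_source():
        # BFS over arcs with positive residual capacity; returns parent links.
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: augment along shortest residual paths until none remain.
    while True:
        parent = reachable_from_source()
        if snk not in parent:
            break
        path, v = [], snk
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    # Nodes still reachable from the source form the source side of the cut.
    return sorted(n for n in reachable_from_source() if n not in (src, snk))


# Toy graph: triangle 0-1-2 plus node 3 hanging off node 2; seed is node 0.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
print(min_cut_community(edges, [0], alpha=0.25))  # [0, 1, 2, 3]
print(min_cut_community(edges, [0], alpha=1.0))   # [0]
```

On this toy graph, a small `alpha` lets the cut expand to the whole component (cut cost 0.75). With `alpha = 1.0`, cutting the seed's two edges (cost 2) is cheaper than any larger set, so the result shrinks back to the seed alone even though nodes 1 and 2 form a triangle with it.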
Thus, what is needed is a system and method for identifying target node graphs from predetermined seed node subsets that yields accurate boundaries and effectively grows the seed set into related, accurate expansion sets.