Expanding a seed set of web pages into a larger group of web pages is a common procedure in link-based analysis of websites. Although numerous researchers have addressed the seed expansion problem as an intermediate step of various graph analyses on the web, existing techniques fail to provide any measure of the character of a web page or of the expanded group of web pages. For instance, the well-known HITS algorithm used a search engine to generate a seed set, and then performed a fixed-depth neighborhood expansion to generate a larger set of pages on which the HITS algorithm was run. This general technique has seen broad adoption and is now common in local link-based analysis. Variants of it have been employed in community finding, in finding similar pages, in PageRank, in TrustRank, and in classification of web pages. More sophisticated expansions have been applied in the context of community discovery.
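By way of illustration only, the fixed-depth neighborhood expansion mentioned above may be sketched as a breadth-first traversal of the link graph. The sketch below is a minimal example under stated assumptions and is not the HITS implementation itself: the function name `expand_seed_set`, the adjacency-dictionary representation of the link graph, the choice to follow both out-links and in-links, and the toy graph are all hypothetical.

```python
from collections import deque


def expand_seed_set(seeds, out_links, depth=1):
    """Return every page within `depth` link hops of any seed page.

    `out_links` maps a page to the pages it links to (an assumed,
    illustrative representation). Links are followed in both directions.
    """
    # Build the reverse adjacency so that in-links can be followed too.
    in_links = {}
    for src, dsts in out_links.items():
        for dst in dsts:
            in_links.setdefault(dst, set()).add(src)

    expanded = set(seeds)
    frontier = deque((page, 0) for page in expanded)
    while frontier:
        page, dist = frontier.popleft()
        if dist == depth:
            continue  # do not expand beyond the fixed depth
        neighbors = set(out_links.get(page, ())) | in_links.get(page, set())
        for nxt in neighbors:
            if nxt not in expanded:
                expanded.add(nxt)
                frontier.append((nxt, dist + 1))
    return expanded


# Hypothetical toy link graph: page -> pages it links to.
toy_graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": [],
    "d": ["e"],
    "x": ["a"],
}

one_hop = expand_seed_set({"a"}, toy_graph, depth=1)
two_hops = expand_seed_set({"a"}, toy_graph, depth=2)
```

In this toy graph, a depth of 1 adds the pages that "a" links to and the page that links to "a", and a depth of 2 additionally reaches "d". Note that the expansion returns only a set of pages; it assigns no measure of how strongly any added page relates to the group.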
However, without a way to measure the character of a web page that may be included when the seed set is expanded into a group of web pages, it may be difficult to automatically understand the character of the resulting group. In the absence of such context, meaningful characterizations of groups of web pages may continue to elude automatic discovery. What is needed is a way to characterize the relationship of a web page to a group of web pages and to measure the strength of the characterization. Such a system and method should be able to provide a context for understanding the meaning of such a measure characterizing the web page.