1. Field of the Invention
The present invention pertains to the field of automated analysis of combinatorial structures. Specifically, the present invention involves automated measurement of the diversity of combinatorial structures, such as graphs, which can be used to model the Web.
2. Discussion of the Related Art
The World Wide Web can be viewed as an ecology with a rich and rapidly evolving set of relationships among its components. The variety, or diversity, of structures is an important aspect of ecological systems. Diversity is related to the range of capabilities, the adaptability and the overall complexity of the ecology. While appealing in concept, diversity is difficult to quantify and measure for combinatorial structures such as the Web, particularly without resorting to asymptotic limits that apply only to very large systems.
The World Wide Web consists of a rapidly growing and changing collection of pages. The relationships among these pages, including explicit hyperlinks, textual similarity, patterns of usage and overlapping authorship, form a rich, evolving structure.
These relationships are most directly useful for finding items relevant to some question, either by manually following links or through automated searches. However, the relationships can also be viewed from an ecological perspective, which considers the large-scale structure and evolution of the Web that emerge from the actions of many autonomous individuals with a variety of goals.
A variety of precise definitions of diversity, and of the related concept of complexity, have been proposed for various types of structures. A formal and general definition is algorithmic complexity: the length of the shortest program that produces the structure. While algorithmic complexity has good formal properties, it applies only asymptotically to large structures, so it is unsuitable for small structures, and it is not computable for individual structures. Another prior approach, described by B. A. Huberman and T. Hogg, “Complexity and Adaptation,” Physica, 22D:376-384, 1986, defined diversity in terms of the number of distinct component parts, which is readily computed. However, as defined, it applies only to trees, not to the more general combinatorial structures needed to describe the Web. Approximate entropy is also easily computed and is related asymptotically to information-theoretic measures, yet it remains useful even for small sequences. However, it applies only to sequences, a very limited type of structure.
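To make the limitation of approximate entropy concrete, the following is a minimal sketch, not part of the invention, assuming the usual definition ApEn(m) = Φ^m − Φ^(m+1) restricted to symbolic sequences with an exact-match tolerance (r = 0). Here Φ^k is the average logarithm of the fraction of length-k windows identical to each window. The function name `approximate_entropy` is hypothetical. The input must be a linear sequence; there is no natural way to pass it a tree or a graph.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def approximate_entropy(seq: Sequence[Hashable], m: int) -> float:
    """Hypothetical sketch: ApEn(m) of a symbolic sequence with r = 0.

    Assumption: two windows "match" only if they are identical symbol by symbol.
    """
    n = len(seq)
    if m < 1 or n <= m:
        raise ValueError("need m >= 1 and a sequence longer than m")

    def phi(k: int) -> float:
        # All overlapping windows of length k (n - k + 1 of them).
        windows = [tuple(seq[i:i + k]) for i in range(n - k + 1)]
        total = len(windows)
        counts = Counter(windows)
        # C_i = fraction of windows identical to window i; the self-match
        # guarantees C_i > 0, so the logarithm is always defined.
        return sum(math.log(counts[w] / total) for w in windows) / total

    # Regularity drop when the window grows from m to m + 1 symbols.
    return phi(m) - phi(m + 1)
```

For example, `approximate_entropy("aaaaaaaa", 1)` returns 0.0 for a perfectly regular sequence, while `approximate_entropy("aabb", 1)` returns log(3/2), about 0.405. Both values are computed from only a handful of symbols, which illustrates why the measure is useful for small sequences even though it cannot be applied to more general structures.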
As is apparent from the above discussion, a need exists for an effective and easily computed measure of diversity for general combinatorial structures of arbitrary size, such as graphs, which can be used to model Web pages or groups of Web pages.