A variety of attempts have been made at organizing the vast amount of material (e.g., internet documents or web pages) available on the Internet in general, and the World Wide Web in particular. Many attempts at organizing internet documents are directed at making it easier to perform searches and to identify the most relevant material. Although certain techniques for organizing internet documents have proven more successful than others, each method suffers from one or more flaws.
One approach to the problem involves organizing internet documents into a predetermined hierarchy to form a directory of content. Under this approach, a directory or hierarchical structure is created that includes several categories and perhaps several sub-categories based on subject matter. Next, one or more persons individually analyze each internet document and assign the document to one or more categories.
This general approach suffers from numerous problems. First, because this approach does not depend on automated analysis, the vast amount of material that requires analysis makes this approach expensive (in terms of man-hours) to implement. That is, employees of the enterprise providing the search service must spend significant amounts of time, at the employer's expense, analyzing and categorizing web pages. Furthermore, because the analysis is dependent upon human reasoning and the number of persons needed to perform the analysis is significant, there is a significant likelihood that inconsistencies will exist. For instance, the interpretations of different persons analyzing the same content are likely to differ, thereby resulting in inconsistent organization of the content. Moreover, as web authoring tools have improved, web-based content has become much more dynamic. Consequently, content must be frequently revisited and re-analyzed in order to maintain accurate categorization.
A second approach to the problem involves automating the task of organizing content by using software agents (referred to as bots) to analyze content (including metadata associated with each document). Under this approach, a software agent referred to as a bot or web-crawler automatically performs an analysis of a large number of internet documents, and creates an index based on the analysis. The index is then used by a search engine to perform a “look-up” of those documents that include key words or phrases specified in a search query. The search results will generally be ordered or ranked based on document relevance, for example, measured as the number of times a key word is included in a document.
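The index-and-look-up approach described above can be illustrated with a minimal sketch. The sketch assumes the documents have already been fetched by a crawler and are available as plain text; the function names (tokenize, build_index, search) and the sample documents are hypothetical and chosen only for illustration. Relevance is measured, as in the example above, as the number of times a query key word occurs in a document.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric key words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an index mapping each key word to per-document occurrence counts."""
    index = defaultdict(Counter)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term][doc_id] += 1
    return index


def search(index, query):
    """Return (document, score) pairs ordered by total key word occurrences."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest count first; ties are broken by document identifier.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical documents standing in for crawled internet documents.
documents = {
    "a.html": "Widgets from Acme. Acme widgets are durable widgets.",
    "b.html": "Bolt Co sells bolts and a few widgets.",
    "c.html": "A page about gardening.",
}

index = build_index(documents)
results = search(index, "widgets")
print(results)
```

Because the score depends only on text supplied by the document's author, any text or metadata that the author adds is counted in exactly the same way as genuine content.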
One problem with this approach is that content authors can easily manipulate search results by including metadata in an internet document. For instance, if enterprise A is a competitor of enterprise B, enterprise A can include as metadata the name of enterprise B in its internet documents. This will raise the level of significance of enterprise A's internet documents for searches that include as a keyword the name of enterprise B. For example, if a user performs a search for the name of enterprise B, an internet document of enterprise A is likely to be included in the search results, and possibly listed higher in order than an internet document for enterprise B. This ability to manipulate search results makes this method problematic.
In yet another approach to the problem, metadata manipulation is overcome by determining an internet document's relevance based on an analysis of incoming links directed to a particular internet document. For example, if the analysis of a large number of internet documents indicates that a particular internet document is the most frequently linked-to document in the group, then there is an assumption that the internet document is the most relevant. Moreover, the weight given to each incoming link to a document might vary in accordance with the relevance of the document containing the link. That is, if a document that is deemed highly relevant includes a link to another document, that link may be given greater weight because of the high relevance of the document containing the link. This type of analysis is explained in greater detail in U.S. Pat. No. 6,285,999, entitled, “Method for Node Ranking in a Linked Database.”
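The idea of weighting each incoming link by the relevance of the document containing it can be illustrated with a simplified iterative sketch. This is not presented as the method of the referenced patent; the function name rank_by_links, the damping value of 0.85, the fixed iteration count, and the sample link graph are all assumptions made for illustration.

```python
def rank_by_links(links, damping=0.85, iterations=50):
    """Score documents so that links from high-scoring documents weigh more.

    links maps each document to the list of documents it links to.
    damping and iterations are illustrative assumptions.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every document equally relevant.
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A document splits its own score among the documents it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A document with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: "d" and "f" each have exactly one incoming link,
# but the link to "d" comes from "c", which is itself linked to by two documents.
links = {
    "a": ["c"],
    "b": ["c"],
    "c": ["d"],
    "e": ["f"],
}

rank = rank_by_links(links)
for page, score in sorted(rank.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

In this sketch, documents "d" and "f" have the same number of incoming links, yet "d" receives the higher score because its single link comes from a document that is itself frequently linked to. Only documents that contain links contribute to the scores.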
One of the primary problems with this approach is that an internet document's relevance is determined not by the end-users (e.g., the readers) of the internet document, but by the authors of other internet documents. Consequently, only content authors have a “vote” in determining a document's relevance. Many users (e.g., readers) of Internet content either have no desire to be publishers of content, or do not have the technical savvy to publish content. In any case, these users are not provided a voice in determining the relevance of Internet content. Consequently, a better method of determining content relevance is desirable.