A wealth of information is available on the Internet, and particularly that segment of the Internet referred to generally as the World Wide Web. Despite vast improvements in search engines, finding the particular information that one is interested in can still be a challenging and time-consuming task. A variety of methods and techniques have been considered for organizing and indexing the vast number of documents that make up the World Wide Web.
For instance, one approach to organizing documents involves a person, or group of persons, manually assessing the content of documents and then sorting and ranking the documents according to some existing classification. One obvious problem with this approach is that the sheer number of documents to be analyzed, together with the rate at which new documents are introduced and existing documents change, may require a cost-prohibitive number of people to analyze them. In addition, with large collections of documents (e.g., the World Wide Web), differences of opinion among those analyzing the documents will likely introduce inconsistencies in how the quality and relevance of the documents to a particular topic or category are assessed.
Another approach is to use so-called software bots to “crawl” the Internet while automatically analyzing and categorizing documents based on one or more attributes of the documents. A software bot (sometimes referred to as a “web crawler”) is a program designed to perform an automated, repetitive process, such as analyzing a specific attribute of documents. For example, one popular method of ranking the importance or relevance of a document is to count the number of links in other documents that link or “point to” the document of interest. The underlying presumption is that a document frequently referenced by other documents is more important than a document that is referenced less frequently. Using a bot in this manner can greatly reduce the time required to analyze a large number of documents. However, the ranking such a bot produces for a particular subset of documents does not always correspond to the ranking of that subset provided by its content provider or publisher.
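The link-counting method described above can be illustrated with a minimal sketch. It assumes the bot has already crawled the documents into a hypothetical `link_graph` dictionary mapping each document to the documents it links to; the function name `rank_by_inbound_links` and the choice to count each linking document at most once are illustrative assumptions. This is a plain inbound-link count, not any particular search engine's ranking algorithm.

```python
from collections import Counter


def rank_by_inbound_links(link_graph: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Rank documents by how many *other* documents link to them.

    link_graph is a hypothetical, already-crawled representation mapping
    each document identifier to the identifiers it links to.
    """
    inbound: Counter[str] = Counter()
    for source, targets in link_graph.items():
        for target in set(targets):  # assumption: count each linking document once
            if target != source:     # a document's links to itself are ignored
                inbound[target] += 1
    # Crawled documents with no inbound links still appear, with a count of 0.
    for doc in link_graph:
        inbound.setdefault(doc, 0)
    # Highest count first; ties broken alphabetically for deterministic output.
    return sorted(inbound.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    graph = {
        "diabetes": ["home"],
        "home": ["diabetes", "news"],
        "news": ["home"],
        "blog": ["home", "news"],
    }
    print(rank_by_inbound_links(graph))
    # [('home', 3), ('news', 2), ('diabetes', 1), ('blog', 0)]
```

The ranking reflects only how often each document is linked. Any ordering or selection the publisher expresses in some other way is invisible to it.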
In certain cases, for example, a particular website will present a subset of documents in a manner that indicates or suggests their relative importance. This may be done by providing a list of documents indexed by the letters of the alphabet (e.g., an A-Z list) under a particular topic or category. A medical website may, for example, provide alphabetically organized lists of documents for broad categories such as diseases and conditions, drugs, or treatment options. For each letter of the alphabet, one or more documents may be listed on a particular topic within that category. For instance, the diseases and conditions category may present links to documents on topics such as arthritis, breast cancer, cholesterol, diabetes, and so on. Although the website may host a large number of documents related to a particular topic within a category (e.g., diabetes), the document listed for that topic in the category's A-Z list was selected by the publisher as particularly relevant to that topic and category. A software bot designed to analyze and rank documents based on the number of links “pointing to” them will not detect the publisher's ranking implied by a document's inclusion in an A-Z list, or in a similar ranking or categorization of documents.