A web forum is a web site that typically allows users of the web site to post information that is available to be viewed by other users of the web site. Web forums provide a vast amount of information on a wide range of topics. Many web forums are dedicated to a specific topic. Such a web forum may have many different discussion threads relating to the topic. A user of such a web forum can select a discussion thread and then participate in the discussion. Other web forums may have discussion threads relating to many topics that may be hierarchically organized. To participate in a discussion, a user of such a web forum first selects a topic and then selects the discussion thread of interest. A discussion thread is typically initiated when a person creates an initial message directed to a topic and posts the message as a new discussion thread. Other persons can read the initial message and post response (or reply) messages to the discussion thread. For example, the initial message may pose a question such as “Has anyone encountered a situation where the Acme software product aborts with error number 456?” Persons who want to participate in the discussion can post response messages such as “It happens to me all the time” or “I fixed the problem by reinstalling the software.” Discussion threads typically take the form of a tree structure as sequences of messages branch off into different paths. For example, three different persons can post a response message to the initial message, starting three branches, and other persons can post response messages to any one of those response messages to extend those branches.
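By way of illustration only, the tree structure of a discussion thread described above may be sketched as follows. The `Message` class, the helper functions, the author names, and the text of the third and fourth responses are hypothetical and are not taken from any particular web forum; the sketch assumes only that each message records the responses posted to it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    """One posted message; its replies are the branches that grow from it."""
    author: str
    text: str
    replies: List["Message"] = field(default_factory=list)

    def reply(self, author: str, text: str) -> "Message":
        """Post a response to this message and return the new message."""
        child = Message(author, text)
        self.replies.append(child)
        return child


def count_messages(root: Message) -> int:
    """Count every message in the thread rooted at `root`."""
    return 1 + sum(count_messages(r) for r in root.replies)


def depth(root: Message) -> int:
    """Number of messages on the longest path from `root` to a leaf."""
    return 1 + max((depth(r) for r in root.replies), default=0)


# The initial message starts the discussion thread (author names are made up).
thread = Message(
    "alice",
    "Has anyone encountered a situation where the Acme software product "
    "aborts with error number 456?",
)

# Three persons respond to the initial message, starting three branches.
b1 = thread.reply("bob", "It happens to me all the time")
b2 = thread.reply("carol", "I fixed the problem by reinstalling the software.")
b3 = thread.reply("dave", "Which version are you running?")  # hypothetical text

# A further response extends one of those branches.
b2.reply("alice", "Reinstalling worked for me too.")  # hypothetical text
```

In this sketch, `thread.replies` holds the three branches, and `count_messages(thread)` and `depth(thread)` walk the tree recursively to report its size and its longest path.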
Many search engine services, such as Google and Overture, allow users to search for information that is accessible via the Internet. These search engine services allow users to search for display pages, such as web pages, that may be of interest to users. After a user submits a search request (also referred to as a “query”) that includes search terms, the search engine service identifies web pages that may be related to those search terms. To quickly identify related web pages, the search engine services may maintain a mapping of keywords to web pages. This mapping may be generated by “crawling” the web (i.e., the World Wide Web) to identify the keywords of each web page. To crawl the web, a search engine service may use a list of base web pages to identify all web pages that are accessible through those base web pages. The keywords of any particular web page can be identified using various well-known information retrieval techniques, such as identifying the words of a headline, the words supplied in the metadata of the web page, the words that are highlighted, and so on. The search engine service may generate a relevance score to indicate how related the information of the web page may be to the search request. The search engine service then displays to the user links to those web pages in an order that is based on their relevance.
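By way of illustration only, the mapping of keywords to web pages and the relevance-based ordering described above may be sketched as follows. The page URLs and keywords are made-up sample data, and the relevance score used here (the number of search terms that match a page's keywords) is a simplifying assumption rather than the scoring used by any actual search engine service.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical result of crawling: each URL with the keywords found on it.
PAGES: Dict[str, List[str]] = {
    "http://example.com/a": ["acme", "error", "456", "reinstall"],
    "http://example.com/b": ["acme", "pricing"],
    "http://example.com/c": ["weather", "forecast"],
}


def build_index(pages: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Build the keyword -> set-of-URLs mapping."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, keywords in pages.items():
        for keyword in keywords:
            index[keyword.lower()].add(url)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return URLs matching any search term, most relevant first.

    Relevance here is simply the count of matching search terms
    (an assumption made for this sketch).
    """
    scores: Dict[str, int] = defaultdict(int)
    for term in query.lower().split():
        for url in index.get(term, ()):
            scores[url] += 1
    # Highest score first; ties are broken by URL so the order is stable.
    return sorted(scores, key=lambda url: (-scores[url], url))


index = build_index(PAGES)
results = search(index, "Acme error 456")
```

With the sample data, the page whose keywords match all three search terms is listed ahead of the page that matches only one, and the page that matches none is omitted.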
Search engine services, however, do not perform well when crawling a web forum, for various reasons. One reason is that a typical web forum has many pages with very little informational content that is of interest to a user who is searching on a specific topic. For example, each posting page may have a link to a reply page without a quotation and a link to a reply page with a quotation. These reply pages, however, contain no informational content of interest to a user beyond what is already on the posting page. Another reason is that many web forums prohibit unregistered users from accessing user profiles. As a result, if a crawler does not use cookies, all accesses to profile pages will actually access a login page, which has no informational content of interest. For these and other reasons, current search engine services perform slowly when crawling a web forum and spend a considerable amount of time accessing web pages with no informational content of interest.
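By way of illustration only, the wasted accesses described above may be sketched with a small simulated web forum. The URL patterns, the page contents, and the `fetch` and `crawl` functions are hypothetical; the sketch assumes that reply pages contribute no new content and that profile pages resolve to a login page when no cookie is supplied.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical forum: URL -> content not already available elsewhere.
# An empty string models a page with no informational content of interest.
FORUM: Dict[str, str] = {
    "/thread/1": "Has anyone encountered error number 456?",
    "/reply?post=1": "",          # reply page without a quotation
    "/reply?post=1&quote=1": "",  # reply page with a quotation
    "/profile/alice": "Profile of alice",
    "/profile/bob": "Profile of bob",
    "/login": "",                 # login page
}


def fetch(url: str, cookie: Optional[str] = None) -> Tuple[str, str]:
    """Return (final_url, content); profiles need a cookie in this model."""
    if url.startswith("/profile/") and cookie is None:
        url = "/login"  # unregistered access lands on the login page
    return url, FORUM[url]


def crawl(urls: List[str], cookie: Optional[str] = None) -> Tuple[int, int]:
    """Return (useful, wasted) access counts for the given URLs."""
    useful = wasted = 0
    for url in urls:
        _, content = fetch(url, cookie)
        if content:
            useful += 1
        else:
            wasted += 1
    return useful, wasted


links = [u for u in FORUM if u != "/login"]
without_cookie = crawl(links)
with_cookie = crawl(links, cookie="session-token")  # made-up cookie value
```

In this simulated forum, a crawler without cookies finds informational content on only one of the five pages it accesses; supplying a cookie recovers the two profile pages, but the two reply pages remain wasted accesses.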