The World Wide Web (or “Web”) contains a vast amount of information in the form of hyperlinked documents (e.g., web pages) that are loosely organized and accessed through a data communication network (or “Internet”). One of the reasons for the explosive growth in the number of hyperlinked documents on the Web is that just about anyone can upload hyperlinked documents, which can include links to other hyperlinked documents. The unstructured nature and sheer volume of web pages available via the Internet make it difficult to efficiently find and navigate through related information while avoiding unrelated information.
One conventional way to cull information on a computer network (e.g., the Internet) is through use of a search engine. A user typically begins a search for relevant information using a search engine. A search engine attempts to return relevant information in response to a request from a user. This request usually comes in the form of a query (e.g., a set of words that are related to a desired topic). Search engines typically return a number of links to web pages, each with a brief description of the linked page. Because of the vast number of pages on the Web, ensuring that the returned pages are relevant to the topic the user had in mind is a central problem in web searching. Possibly the simplest and most prevalent way of searching the Web is to search for web pages that relate to, or contain, all or many of the words included in the query. Such a method is typically referred to as text-based searching. Text-based searching over the Web can be notoriously imprecise, and several problems can arise in the process.
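As an illustration only, text-based searching of the kind described above can be sketched in a few lines of Python. The page collection `PAGES`, the example URLs, and the function names `tokenize` and `text_search` are hypothetical and are introduced here purely for illustration; the sketch assumes that a page matches when it contains at least one query word (or every query word when `require_all` is set) and that pages are ordered by the number of distinct query words they contain.

```python
import re


def tokenize(text):
    """Lowercase the text and return the set of alphanumeric words in it."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def text_search(query, pages, require_all=False):
    """Return (url, matched_count) pairs for pages containing query words.

    pages maps a URL to the page text. With require_all=True, only pages
    containing every query word are returned. Results are ordered by the
    number of distinct query words matched, then by URL.
    """
    query_words = tokenize(query)
    results = []
    for url, body in pages.items():
        matched = len(query_words & tokenize(body))
        if matched == 0:
            continue  # page shares no words with the query
        if require_all and matched < len(query_words):
            continue  # page is missing at least one query word
        results.append((url, matched))
    results.sort(key=lambda item: (-item[1], item[0]))
    return results


# Hypothetical page collection, assumed for illustration only.
PAGES = {
    "a.example/jaguar-cars": "The Jaguar is a luxury car brand with fast models",
    "b.example/jaguar-cats": "The jaguar is a big cat that lives in the rain forest",
    "c.example/gardening": "Tips for growing tomatoes in a small garden",
}

if __name__ == "__main__":
    # The single word "jaguar" matches both the car page and the cat page.
    for url, count in text_search("jaguar", PAGES):
        print(url, count)
```

In this sketch, the one-word query "jaguar" returns both the automobile page and the animal page with the same score, which illustrates the imprecision noted above: word matching alone cannot tell which topic the user had in mind.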
The process of searching the Internet for narrowly defined relevant information is akin to finding a “needle” of relevant information in a “haystack” of all the possible information available through the data network. The efficiency of the search process is greatly dependent on the quality of the search. Often a large number of web pages match a user's query. Typically, query results are ranked according to a predefined method or set of criteria and presented in that order, thereby directing a user to what is believed to be the most relevant information first. Poor-quality queries tend to misdirect the search process, interfere with ranking algorithms, and generally produce poorer search results. In the aggregate, inefficient Internet search methods tend to slow the data network, occupying web page servers with requests for irrelevant web pages and clogging data network paths with transmissions of irrelevant web page information.
As the size of the Internet continues to increase, it becomes increasingly desirable to have innovative techniques for efficiently searching hyperlinked documents.