The World Wide Web (WWW) comprises an expansive network of interconnected computers on which businesses, governments, groups, and individuals throughout the world maintain interlinked computer files known as web pages. The WWW contains billions of documents (e.g., web pages), each identified by a respective uniform resource locator (URL). Users can navigate these web pages by means of computer software programs commonly known as Internet browsers. Internet search engines index WWW documents, rank them, and perform queries against them. Web crawlers are applications that download web pages and index the downloaded web pages (and their respective URLs) according to a particular categorization scheme. Web crawlers are often used to populate the document indices upon which search engines rely.
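The indexing step described above can be illustrated with a minimal sketch. It assumes the pages have already been downloaded and are held in memory as a mapping from URL to text; the function names `tokenize` and `build_index`, the example URLs, and the term-to-URL "categorization scheme" are all hypothetical choices made for illustration, not a description of any particular crawler.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each term to the set of URLs whose page text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index

# Hypothetical stand-in for pages a crawler has already downloaded.
pages = {
    "http://example.com/a": "Web crawlers download web pages",
    "http://example.com/b": "Search engines rank indexed pages",
}

index = build_index(pages)
print(sorted(index["pages"]))
```

A term-to-URL mapping of this kind is commonly called an inverted index; a production index would typically also store term positions and ranking signals, which this sketch omits.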
Because of the vast number of WWW sites (each of which includes a collection of web pages), many web pages contain redundant information or closely resemble one another in function or title. The vastness of the unstructured WWW can cause users to rely primarily on Internet search engines to retrieve information or to locate businesses. A typical search engine has an interface with a search window in which the user enters an alphanumeric search expression or keywords. The search engine sifts through available web sites for the search terms and returns the search results in the form of web pages in, for example, HTML. Each results page comprises a list of individual entries that the search engine has identified as satisfying the search expression. Each entry, or “hit,” comprises a hyperlink that points to a URL location or web page.
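The query flow just described (search expression in, list of hyperlinked hits out) can be sketched as follows. This is an illustrative example only: the `search` and `render_results` functions, the sample documents, and the rule that a page "satisfies" the expression when it contains every search term are assumptions made here, not the behavior of any specific search engine.

```python
import re
from html import escape

def terms_of(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(documents, expression):
    """Return the sorted URLs of documents containing every search term (assumed AND semantics)."""
    terms = terms_of(expression)
    hits = []
    for url, text in documents.items():
        words = set(terms_of(text))
        if terms and all(term in words for term in terms):
            hits.append(url)
    return sorted(hits)

def render_results(hits):
    """Render the hits as an HTML list in which each entry is a hyperlink to its URL."""
    items = "\n".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(url)}</a></li>'
        for url in hits
    )
    return f"<ul>\n{items}\n</ul>"

# Hypothetical document collection keyed by URL.
documents = {
    "http://example.com/a": "Web crawlers download web pages",
    "http://example.com/b": "Search engines rank indexed pages",
}

hits = search(documents, "web pages")
print(render_results(hits))
```

The sketch scans every document for each query, which is adequate for illustration; search engines generally consult a precomputed index instead and order the hits by a ranking function.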