The emergence of the Internet, driven by the proliferation of networked computers and computerized devices, has resulted in an exponential increase in the amount of information available for transfer and access. In August 2005, a popular Internet service website and an independent Internet research and data analysis company indexed an estimated 70 million websites containing 19.2 billion web pages.
The evolution of the Internet into a global institution is due in large part to the introduction of an information system known as the World Wide Web (“the web”), a vast distributed database of documents known as “web pages.” The web, accessed via the Internet, is composed of a seemingly limitless number of web pages dispersed across millions of independent computer systems all over the world, with no discernible organization or morphology.
The sheer amount of information available on the web makes searching for a specific piece of information a daunting task. Mechanisms such as directories and search engines have been developed to index and search the information available on the web and thereby provide a means for Internet users to locate information of interest. These search services enable consumers to search the Internet for a listing of web sites based on a specific topic, product, or service of interest.
Typically, inquiring users submit a short query of a few words to one or more search engines and obtain a list of search results in the form of web pages located online. These queries, the returned search results, and the subsequent user clicks on and within the search results are termed “click-throughs.” Click-throughs are often compiled into click-through logs, which can be used to “mine” information about the queries and their associated websites.
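As an illustration of how a click-through log might be mined, the following minimal sketch aggregates click counts per query and URL and reports the most-clicked URLs for a query. It assumes a hypothetical log format in which each record holds only a query string and the clicked URL; the names ClickRecord, mine_click_log, and top_urls are illustrative and do not describe any particular search engine's logging system.

```python
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple


class ClickRecord(NamedTuple):
    """Hypothetical click-through log entry: a query and the URL clicked."""
    query: str
    clicked_url: str


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivially different queries group together."""
    return " ".join(query.lower().split())


def mine_click_log(records: Iterable[ClickRecord]) -> dict[str, Counter]:
    """Aggregate click counts per (normalized query, clicked URL) pair."""
    clicks: dict[str, Counter] = defaultdict(Counter)
    for rec in records:
        clicks[normalize(rec.query)][rec.clicked_url] += 1
    return dict(clicks)


def top_urls(clicks: dict[str, Counter], query: str, n: int = 3) -> list[tuple[str, int]]:
    """Return up to n most-clicked URLs for a query (empty list if the query is unseen)."""
    return clicks.get(normalize(query), Counter()).most_common(n)


if __name__ == "__main__":
    log = [
        ClickRecord("Jaguar speed", "https://example.org/cats"),
        ClickRecord("jaguar  speed", "https://example.org/cats"),
        ClickRecord("jaguar speed", "https://example.com/cars"),
    ]
    clicks = mine_click_log(log)
    print(top_urls(clicks, "JAGUAR speed"))
    # prints [('https://example.org/cats', 2), ('https://example.com/cars', 1)]
```

Counting clicks per query-URL pair is only the simplest form of such mining; it records which websites users associate with a query but, on its own, says nothing about why.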
The automated search technology that drives many current and traditional search engines relies in large part on complex database search algorithms that filter, select, and rank web pages to determine “relevance” based on multiple criteria, such as keyword density and keyword location.
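To make the criteria of keyword density and keyword location concrete, the following is a minimal sketch of a relevance score that combines them, followed by a filter-and-rank step. The scoring formula, the title weight, and the early-window size are illustrative assumptions, not the formula of any actual search engine; Page, keyword_score, and rank are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    title: str
    body: str


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(page: Page, query: str,
                  title_weight: float = 2.0, early_window: int = 50) -> float:
    """Hypothetical score: keyword density plus a keyword-location bonus.

    density:  fraction of body tokens that are query terms
    location: weighted count of query terms found in the title, plus query
              terms found within the first `early_window` body tokens,
              averaged over the number of distinct query terms
    """
    terms = set(tokenize(query))
    body = tokenize(page.body)
    if not terms or not body:
        return 0.0
    density = sum(1 for tok in body if tok in terms) / len(body)
    title_hits = len(terms & set(tokenize(page.title)))
    early_hits = len(terms & set(body[:early_window]))
    location = (title_weight * title_hits + early_hits) / len(terms)
    return density + location


def rank(pages: list[Page], query: str) -> list[Page]:
    """Filter out pages scoring zero, then sort the rest by descending score."""
    scored = [(keyword_score(p, query), p) for p in pages]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for score, p in scored if score > 0]


if __name__ == "__main__":
    pages = [
        Page("u1", "Cooking tips", "bread recipes and more bread"),
        Page("u2", "Bread baking", "how to bake bread at home"),
        Page("u3", "Cars", "engines and wheels"),
    ]
    print([p.url for p in rank(pages, "bread")])  # prints ['u2', 'u1']
```

Because a score of this kind looks only at where and how often query words appear, it is blind to what a page actually means, which is the weakness discussed in the following paragraphs.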
However, such mechanisms often rely on morphology-blind mathematical formulas, so the search results they generate may be random and irrelevant. Web searchers often face the difficult challenge of phrasing a query effectively to locate the desired information. If a query is too general, the resulting list of web pages may be unreasonably large; if it is too specific, it risks eliminating every web page from the results. Moreover, whenever a query obtains search results, those results are returned content-ignorant and without regard to taxonomy.
Furthermore, search engines that use automated search technology to catalog search results generally rely on invisible web site descriptions, or “meta tags,” that are authored by web site promoters. Web site owners may freely tag their sites as they choose. Consequently, it is not uncommon for web site promoters to insert popular but inaccurate or irrelevant search terms into their meta tags to attract additional consumer attention at little or no marginal cost. This degrades the usefulness of search engines, which return web sites whose meta tags correspond to a query but which do not, in fact, contain any information pertinent or responsive to the query.
Finally, many web sites have similar meta tags, and current and traditional search engines are simply not equipped to incorporate human knowledge of queries and of their relationships to other, content-related web pages. This problem will almost certainly worsen as more information and new web pages continue to be added to the web.