The Internet is a wide area network having a truly global reach, interconnecting computers all over the world. That portion of the Internet generally known as the World Wide Web is a collection of inter-related data whose magnitude is truly staggering. The content of the World Wide Web (sometimes referred to as “the Web”) includes, among other things, documents of the known HTML (HyperText Markup Language) format, which are transported through the Internet according to the known protocol, HTTP (Hypertext Transfer Protocol).
The breadth and depth of the content of the Web is amazing and overwhelming to anyone hoping to find specific information therein. Accordingly, an extremely important component of the Web is a search engine. As used herein, a search engine is an interactive system for locating content relevant to one or more user-specified search terms, which collectively represent a search query. Through the known Common Gateway Interface (CGI), the Web can include content which is interactive, i.e., which is responsive to data specified by a human user of a computer connected to the Web. A search engine receives a search query of one or more search terms from the user and presents to the user a list of one or more references to documents which are determined to be relevant to the search query.
Search engines dramatically improve the efficiency with which users can locate desired information on the Web. As a result, search engines are one of the most commonly used resources of the Internet. An effective search engine can help a user locate very specific information within the billions of documents currently represented within the Web. The critical function and raison d'être of search engines is to identify the few most relevant results among the billions of available documents given a few search terms of a user's query and to do so in as little time as possible.
Generally, search engines maintain a database of records associating search terms with information resources on the Web. Search engines acquire information about the contents of the Web primarily in three common ways. The most common is generally known as crawling the Web. The second is submission of such information by a provider of such information or by third parties (i.e., parties that are neither a provider of the information nor the provider of the search engine). The third is for human editors to create indices of information based on their review.
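By way of illustration only, a database of records associating search terms with information resources can be sketched as a mapping from each term to the set of addresses whose descriptions contain that term. The sketch below is a minimal example under stated assumptions and is not the method of any particular search engine: the names `build_index`, `search`, and `LISTINGS`, the example.com addresses, and the rule that a result must match every term of the query are all hypothetical.

```python
from collections import defaultdict


def build_index(listings):
    """Map each lowercase term to the set of addresses whose text contains it."""
    index = defaultdict(set)
    for address, text in listings.items():
        for term in text.lower().split():
            index[term].add(address)
    return index


def search(index, query):
    """Return the addresses that match every term of the query, sorted."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start from the matches for the first term, then narrow by the rest.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return sorted(results)


# Hypothetical records; the addresses and descriptions are invented.
LISTINGS = {
    "http://example.com/tulips": "growing tulips in spring",
    "http://example.com/roses": "growing roses in summer",
    "http://example.com/tools": "garden tools for spring planting",
}

index = build_index(LISTINGS)
print(search(index, "growing spring"))
```

A practical search engine would also rank the matching resources by relevance, which this sketch omits.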
To understand crawling, one must first understand that HTML documents can include references, commonly referred to as links, to other information. Anyone who has “clicked on” a portion of a document to cause display of a referenced document has activated such a link. Crawling the Web generally refers to an automated process by which documents referenced by one document are retrieved and analyzed and documents referred to by those documents are retrieved and analyzed and the retrieval and analysis are repeated recursively. Thus, an attempt is made to automatically traverse the entirety of the Web to catalog the entirety of the contents of the Web.
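The repeated retrieval and analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than a description of any actual crawler: the "Web" is a hypothetical in-memory table of three documents (`PAGES`), `fetch` is a hypothetical callable standing in for retrieval over HTTP, and the recursion is expressed as a queue with a set of already-seen addresses so that no document is retrieved twice.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start, fetch):
    """Retrieve every document reachable from `start`, each at most once.

    `fetch` is a hypothetical callable that maps an address to HTML text,
    or to None when the document cannot be retrieved.
    """
    seen = {start}
    queue = deque([start])
    retrieved = []
    while queue:
        address = queue.popleft()
        html = fetch(address)
        if html is None:
            continue  # Unreachable document: nothing to analyze.
        retrieved.append(address)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return retrieved


# Hypothetical three-document "Web" held in memory.
PAGES = {
    "a.html": '<a href="b.html">B</a> <a href="c.html">C</a>',
    "b.html": '<a href="a.html">A</a> <a href="c.html">C</a>',
    "c.html": '<a href="missing.html">gone</a>',
}

print(crawl("a.html", PAGES.get))
```

The set of seen addresses is what keeps the traversal from looping forever when documents refer back to one another, as `a.html` and `b.html` do here.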
Because documents of the Web are constantly being added and/or modified, and because of the sheer immensity of the Web, no Web crawler has successfully cataloged the entirety of the Web. Accordingly, providers of Web content who wish to have their content included in search engine databases directly submit their content to providers of search engines. Other providers of content and/or services available through the Internet contract with operators of search engines to have their content regularly crawled and updated such that search results include current information. Some search engines, such as the search engine provided by Overture, Inc. of Pasadena, Calif. (http://www.overture.com) and described in U.S. Pat. No. 6,269,361, which is incorporated herein by reference, allow providers of Internet content and/or services to compose and submit brief titles and descriptions to be associated with their content and/or services. Such a title, a description, and an address of associated information are collectively referred to as a search listing. Search listings are typically returned as individual results corresponding to a received and processed search query. As the Internet has grown and commercial activity conducted through the Internet has also grown, some search engines have specialized in providing commercial search results presented separately from informational results, with the added benefit of facilitating commercial transactions over the Internet.
Information regarding activity of a search engine is gathered for various purposes. Such purposes include both public and private purposes. As an example of a private purpose, a search engine provider can collect information on searching activity for evaluating such things as server resource requirements and public response to various aspects of search services provided. As an example of public purposes, a search engine provider may intend to publish information regarding numbers of searches performed in total, for various time periods, and for various search terms. Whether for internal auditing and evaluation of search engine performance or for external advertising of search engine popularity or for another purpose altogether, the accuracy of such collected information is paramount.
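As a simple illustration of such collected information, the numbers of searches in total, per time period, and per search term can be tallied from a log of received queries. The sketch below is only an example under stated assumptions: the log format (a timestamp paired with the query text), the choice of one day as the time period, and the names `tally` and `LOG` are hypothetical.

```python
from collections import Counter
from datetime import datetime


def tally(log):
    """Count searches in total, per calendar day, and per search term."""
    by_day = Counter()
    by_term = Counter()
    for timestamp, query in log:
        by_day[timestamp.date().isoformat()] += 1
        for term in query.lower().split():
            by_term[term] += 1
    return len(log), by_day, by_term


# Hypothetical query log; the timestamps and queries are invented.
LOG = [
    (datetime(2003, 5, 1, 9, 15), "tulip bulbs"),
    (datetime(2003, 5, 1, 9, 16), "tulip bulbs"),
    (datetime(2003, 5, 2, 14, 0), "garden tools"),
]

total, by_day, by_term = tally(LOG)
print(total, dict(by_day), dict(by_term))
```

Such a tally counts every logged query, regardless of who or what submitted it.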
Such information is generally intended to represent searching activity of human users in a genuine attempt to locate specific information held by the search engine. However, search queries are frequently submitted for reasons other than a genuine attempt to locate information. For example, a provider of information might periodically search for that information to see how it is presented by a search engine. Sometimes, one or more parties might be interested in measuring immediacy of response of one or more search engines by submitting a number of search queries and timing the delay between submission and receipt of results. In addition, some parties might attempt to make a search listing appear more popular than it is by configuring a program to periodically submit search queries crafted to give such an appearance. All of these instances, and any others in which search queries are submitted for purposes other than location of information of interest, distort information regarding search engine activity to the extent any such information is intended to be representative of human searcher activity.
What is needed is a mechanism by which searching activity that is not the result of a genuine search for information by a human user can be identified, such that the accuracy of information gathered pertaining to the activities of human searchers is dramatically improved.