Information that can never be found is neither valuable nor of great use. Useful information on a computer can be found by searching, which is a process of seeking a particular or specific piece of data, and is carried out by a program through comparison or calculation to determine whether a match to some pattern exists or whether some other criteria have been met. Much information is available to computers on the World Wide Web, which is the total set of interlinked hypertext documents residing on servers around the world.
Search engines can be used to search for and within documents on the World Wide Web. These documents, called Web pages, are written in HTML (hypertext mark-up language), are identified by URLs (uniform resource locators) that specify the particular machine and path name by which the document file can be accessed, and are transmitted from server to end user via HTTP (hypertext transfer protocol). Codes, called tags, embedded in an HTML document associate particular words and images in the document with URLs so that a user can access another file, which may be on another server halfway around the world, at the press of a key or the click of a mouse. These files may contain text (in a variety of fonts and styles), graphics, images, movie files, and sound, as well as Java applets, ActiveX controls, or other small embedded software programs that execute when the user activates them.
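As a rough illustration of how a tag associates words in a page with a URL, and of how that URL names a machine and a path, consider the following minimal sketch. The page fragment, the host www.example.com, and the LinkCollector class are hypothetical and serve only as an example; the sketch uses Python's standard html.parser and urllib.parse modules.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the URL of every anchor tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Hypothetical page fragment: the words "next page" are tied to a URL.
page = '<p>See the <a href="http://www.example.com/docs/page.html">next page</a>.</p>'

collector = LinkCollector()
collector.feed(page)

# Split the URL into the protocol, the machine, and the path name.
parts = urlsplit(collector.links[0])
print(parts.scheme, parts.hostname, parts.path)
```

Here the scheme corresponds to the transfer protocol, the host name to the particular machine, and the path to the document file on that machine.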
Through search engines, people can present search requests (queries), which are formed from a data manipulation language for retrieving and displaying pieces of data from one or more databases. A search engine responds to a person's query by searching one or more databases and displaying one or more documents that match the query. Typically, a person uses a browser on a client computer to present a query, and a search engine uses a database on a server computer to respond to the query. Together, the client computer and server computer form a type of computer network architecture called a client/server architecture.
Client/server architecture is an arrangement that makes use of distributed intelligence, treating both the server and the individual workstations as intelligent, programmable devices, thus exploiting the full computing power of each. This is done by splitting the processing of an application, such as a search process, between two distinct components: a “front-end” client and a “back-end” server. The client component, itself a complete, stand-alone personal computer (versus the “dumb” terminal found in older architectures), offers the user its full range of power and features for running applications. The server component, which can be another personal computer, minicomputer, or mainframe, enhances the client component by providing the traditional strengths offered by minicomputers and mainframes in a time-sharing environment, such as data storage, data management, information sharing among clients, and sophisticated network administration and security features.
The server component allows information on the World Wide Web to become useful because of the server component's storage and retrieval capabilities. The server component's disk drives and other storage media represent facilities for holding information on a permanent basis, allowing retrieval at a later time by either the server component or the client component. In the initial days of the World Wide Web, its intrepid early users found only limited information. Now, millions of users across the globe demand that companies provide continuous access to information that must be quickly retrievable at all times of day and night. Failure to meet these expectations means rapid expiration of users' patience, and with a click of a mouse button these users can visit a competitor's Web site.
Rapid retrieval of information is not effortless, even with the power and speed of today's databases, because of the sheer size of stored information and the ever-growing number of its users. It is easy to find information when there are only a few pieces to look through, but not when there are millions. It is also easy to service the queries of only a few users, but to satisfy the demands of a global online population is much more difficult. The knottiest problem of all, however, is that each user tends to present a query that is not similar, let alone identical, to the query of another user, making the optimization of retrieval performance difficult (i.e., if all queries were identical, a query result for one user could be immediately reused for all users). One solution, albeit an expensive one, is to add processing capacity to accommodate the increasing amount of information and the growing number of users, but this raises not only the costs of procuring equipment but also the costs of operating the equipment. A system 100 in FIG. 1 illustrates this problem as well as other problems in greater detail.
The system 100 includes multiple users 102A-102C using personal computers 103A-103C, each representative of the client component, to access a database 126, which is representative of the server component. Three users 102A-102C are illustrated for brevity and ease of discussion, but these three users represent the continuously growing millions of users. Personal computers 103A-103C allow users 102A-102C to access online services offered by the database 126 via a network 122. The network 122 is a group of computers and associated devices that are connected by communication facilities and can range in size from only a few computers, printers, and other devices, to many large groups of small and large computers, which can even be distributed over a vast geographic area.
Web browsers 104A-104C are software running on personal computers 103A-103C that let users 102A-102C view HTML documents and access files and software related to those documents on the database 126. Browsers 104A-104C include a number of tools for navigation, such as Back buttons 108A-108C, Forward buttons 110A-110C, and Home buttons 112A-112C. These buttons are positioned on navigation bars 106A-106C. To the right of these bars 106A-106C is the name of the Web page (“HOME”) being displayed. Web pages 114A-114C present find functions 116A-116C allowing users 102A-102C to search for desired information in the database 126. Text boxes 118A-118C are elements of dialog boxes or HTML forms in which users 102A-102C may enter text to form queries. When one of the users 102A-102C has entered a query into one of the text boxes 118A-118C, the user may press the Enter key of a keyboard (not shown) coupled to the corresponding personal computer 103A-103C or may select one of the OK buttons 120A-120C to present the query. This query is transmitted through the network 122 to be executed on the database 126 to obtain a query result containing a desired piece of information. The query result is then sent back to the user among users 102A-102C who originated the query.
A better solution than the economically prohibitive solution of spending more money to buy more equipment is the use of a cache 124, which provides on-demand cache services. But the cache 124 offers only a partial answer. The cache 124 is a portion of data storage in the server component apart from the database 126 for temporarily holding information without having to access the database 126. Information that has either been recently read from or written to the database 126 can be held in the cache 124 so that a next query for the same information can be satisfied not by executing the query in the database 126 but by merely copying the information already in the cache 124. However, if the next query is directed to a different piece of information, the cache 124 will be bypassed, and the next query will have to be executed in the database 126 to find the desired information, hence eliminating the usefulness of the cache 124.
As an example, suppose that the user 102A issues a query to find “CASCADIA.” See text box 118A. This query is executed by the database 126 to form a query result. The database 126 returns the query result to the user 102A, and the query result can be displayed on the browser 104A. The query result for the query “CASCADIA” is temporarily stored in the cache 124. Suppose that the user 102B now issues a query to find “OLYMPICS.” Because the query “OLYMPICS” is neither similar nor identical to the query “CASCADIA,” the query result for the query “CASCADIA,” which is stored in the cache 124, cannot be used to respond to the query “OLYMPICS.” Therefore, the query “OLYMPICS” must be executed in the database 126 to find a corresponding query result. The database 126 returns the query result to the user 102B, and the query result is displayed on the browser 104B. The query result for the query “OLYMPICS” as well as the query result for the query “CASCADIA” are now stored in the cache 124. As can be seen, the cache 124 is helpful only if the queries of users 102A, 102B are identical. Otherwise, the database 126 will have to be accessed anyway to find query results for queries not stored in the cache 124. As a final example, the user 102C issues a query “CASCADEA.” See text box 118C. Because the query “CASCADEA” is not identical to either of the queries “CASCADIA” and “OLYMPICS,” the cache 124 has no query result that can be used to immediately respond to the query “CASCADEA.” Thus, once again, the database 126 must be accessed to find the query result for the query “CASCADEA.” Failure of the cache 124 to provide readied query results sets the retrieval problem back to square one.
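The behavior described in this example can be sketched in a few lines of code. The following is only an illustrative model, assuming an exact-match cache keyed on the query text; the OnDemandCache class, the dictionary standing in for the database 126, and the document identifiers are hypothetical and are not taken from the system 100.

```python
class OnDemandCache:
    """Illustrative exact-match query cache in front of a slower database."""

    def __init__(self, database):
        self.database = database      # hypothetical stand-in for database 126
        self.results = {}             # hypothetical stand-in for cache 124
        self.database_accesses = 0    # counts how often the database is hit

    def query(self, text):
        # Only a query identical to an earlier one is served from the cache.
        if text in self.results:
            return self.results[text], "cache"
        # Any other query, however close in spelling, goes to the database.
        self.database_accesses += 1
        result = self.database.get(text, [])
        self.results[text] = result
        return result, "database"


# Made-up contents; the misspelled query simply matches nothing.
database = {"CASCADIA": ["doc-1"], "OLYMPICS": ["doc-2"]}
cache = OnDemandCache(database)

# The three queries issued by users 102A, 102B, and 102C.
sources = [cache.query(q)[1] for q in ("CASCADIA", "OLYMPICS", "CASCADEA")]

# Only a repeat of an identical query benefits from the cache.
repeat_source = cache.query("CASCADIA")[1]

print(sources, repeat_source, cache.database_accesses)
# prints: ['database', 'database', 'database'] cache 3
```

In this model all three distinct queries reach the database, and the cache contributes only when an identical query is repeated, which mirrors the limitation described above.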
Database searching is a problem, especially on the Internet, where many users are present, many pieces of information are stored, and many different searches are requested. Users get easily frustrated and impatient if their requests are not serviced within a short amount of time. Adding more database servers is not acceptable because of the prohibitive costs involved in procurement and maintenance. On-demand caching is a partial solution, but due to the wide variations among queries, the cache cannot contain readied results, and the databases must be accessed anyway. Moreover, certain queries may take so long a time to search that their performance will not be tolerated by users on the Internet.
While these problems and others discussed above are in the context of Internet searches, other database searches have similar if not identical problems when there are many users, many pieces of information, and many different queries. Without a resolution to the problem of responding efficiently to users' queries, users may eventually no longer trust the system 100 to provide a desired computing experience that can reproduce stored pieces of information within a short period of time, and demand for the system 100 will diminish in the marketplace. Thus, there is a need for a system, method, and computer-readable medium for responding to queries while avoiding or reducing the foregoing and other problems associated with existing systems.