1. Field of the Invention
The present invention is related to an improved data processing system. Particular aspects relate to the World Wide Web, databases, and transaction processing systems. A more particular aspect is related to the caching of dynamic documents on the World Wide Web.
2. Related Art
The speed with which documents can be retrieved from the World Wide Web is one of the most important factors in how useful the Web is for transferring information and supporting electronic commerce. Web servers must be able to serve content to users quickly. Data delivered by Web servers can be divided into two categories:
(1) Static data. This data is obtained from files stored on a computer. Static data can be served relatively quickly. A high-performance Web server running on a computer such as a single RS/6000 590 node can typically deliver several hundred files per second.
(2) Dynamic data. This data is obtained by executing programs at the time requests are made. Dynamic data is often expensive to create. In many cases, dynamic data is one to two orders of magnitude more expensive to obtain than static data.
(a) If the two objects correspond to the same version and are therefore identical;
(b) If the answer to (a) is no, which version is more current (i.e., was created later); and
(c) If the answer to (a) is no, a quantitative indication of how different the objects are.
(1) Both objects are current; or
(2) At some time t in the past, both objects were current.
For Web sites containing a high percentage of dynamic data, dynamic data performance can be a bottleneck. Examples of sites containing a high percentage of dynamic data include electronic commerce sites using IBM's net.Commerce software such as the LL Bean Web site (www.llbean.com) and the IBM 1996 Olympics Web site.
One method to reduce the overhead of dynamic data is to store dynamic pages in a cache after they are created by a program (see "A Distributed Web Server and its Performance Analysis on Multiple Platforms" by Y. H. Liu, P. Dantzig, C. E. Wu, J. Challenger, L. M. Ni, Proceedings of the International Conference for Distributed Computing Systems, May 1996). Subsequent requests for these pages can then be served from the copy in the cache, so a page only has to be calculated by a program once. The overhead of recalculating the same page in response to multiple requests is reduced or eliminated.
Caching cannot be applied to all dynamic Web pages. Some dynamic pages cause state changes to take place which must occur each time the pages are requested. Such pages cannot be cached.
For pages that can be cached, a need remains for a method of updating a cache when changes occur to underlying data which may affect the value of one or more Web pages. For example, dynamic Web pages are often constructed from databases. When the databases change, it may be extremely difficult to determine which cache objects have become obsolete as a result of the database changes. The present invention provides a solution to this problem. The solution is quite general and can be used in other situations where one needs to know how changes to underlying data affect the values of objects.
Another problem is how to keep a set of one or more caches updated when the source of the underlying data and the caches are geographically distinct. The present invention has a solution to this problem which is relevant to proxy caching of both dynamic and static data.
A third problem is how to make a set of updates to one or more caches consistently, so that all updates are made at once and no request received by the system sees a later view of the system with respect to the updates than a request received at a later time. The present invention has a solution to the consistency problem which is of relevance to proxy caching of both static and dynamic data. It is also of relevance to transaction systems which do not necessarily involve caches.
There are utilities known in the art for managing dependencies between files which are necessary to build computer programs. For example, a single program may be constructed from multiple source and object files. Tools have been developed to manage dependencies between source, object, and executable files. One of the best known utilities for managing such dependencies is the Unix make command (see, e.g., IBM AIX Version 4 on-line manual pages). Utilities such as make require users to specify dependencies between files in a special file known as a makefile. For example, the following file dependency specification:
foo: foo.h foo.c
	cc -o foo foo.c
would be placed in a makefile to indicate that "foo" depends on "foo.h" and "foo.c". Any change to either "foo.h" or "foo.c" would cause "foo" to be recompiled using the command "cc -o foo foo.c" the next time the command "make foo" was issued.
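As a rough illustration of the file-based dependency check that make performs for a rule in which "foo" depends on "foo.h" and "foo.c", the following Python sketch treats a target as out of date when it is missing or older than any of its prerequisites. The function names is_stale and demo are hypothetical and chosen only for this example; the sketch compares file modification times and does not run a compiler.

```python
import os
import tempfile


def is_stale(target, prerequisites):
    """Return True if target is missing or older than any prerequisite.

    This mirrors the timestamp comparison make uses to decide whether a
    target must be rebuilt. It only works for files on disk.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


def demo():
    """Build a throwaway foo/foo.h/foo.c trio and check staleness twice."""
    with tempfile.TemporaryDirectory() as workdir:
        names = ("foo.h", "foo.c", "foo")
        paths = {name: os.path.join(workdir, name) for name in names}
        # Create the files with explicit, increasing timestamps so that
        # "foo" starts out newer than both of its prerequisites.
        for offset, name in enumerate(names):
            with open(paths[name], "w") as handle:
                handle.write(name)
            os.utime(paths[name], (1000 + offset, 1000 + offset))
        prerequisites = [paths["foo.h"], paths["foo.c"]]
        before = is_stale(paths["foo"], prerequisites)
        # Simulate an edit to foo.h by moving its timestamp forward.
        os.utime(paths["foo.h"], (2000, 2000))
        after = is_stale(paths["foo"], prerequisites)
        return before, after


if __name__ == "__main__":
    print(demo())
```

The check is binary: a target is either up to date or not, and both sides of every dependency must be files.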
Utilities such as make have several limitations, including:
(1) A makefile only allows dependencies to be specified between files. It is not possible to specify dependencies between a file and something which is not a file. A need exists for a method which allows dependencies to be specified between objects which can be stored in caches and graph objects (which include underlying data) which cannot be cached.
(2) Using the makefile approach, all files are updated whenever they are found to be obsolete by the "make" command, regardless of how obsolete the files may be. A need also exists for a method which doesn't require that obsolete objects always be updated; for example, so that an obsolete object which is only slightly out of date may be retained in the cache.
(3) The makefile approach also only allows one version of a file to exist in the file system at a time. A need exists for a method which allows multiple versions of the same object to exist in the same cache concurrently.
(4) A need also exists for a quantitative method for determining how obsolete an object is, something not provided by tools such as make.
(5) A need exists for a quantitative method for determining how similar two versions of the same object are, something also not provided by tools such as make.
(6) A need also exists for a method for retaining a consistent set of possibly obsolete objects, something not provided by tools such as make.
(7) A need exists for a method for concisely specifying dependencies between objects known as relational objects, which may be part of relational databases, something not provided by tools such as make.
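To make needs (1), (2), and (4) more concrete, the following Python sketch shows one possible way to record weighted dependencies between cached objects and underlying data that is not itself cached, and to retain an object until its accumulated obsolescence reaches a threshold. It is only an illustration under assumed names: DependencyTracker, its methods, the weights, the threshold, and the keys such as "table:prices" are all hypothetical and are not taken from this document.

```python
class DependencyTracker:
    """Toy cache that tracks how obsolete each cached object has become."""

    def __init__(self):
        self._cache = {}        # cached object id -> value
        self._dependents = {}   # underlying data id -> {object id: weight}
        self._staleness = {}    # cached object id -> accumulated score

    def put(self, key, value, depends_on):
        """Cache value; depends_on maps underlying data ids to weights."""
        # Drop dependency edges left over from an earlier version of key.
        for edges in self._dependents.values():
            edges.pop(key, None)
        self._cache[key] = value
        self._staleness[key] = 0.0
        for source, weight in depends_on.items():
            self._dependents.setdefault(source, {})[key] = weight

    def source_changed(self, source, threshold=1.0):
        """Record a change to underlying data.

        Each dependent object's score grows by the weight of its edge.
        Objects whose score reaches the threshold are evicted; slightly
        obsolete objects stay in the cache. Returns the evicted ids.
        """
        evicted = []
        for key, weight in self._dependents.get(source, {}).items():
            if key not in self._cache:
                continue
            self._staleness[key] += weight
            if self._staleness[key] >= threshold:
                del self._cache[key]
                evicted.append(key)
        return sorted(evicted)

    def get(self, key):
        """Return the cached value, or None if absent or evicted."""
        return self._cache.get(key)

    def staleness(self, key):
        """Return the accumulated obsolescence score for key."""
        return self._staleness.get(key, 0.0)


if __name__ == "__main__":
    tracker = DependencyTracker()
    tracker.put("page:catalog", "<html>catalog</html>",
                {"table:prices": 0.5, "table:items": 1.0})
    tracker.put("page:home", "<html>home</html>", {"table:prices": 0.25})
    print(tracker.source_changed("table:prices"))
    print(tracker.source_changed("table:items"))
```

In this sketch the underlying tables are never stored in the cache, yet changes to them still propagate to the pages built from them, and a page that is only slightly out of date is kept rather than rebuilt immediately.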
The present invention addresses these needs.