1. Technical Field
The present application relates generally to a system and method for providing distributed caching on a computer network and, more particularly, to a system and method for caching objects in a distributed system using a cost-based publish and subscribe paradigm, wherein a server computing node determines whether a given cache node should receive a cache update based on, e.g., the cost of sending the update.
2. Description of Related Art
Caching is a technique commonly employed in computer systems to improve performance. For example, in an object-oriented environment, caching an object can minimize the cost of fetching or materializing the object, since that cost is incurred only once. Subsequent requests can be satisfied from the cache, which incurs significantly less overhead and thus improves overall performance.
A key problem associated with caching items is that of preventing each cache from supplying stale data. Cached data become stale whenever actual values have changed but cached copies of these values have not been updated to reflect the changes. Since it is most undesirable to supply stale data, caches typically purge stale data and request a fresh copy from the data source. This replenishment incurs the usual overhead that the employment of caches seeks to minimize.
Various conventional techniques have been implemented and proposed for maintaining updated caches on a global computer network such as the WWW (World Wide Web) (or "Web") and the Internet. For instance, "lease-based" or "publish and subscribe" caching techniques have been proposed for both distributed file systems and Web caching. Generally, with these methods, a cache obtains a "lease" for an object, wherein the lease comprises a subscription of finite duration. After the lease for the object expires, the cache must renew its lease in order to continue receiving update messages for the object.
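By way of illustration only, the lease mechanism described above may be sketched as follows. The class name LeaseTable, its methods, and the fixed lease duration are hypothetical and are not taken from any particular conventional system; the sketch merely shows a subscription of finite duration that must be renewed for a cache to keep receiving update messages.

```python
from dataclasses import dataclass, field


@dataclass
class LeaseTable:
    """Server-side record of which caches hold an unexpired lease on which objects.

    Hypothetical illustration: a single fixed lease duration is assumed.
    """
    duration: float                              # lease length, in seconds
    expiry: dict = field(default_factory=dict)   # (cache_id, object_id) -> expiry time

    def grant(self, cache_id, object_id, now):
        """Grant or renew a lease; the subscription lasts `duration` seconds from `now`."""
        self.expiry[(cache_id, object_id)] = now + self.duration

    def subscribers(self, object_id, now):
        """Return the caches whose lease on `object_id` has not yet expired."""
        return sorted(
            cache_id
            for (cache_id, obj), expires_at in self.expiry.items()
            if obj == object_id and expires_at > now
        )
```

In this sketch, a cache whose lease has lapsed simply drops out of the result of subscribers() until it calls grant() again, which corresponds to renewing the lease.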
One disadvantage associated with lease-based caching is that performance degrades as the system scales. Indeed, with the implementation of such methods in networks with large numbers of servers, caches, and/or objects (such as the Web), the overhead due to update traffic becomes prohibitive. Accordingly, there is a need for improved subscription-based caching methods that may be employed in large-scale networks, resulting in less overhead than the conventional techniques.
The present invention is directed to a system and method for caching objects using a cost-based publish and subscribe paradigm, wherein a server computing node determines whether a given cache node should receive a cache update based on, e.g., the cost of sending the update.
In one aspect of the invention, a method for maintaining objects in a cache comprises the steps of issuing a subscription for an object, maintaining a metric for the object, and determining, based on the metric, whether a cache is to receive an update message associated with the object. An update message may comprise, for example, an updated copy of an object or an invalidation message to invalidate a cached copy of an object.
In another aspect of the invention, the metric is preferably correlated with one or more factors such as the importance of keeping the cached copy of the object current, the cost of sending the update message, and/or the estimated lifetime of the object.
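One possible way of combining these factors into a single metric, and of using the metric to decide whether a cache is to receive an update message, may be sketched as follows. The particular formula, the function names, and the threshold are assumptions made for illustration; the description states only that the metric is correlated with such factors and does not prescribe how they are combined.

```python
def update_metric(importance, send_cost, expected_lifetime):
    """Hypothetical metric for an object subscribed to by a cache.

    Assumed direction of each factor (not specified in the text):
    higher importance and a longer expected lifetime raise the metric,
    while a higher cost of sending the update lowers it.
    """
    if send_cost <= 0:
        raise ValueError("send_cost must be positive")
    return importance * expected_lifetime / send_cost


def should_send_update(metric, threshold):
    """Decide, based on the metric, whether the cache is to receive the update message."""
    return metric >= threshold
```

For example, under these assumptions an important, long-lived object that is cheap to send yields a high metric and is updated, whereas a short-lived object that is expensive to send may fall below the threshold and be skipped.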
In yet another aspect of the invention, the order in which the server transmits update messages to subscribing entities is based on the value of the metric of the associated object. Preferably, priority for sending update messages is accorded to those objects having greater metric values associated therewith. Further, the priority of a given update message may be dynamically modified if, for instance, the given update message has not been sent to a corresponding subscribing entity for a predetermined period of time.
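The ordering described above may be sketched, by way of example only, as a queue of pending update messages in which the message with the greatest metric is sent first and a message that has waited for a predetermined period has its priority raised. The class name UpdateQueue, the parameters max_wait and boost, and the additive form of the priority adjustment are hypothetical choices made for this sketch.

```python
class UpdateQueue:
    """Pending update messages, sent in order of decreasing metric (hypothetical sketch)."""

    def __init__(self, max_wait, boost):
        self.max_wait = max_wait   # waiting period after which priority is raised
        self.boost = boost         # amount added to the metric of an overdue message
        self.pending = []          # one dict per pending update message

    def enqueue(self, metric, subscriber, object_id, now):
        self.pending.append(
            {"metric": metric, "subscriber": subscriber,
             "object_id": object_id, "enqueued": now}
        )

    def pop_next(self, now):
        """Remove and return (subscriber, object_id) of the highest-priority message, or None."""
        if not self.pending:
            return None

        def priority(entry):
            # Dynamically raise the priority of a message that has waited too long.
            overdue = (now - entry["enqueued"]) >= self.max_wait
            return entry["metric"] + (self.boost if overdue else 0.0)

        best = max(self.pending, key=priority)
        self.pending.remove(best)
        return best["subscriber"], best["object_id"]
```

A linear scan is used here for clarity, since priorities change with time; an implementation handling many pending messages might instead use a heap and re-prioritize overdue entries periodically.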
In yet another aspect of the invention, a method for maintaining objects in a cache comprises the steps of issuing a subscription to a plurality of objects, maintaining a metric for each of the plurality of objects, wherein the metric is correlated with a validity level of the object, and sending a message to either update or invalidate a cached copy of a given object of the plurality of objects, if the metric associated with the given object meets a predefined threshold. This affords a reduction in the number of update messages sent from the server to a given cache by allowing cached copies of objects to be obsolete to a desired degree.
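A minimal sketch of such threshold-based updating follows. It assumes, purely for illustration, that the validity-related metric is the number of changes to an object that have not yet been propagated to the cache; the class name StalenessTracker and this choice of metric are hypothetical, and other measures of validity could equally be used.

```python
class StalenessTracker:
    """Tracks, per object, how far a cached copy is allowed to lag (hypothetical sketch)."""

    def __init__(self, threshold):
        self.threshold = threshold   # number of unpropagated changes that triggers a message
        self.missed = {}             # object_id -> changes not yet sent to the cache

    def record_change(self, object_id):
        """Record one change to the object.

        Returns True if an update or invalidation message should be sent now,
        i.e. the metric has met the predefined threshold; the count is then reset.
        """
        count = self.missed.get(object_id, 0) + 1
        if count >= self.threshold:
            self.missed[object_id] = 0
            return True
        self.missed[object_id] = count
        return False
```

With a threshold of three, for instance, only every third change to an object results in a message, so the cached copy may be up to two changes out of date in exchange for fewer messages.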
Advantageously, the caching techniques described herein provide a reduction in the number of update messages sent between servers and caches. Consequently, this reduction results in less overhead and better scalability as compared with the conventional methods.