1. Technical Field
The present invention relates to data processing systems and, in particular, to distributed caching environments. Still more particularly, the present invention provides a method, apparatus, and program for minimizing invalid cache notification events in a distributed caching environment.
2. Description of Related Art
Using two or more computer systems that work together is referred to as “clustering.” Clustering generally refers to multiple servers that are linked together in order to handle variable workloads or to provide continued operation in the event one fails. Each computer may be a multiprocessor system itself. A cluster of servers provides fault tolerance and/or load balancing. If one server fails, one or more additional servers are still available. Load balancing distributes the workload over multiple systems.
A cache is used to speed up data transfer and may be either temporary or permanent. A server may cache information that is frequently accessed. Some content has a short lifespan. For example, many servers generate dynamic content, which is relevant for only a short period of time. Cache entries for such content are assigned a time to live (TTL), which determines the time at which the entry expires. Thus, a server must perform cache management to keep the cache entries fresh and relevant.
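The role of the TTL can be illustrated with a minimal sketch. The class and method names (TTLCache, put, get) and the injectable clock are hypothetical and chosen only for illustration; the sketch assumes that each entry stores an absolute expiration time computed from the local clock.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class CacheEntry:
    value: Any
    expires_at: float  # absolute expiration time, in seconds on the local clock


class TTLCache:
    """Hypothetical single-server cache whose entries expire after a TTL."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock  # injectable so the example can be driven by a fake clock
        self._entries: Dict[str, CacheEntry] = {}

    def put(self, key: str, value: Any, ttl_seconds: float) -> None:
        # Expiration time = creation time (local clock) + time to live.
        self._entries[key] = CacheEntry(value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        if self._clock() >= entry.expires_at:
            # The entry is stale: evict it so the caller regenerates the content.
            del self._entries[key]
            return None
        return entry.value
```

A caller that receives None from get regenerates the content and stores it again with a fresh TTL.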
Often, servers in a cluster generate similar data. Therefore, a benefit exists for sharing cache information among clustered servers. In a distributed caching environment, notification events are passed between member servers to synchronize cache entries within the managed cluster's domain. These notification events are used to add or update information in remote caches or to invalidate current information in remote caches.
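As a rough illustration of how such notification events might be applied to a remote cache, consider the following sketch. The names Action, NotificationEvent, and apply_event are hypothetical, and the remote cache is modeled as a plain dictionary; no particular product's event format is implied.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class Action(Enum):
    ADD = "add"
    UPDATE = "update"
    INVALIDATE = "invalidate"


@dataclass(frozen=True)
class NotificationEvent:
    action: Action
    key: str
    value: Optional[Any] = None  # unused for INVALIDATE events


def apply_event(remote_cache: Dict[str, Any], event: NotificationEvent) -> None:
    """Apply one incoming notification event to a member server's cache."""
    if event.action is Action.INVALIDATE:
        # Remove the current information; a missing key is not an error.
        remote_cache.pop(event.key, None)
    else:
        # ADD and UPDATE both store the value carried by the event.
        remote_cache[event.key] = event.value
```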
If the clocks in the member servers are not synchronized, an incoming event may be unexpectedly discarded. Discarded events can cause poor system-wide performance, because the expected cached information will not be found and must be regenerated. Server clock error can also cause data integrity problems by allowing cached objects to live longer than expected.
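The following sketch shows one way an unexpected discard could arise. It rests on an assumption that the text above does not state: that each event carries an absolute expiration time stamped with the sending server's clock, and that the receiving server compares that stamp against its own clock. The names TimedEvent and receive are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class TimedEvent:
    key: str
    value: str
    expires_at: float  # assumption: stamped using the SENDER's clock, in seconds


def receive(local_cache: Dict[str, Tuple[str, float]],
            event: TimedEvent,
            local_now: float) -> bool:
    """Store the entry unless it already looks expired on the local clock.

    Returns True if the entry was stored and False if the event was discarded.
    """
    if event.expires_at <= local_now:
        # On the receiver's clock the entry has already expired, so it is dropped
        # even though the sender still considers it fresh.
        return False
    local_cache[event.key] = (event.value, event.expires_at)
    return True
```

Under this assumption, an event stamped to expire at 1600 seconds is stored by a receiver whose clock reads 1000 seconds and is discarded by a receiver whose clock reads 1700 seconds, although both receive it at the same real instant.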
Typically, to solve this problem, customers attempt to have each server's clock synchronized either manually or through external methods. Add-on products that implement a time protocol, such as network time protocol (NTP) or digital time services (DTS), may also be used. However, in a heterogeneous network environment, these methods lack a product-embedded, inter-enterprise solution. Tying the solution to the product isolates the distributed caching systems from the problems that arise when no external NTP or DTS product is employed, when enterprises use different external methods for synchronizing clocks, or when additional caching nodes are inserted into the network from a new enterprise.
As an example of the problem in the prior art, consider a first server and a second server with synchronized clocks. A cache entry is created by the first server with a TTL of ten minutes. A cache notification event is sent from the first server to the second server. The second server will receive the cache notification event and create the cache entry with a correct TTL.
However, consider an instance where the clock of the first server is four minutes ahead of the clock of the second server. The second server will receive the cache notification event and create the cache entry; however, the cache entry will have only six minutes to live. Thus, the event will be discarded prematurely. This will result in poor system performance, because data must be regenerated sooner than expected.
Now, consider an instance where the clock of the first server is four minutes behind the clock of the second server. In this case, the event will live longer than expected in the second server. This will result in data integrity problems, because the data may remain cached after it is no longer relevant.