With increased penetration of the Internet and higher data access speeds, a variety of cloud applications/services are being offered and are becoming increasingly popular. Examples of such applications/services include, but are not limited to, web-based e-mail, social networking sites, news/financial portals, content sharing sites, cluster computing, payment gateways, etc. Given the explosion in the number of Internet users, these applications/services require huge data storage capacity for storing large amounts of customer data, information associated with the customers (for example, customer logins, authentication information, customer preferences, customer-created content, etc.), as well as metadata for cluster management or database catalog information. Scalable distributed database systems, such as Yahoo! Sherpa, Amazon Dynamo, Google BigTable, and the like, offer the massive storage space and processing power needed to support these cloud applications/services.
To improve the performance of the distributed database systems and to support customers spread over a large geographical area, external caches are deployed. Each cache stores local copies of the data items that are frequently accessed by the customers it serves, thereby decreasing query processing time and reducing the latency and network traffic involved in accessing the data items from the underlying database.
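As an illustration only, the following minimal Python sketch shows how an external cache can serve repeated reads from a local copy. The class names `Database` and `ExternalCache`, the `reads` counter, and the sample key are hypothetical and are not taken from any particular database system.

```python
class Database:
    """Hypothetical stand-in for the underlying distributed database."""

    def __init__(self, items):
        self._items = dict(items)
        self.reads = 0  # number of reads that actually reached the database

    def get(self, key):
        self.reads += 1
        return self._items[key]


class ExternalCache:
    """Hypothetical read-through cache that keeps local copies of data items."""

    def __init__(self, database):
        self._db = database
        self._local = {}

    def get(self, key):
        # On a miss, fetch the item from the database and keep a local copy.
        if key not in self._local:
            self._local[key] = self._db.get(key)
        # On a hit, the database is not contacted at all.
        return self._local[key]


db = Database({"user:1": "alice"})
cache = ExternalCache(db)
first = cache.get("user:1")   # miss: goes to the database
second = cache.get("user:1")  # hit: served from the local copy
```

In this sketch the second read does not reach the database, which is the source of the latency and traffic savings described above. It is also the reason a local copy can become stale if the database changes later.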
It is very important to maintain consistency among the external caches, and between the caches and the underlying database, to ensure proper operation of the applications/services and to prevent loss of customer satisfaction. This is especially critical when different cache servers store different versions of system metadata, for example, metadata representing storage mapping. In this case, an application accessing an older version of the storage mapping may expose incorrect data and may behave unexpectedly.
However, maintaining cache consistency is a significant challenge. To ensure cache consistency, stale copies of the data stored in multiple cache servers need to be invalidated. One common technique uses the trigger capability provided by many currently available database systems, such as Structured Query Language (SQL) databases, to invalidate stale data items. According to this technique, whenever a data item changes in the underlying database, the database sends a trigger to all caches, notifying them of the change. The caches then invalidate their respective local copies of the changed data item. However, not all database systems support such a trigger mechanism. According to another technique, every write operation on a data item is routed through the one cache that holds a copy of that data item. Any modification to the data item is therefore known to that cache, which can invalidate its local copy. However, since all access operations for the data item pass through the single cache, this solution does not scale well and may also increase access latency.
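The trigger-based technique described above can be sketched, under simplifying assumptions, as follows. The sketch models the trigger as a direct in-process callback rather than a real database trigger, and the names `Database`, `Cache`, `register`, and `invalidate` are hypothetical and do not correspond to the API of any actual SQL database.

```python
class Database:
    """Hypothetical database that notifies registered caches on every change."""

    def __init__(self):
        self._items = {}
        self._caches = []  # caches to be notified when a data item changes

    def register(self, cache):
        self._caches.append(cache)

    def get(self, key):
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        # Simulated trigger: tell every cache that this data item changed.
        for cache in self._caches:
            cache.invalidate(key)


class Cache:
    """Hypothetical external cache that drops stale copies when notified."""

    def __init__(self, database):
        self._db = database
        self._local = {}
        database.register(self)

    def get(self, key):
        if key not in self._local:
            self._local[key] = self._db.get(key)
        return self._local[key]

    def invalidate(self, key):
        # Discard the local copy; the next read refetches from the database.
        self._local.pop(key, None)


db = Database()
db.put("storage_mapping", "v1")
east, west = Cache(db), Cache(db)
before = (east.get("storage_mapping"), west.get("storage_mapping"))
db.put("storage_mapping", "v2")  # trigger invalidates both local copies
after = (east.get("storage_mapping"), west.get("storage_mapping"))
```

Without the notification loop in `put`, both caches would continue to return the older value after the update, which is the inconsistency the invalidation is meant to prevent. The sketch does not model network delays or failed notifications, both of which a real deployment would have to handle.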