Field
This non-provisional U.S. patent application relates generally to distributed cache systems and more specifically to managing replication of cached data in such systems.
Description of Related Art
In computing systems, a cache is a memory system or subsystem that transparently stores data so that future requests for that data can be served faster. As an example, many modern microprocessors incorporate an instruction cache holding a number of instructions; when the microprocessor executes a program loop in which the same set of instructions is executed repeatedly, these instructions are fetched from the instruction cache rather than from an external memory device, which would incur a performance penalty of an order of magnitude or more.
In other environments, such as where a computing system hosts multiple virtual machines under the control of a hypervisor, with each virtual machine running one or more applications, caching of objects stored on a network attached storage system can provide significant performance improvements. In some instances, records are cached and then written to the network attached storage system according to a “write back” algorithm. In the “write back” algorithm, the received record is written to the cache before being written to the network attached storage system. The cache system can then direct the writing of the record to the network attached storage system. In other instances, records are synchronously written to the cache and to the network attached storage system according to a “write through” algorithm, typically by writing to the network attached storage before writing to the cache.
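By way of illustration only, the difference between the two algorithms described above can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation of any particular cache system: the names BackingStore, WriteThroughCache, WriteBackCache, and flush are hypothetical, and an in-memory dictionary stands in for the network attached storage system.

```python
class BackingStore:
    """Hypothetical stand-in for the network attached storage system."""

    def __init__(self):
        self.records = {}

    def write(self, key, value):
        self.records[key] = value


class WriteThroughCache:
    """Writes each record to storage first, then to the cache."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def write(self, key, value):
        self.store.write(key, value)  # storage is updated synchronously
        self.cache[key] = value       # the cache is updated afterwards


class WriteBackCache:
    """Writes each record to the cache first; storage is updated later."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.dirty = []  # keys cached but not yet written to storage

    def write(self, key, value):
        self.cache[key] = value
        if key not in self.dirty:
            self.dirty.append(key)

    def flush(self):
        # The cache directs the deferred writes to the storage system.
        for key in self.dirty:
            self.store.write(key, self.cache[key])
        self.dirty.clear()
```

In this sketch, a record written through WriteBackCache exists only in the cache until flush is called. Because that window exists, a copy of the cached data on another computing system can matter for such a cache.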
When read commands are sent from the virtual machine to the network attached storage, it may be more efficient to read the records from the cache rather than from the network attached storage. While various write-through and write-back caching algorithms exist, caching and retrieving data quickly and accurately remains a challenge.
In some such systems, referred to herein as distributed cache systems, data cached in one computing system is copied to a second computing system, a process known as replication because a replica of the cached data is created. Having a copy on another computing system offers an alternative, potentially faster source for responding to future data requests and helps protect the cached data should the first computing system fail.
However, the advantages of replication can be lost when the replication occurs on the same physical machine. This can occur in the modern world of virtual machines, which are often moved from one computing system to another, sometimes without the user of the virtual machine even being aware that the move has happened. The advantages of data replication can also be lost when the replication occurs to a different physical machine that would be equally impacted by a fault affecting the machine from which the data was copied. For example, if both machines were in the same server rack, then a power failure to that rack would affect both machines. As another example, if both machines were in the same data center and some disaster occurred at that data center, then both machines would be affected. To date, avoiding such faults common to both machines has been dealt with by carefully setting policies for data replication based on knowledge of where the virtual machines are running and awareness of overlapping exposure to such faults. What is needed, therefore, is a way to ensure that a virtual machine user's wishes regarding replication are still met despite the fluidity of movement of virtual machines between computing systems, and without the user having to maintain knowledge of such overlapping exposure to faults.
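The overlapping exposure described above can be illustrated with a short sketch. It is offered only as an example of the kind of check a user would otherwise have to perform by hand, and it makes several assumptions: the Placement record and the shared_fault_domains function are hypothetical names, the three levels (host, rack, data center) are taken from the examples above, and rack names are assumed to be unique only within a data center and host names only within a rack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical description of where a machine is running."""
    host: str
    rack: str
    data_center: str


def shared_fault_domains(a, b):
    """Return the fault domains common to two placements, widest first."""
    shared = []
    if a.data_center == b.data_center:
        shared.append("data_center")
        # Rack names are assumed unique only within a data center.
        if a.rack == b.rack:
            shared.append("rack")
            # Host names are assumed unique only within a rack.
            if a.host == b.host:
                shared.append("host")
    return shared
```

For instance, a primary placed at Placement("h1", "r1", "dc1") and a replica placed at Placement("h2", "r1", "dc1") share both a data center and a rack, so a power failure to that rack would affect both copies. An empty result indicates that none of these three fault domains is common to the two placements.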