The present invention relates to cache data storage. More specifically, the invention relates to a method, system, and computer program product for optimization of cache data storage across a system of virtual machines.
With the rapid development of server workload virtualization, there exists a demand for effective caching in storage systems. Caching can reduce system latency and increase the number of input/output operations per second. Caching is considered to be effective when items placed in the cache have a greater likelihood of access than data placed in persistent storage. Virtual machines cache data in their own operating system caching layer. This cached data is often stored both in the virtual machine and in a remote storage system. As a result, cached data, especially data cached on read requests, is almost always accessed from the cache of the virtual machine and is likely never accessed from the storage system cache.
While the virtualization of data centers offers increased support for applications, these data centers are restricted by a limited quantity of server memory. Specifically, as the number of virtual machines in a system increases, increased pressure is placed on shared storage arrays. In larger systems, although the memory of an individual virtual machine is typically smaller than the memory of a storage system, the total memory of all virtual machines in the system is greater than the memory of the storage system.
Virtual machines are unique in that each virtual machine accesses its own disk image data, and while a virtual machine may access common data blocks in the server storage system via de-duplication and cloning, it does not share its data with other virtual machines. Consequently, the storage system cache can end up thrashing, that is, caching and evicting data before it is ever accessed a second time. This thrashing is wasteful and potentially detrimental, because it evicts shared data to make room for unshared and unaccessed data.
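By way of illustration only, the thrashing behavior described above can be sketched with a small simulation. The sketch assumes a least-recently-used (LRU) eviction policy in both the virtual machine caches and the storage system cache; the background does not specify an eviction policy, and the names `LRUCache` and `simulate`, the block identifiers, and the cache sizes are all hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache that counts hits and misses (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block):
        """Return True on a hit; on a miss, cache the block and evict the LRU entry if full."""
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.blocks[block] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used block
        return False


def simulate(num_vms=4, vm_cache_blocks=8, storage_cache_blocks=16,
             private_blocks_per_vm=8, shared_blocks=4, rounds=3):
    """Replay reads from several VMs through per-VM caches backed by one storage cache."""
    storage = LRUCache(storage_cache_blocks)
    vm_caches = [LRUCache(vm_cache_blocks) for _ in range(num_vms)]

    # Assumption: shared (e.g. de-duplicated) blocks start out in the storage cache.
    for i in range(shared_blocks):
        storage.read(("shared", i))

    for _ in range(rounds):
        for vm_id, vm_cache in enumerate(vm_caches):
            for i in range(private_blocks_per_vm):
                block = (vm_id, i)
                # Only a miss in the VM cache reaches the storage system cache.
                if not vm_cache.read(block):
                    storage.read(block)
    return storage, vm_caches


if __name__ == "__main__":
    storage, vm_caches = simulate()
    print("storage cache hits:", storage.hits, "misses:", storage.misses)
    print("shared block still cached:", ("shared", 0) in storage.blocks)
    for vm_id, cache in enumerate(vm_caches):
        print(f"VM {vm_id} cache hits:", cache.hits, "misses:", cache.misses)
```

Under these assumed sizes, every repeat read is served by a virtual machine cache, so the storage system cache fills with blocks that are never read from it again, and the shared blocks are evicted to make room for them.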