Existing techniques for leveraging flash storage devices in a virtualized environment generally involve using such devices as a host-side cache. With this approach, the hypervisor of a host system intercepts virtual machine (VM) I/O requests directed to virtual disks (VMDKs) residing on a shared storage device (e.g., a networked storage array) and stores data retrieved from the shared storage device in a portion of a local flash storage device referred to as a “flash cache.” When the hypervisor intercepts a read request for data that is already available in the flash cache, the hypervisor retrieves the requested data directly from the local flash storage device (rather than performing a roundtrip to/from the shared storage device), thereby reducing the I/O latency experienced by the VM.
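The read path described above can be illustrated with a minimal sketch. The class names `SharedStorage` and `FlashCache`, the `(vmdk, lba)` key, and the in-memory dictionaries are assumptions made purely for illustration. They do not represent any particular hypervisor's implementation, and the sketch omits eviction and write handling.

```python
class SharedStorage:
    """Hypothetical stand-in for a shared storage device (e.g., a networked array)."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)  # (vmdk, lba) -> data
        self.reads = 0              # number of round trips to shared storage

    def read(self, vmdk, lba):
        self.reads += 1
        return self.blocks[(vmdk, lba)]


class FlashCache:
    """Hypothetical host-side read cache placed in front of shared storage."""

    def __init__(self, backing):
        self.backing = backing
        self.lines = {}  # (vmdk, lba) -> cached data on local flash
        self.hits = 0
        self.misses = 0

    def read(self, vmdk, lba):
        key = (vmdk, lba)
        if key in self.lines:
            # Cache hit: serve from local flash, no round trip needed.
            self.hits += 1
            return self.lines[key]
        # Cache miss: fetch from shared storage and keep a copy in flash.
        self.misses += 1
        data = self.backing.read(vmdk, lba)
        self.lines[key] = data
        return data


storage = SharedStorage({("vm1.vmdk", 0): b"block-0"})
cache = FlashCache(storage)
first = cache.read("vm1.vmdk", 0)   # miss: goes to shared storage
second = cache.read("vm1.vmdk", 0)  # hit: served from the flash cache
```

In this sketch only the first read reaches the shared storage device; the repeated read is served locally, which is the source of the latency reduction.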
While host-side flash caching works well for accelerating the I/O performance of individual VMs, this approach does not necessarily make the most effective use of flash storage resources, particularly in terms of (1) maximizing overall flash storage utilization and (2) minimizing environment-wide operational costs. With respect to (1), the most significant advantages of flash storage over hard disk (HDD) based storage are low I/O latency and high IOPS. It therefore makes sense to measure flash storage utilization in terms of “I/O absorption rate” (i.e., the percentage of total I/O requests that are serviced from flash storage): the higher the I/O absorption rate, the better the utilization of flash storage resources. However, when using flash storage as a host-side cache, there is no easy way to maximize I/O absorption rate on a global scale for a given flash storage device or group of devices. There are at least two reasons for this. First, the hypervisor of a host system generally allocates flash cache space in a static manner among VMs or VMDKs at the time of VM/VMDK configuration. As a result, the hypervisor cannot dynamically adjust cache allocations at runtime (in response to, e.g., changing VM workloads or VMDK access patterns) to ensure optimal utilization of flash cache space. Second, the caching algorithms that the hypervisor executes generally make cache admission/eviction decisions for a given VM or VMDK based on the I/O requests for that single VM/VMDK, rather than taking into account the I/O requests for all active VMs/VMDKs.
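As a toy illustration of the first reason, the following sketch computes the I/O absorption rate for the same request trace under a static per-VM split and under a single shared pool. The function `absorption_rate`, the LRU replacement policy, the VM names, and the trace itself are all assumptions chosen for illustration; an actual hypervisor may use different policies and allocation units.

```python
from collections import OrderedDict


def absorption_rate(trace, capacities, pool_of):
    """Return the percentage of requests in `trace` serviced from cache.

    trace      -- list of (vm, block) read requests
    capacities -- dict mapping a pool name to its number of cache lines
    pool_of    -- function mapping a VM name to the pool it may use
    """
    pools = {name: OrderedDict() for name in capacities}
    hits = 0
    for vm, block in trace:
        name = pool_of(vm)
        pool = pools[name]
        key = (vm, block)
        if key in pool:
            hits += 1
            pool.move_to_end(key)          # mark as most recently used
        else:
            pool[key] = True               # admit the block
            if len(pool) > capacities[name]:
                pool.popitem(last=False)   # evict the least recently used
    return 100.0 * hits / len(trace)


# Hypothetical workload: vm_a cycles over four blocks while vm_b is idle.
trace = [("vm_a", b) for b in [0, 1, 2, 3] * 5]

# Static allocation: each VM owns two cache lines, fixed at configuration time.
static = absorption_rate(trace, {"vm_a": 2, "vm_b": 2}, lambda vm: vm)

# Shared allocation: the same four cache lines are available to any VM.
shared = absorption_rate(trace, {"all": 4}, lambda vm: "all")
```

For this trace the static split absorbs none of the requests, because vm_a's working set never fits in its two lines while vm_b's two lines sit unused, whereas the shared pool absorbs 16 of the 20 requests. The numbers are specific to this contrived trace and are not measurements.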
With respect to (2), host-side flash caching is typically performed with fine-grained cache lines (e.g., 4 KB or 8 KB) in order to maximize caching performance. This means that a relatively large amount of host system memory is needed to maintain cache metadata such as a mapping table, least-recently-used (LRU) list, hash table, and so on. Further, a relatively large number of CPU cycles and I/O operations are needed for cache lookup, eviction, page mapping, write on cache miss, etc. These high memory, CPU, and I/O requirements can significantly increase the costs for operating and maintaining a virtualized environment that utilizes host-side flash caching, which in turn may prevent many organizations from deploying flash storage in such environments on a large scale.
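The memory cost of fine-grained cache lines can be estimated with simple arithmetic. In the sketch below, the 1 TiB cache size and the 32 bytes of metadata per cache line are assumed figures chosen only to make the calculation concrete; actual per-line overhead depends on the caching implementation.

```python
KIB = 1 << 10
GIB = 1 << 30
TIB = 1 << 40


def metadata_bytes(cache_bytes, line_bytes, per_line_metadata_bytes):
    """Estimate host memory needed to track every cache line."""
    lines = cache_bytes // line_bytes
    return lines * per_line_metadata_bytes


# Assumed values: a 1 TiB flash cache and 32 bytes of metadata per line
# (mapping table entry, LRU list pointers, hash table entry, and so on).
overhead_4k = metadata_bytes(TIB, 4 * KIB, 32)
overhead_8k = metadata_bytes(TIB, 8 * KIB, 32)

print(f"4 KiB lines: {overhead_4k / GIB:.0f} GiB of metadata")
print(f"8 KiB lines: {overhead_8k / GIB:.0f} GiB of metadata")
```

Under these assumptions, a 1 TiB cache with 4 KiB lines has 268,435,456 lines and needs about 8 GiB of host memory for metadata alone; doubling the line size to 8 KiB halves that to about 4 GiB.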