Virtual desktops enable the same infrastructure to serve many users. A stateless virtual desktop infrastructure (VDI) offers improved density by allowing any user to connect to any virtual machine (VM) desktop. For example, this allows shift workers (e.g., task workers) to share a common set of VMs, thereby reducing the number of VMs that the infrastructure must service to only what is needed for concurrent users rather than for all existing users. In certain environments, such as hospitals that operate around the clock, these gains may reduce the VMs needed to approximately one-third of what might otherwise be needed to serve three shifts.
However, stateless desktops are less effective for users with more complicated processing needs. For example, knowledge workers tend to have long-running sessions that must survive across many connects and disconnects. Knowledge workers often open many applications, work on multiple documents simultaneously, and tend to have applications positioned across the various screens in ways that require some effort to configure at login. Such users prefer to disconnect while the applications are running, and resume those applications at a later time (e.g., even days or weeks later) without ever logging out. As a result, stateless designs do not exhibit nearly the same efficiency for knowledge workers (as compared to task workers) because many idle VMs accrue as knowledge workers initiate sessions and later disconnect.
Some existing systems suspend all data associated with the VM, but for knowledge workers who may have larger memory allocations (e.g., 2 GB to 4 GB), a large amount of data must be moved back and forth between random access memory (RAM) and the storage system. As such, the existing systems are very input/output (I/O) intensive, potentially flooding already saturated storage resources that often struggle to deliver I/O in support of a quality VDI user experience. As an example, an ESX host from VMware, Inc. may host 100 VDI sessions of 4 GB each. If a suspend-on-disconnect policy is in place, and most of the users disconnect near the end of their work day, there is 400 GB of data needing to be written from RAM in the hypervisor to the storage system to prepare the ESX host for a new set of incoming users. In a larger cluster of perhaps 8 ESX hosts, a traditional shared array might be subject to 400 GB×8, or 3.2 TB of data flowing from the cluster as a result of such a policy. This I/O surge of writes takes a long time to complete and poses a substantial risk to the experience of any remaining users. The surge may also inhibit the ability of the VDI cluster to properly serve users attempting to initiate new VDI sessions during this window of time.
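The write-storm arithmetic in the example above can be reproduced with a short sketch. The function name and parameters are illustrative only, and the calculation assumes that every session on every host suspends its full memory allocation and that 1 TB = 1000 GB.

```python
def suspend_write_volume_gb(sessions_per_host, gb_per_session, hosts=1):
    """Total RAM state (in GB) written to storage if every session suspends.

    Illustrative helper: assumes each session writes its full memory
    allocation and that no session stays connected.
    """
    return sessions_per_host * gb_per_session * hosts


# Figures taken from the example: 100 sessions of 4 GB each per host, 8 hosts.
per_host_gb = suspend_write_volume_gb(100, 4)
cluster_gb = suspend_write_volume_gb(100, 4, hosts=8)

print(per_host_gb)        # 400 (GB per host)
print(cluster_gb / 1000)  # 3.2 (TB for the cluster)
```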
In addition to massive “write-storms”, there is a severe challenge when users with suspended VMs return later to access their VMs. Upon logging in, users must wait for their VM to be reanimated. The wait time depends on retrieval of their machine state from the storage system. The transfer of 2 GB to 4 GB of session data back into RAM takes time, as much as several minutes depending on the performance of the storage system at the time of the request. If many users need to retrieve their machines within a narrow time window, such as around the beginning of the workday, the storage system is subject to large “read-storms” that further amplify the delays users experience and further degrade the I/O performance for users attempting to work.
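The dependence of resume time on storage performance and on the number of simultaneous resumes can be sketched with a rough model. The model is an assumption made for illustration: it treats the storage system as a single pool of read throughput shared equally among concurrent resumes, ignores caching and other I/O, and uses 1 GB = 1000 MB. The 200 MB/s figure is a hypothetical value and does not come from the text.

```python
def resume_seconds(session_gb, throughput_mb_s, concurrent_resumes=1):
    """Estimate seconds to read one suspended session back into RAM.

    Hypothetical model: aggregate read throughput is split evenly
    across all sessions resuming at the same time.
    """
    if throughput_mb_s <= 0 or concurrent_resumes < 1:
        raise ValueError("throughput must be positive and resumes >= 1")
    per_session_mb_s = throughput_mb_s / concurrent_resumes
    return session_gb * 1000 / per_session_mb_s


# One 4 GB session alone versus ten resuming together (assumed 200 MB/s).
alone = resume_seconds(4, 200)
storm = resume_seconds(4, 200, concurrent_resumes=10)
print(alone, storm)
```

Under this model the wait grows linearly with the number of users resuming at once, which is the read-storm effect described above.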
Some existing solutions focus on use of flash technologies to ensure a more reliable and faster resume path than was available using hard drive technology, but these approaches do not significantly reduce the quantity of data involved and are thus, at the very least, inefficient.