Demands for increased data storage speed, capacity, and general availability have been spurred on by a variety of services that are available to users over the Internet, such as photo and video sharing websites, e-mail providers that offer unlimited e-mail storage space, and service providers that host virtual machines (VMs). The data storage demands of VM service providers are especially challenging due to the nature of the services that they offer, which typically include storing persistent images of the VMs, storing virtual disks accessed by the VMs, and storing various snapshots of the VMs/virtual disks to preserve their state at different points in time. Such storage demands are exacerbated when the VM service providers take measures to ensure the integrity of the data and the availability thereof to their customers, which is typically accomplished by redundantly storing the data across different storage devices, and, in some cases, storing copies of the data across data centers in different geographic locations.
In order to provide the aforementioned services to their customers, VM service providers must typically purchase and configure a large number of server devices (e.g., rack-mounted blade servers) and storage devices (e.g., Storage Area Network (SAN) systems). Such devices are commonly referred to as "enterprise devices," which are expensive, complicated to configure, and require frequent maintenance. As a result, both the upfront and recurring overhead costs incurred by the VM service providers are quite high, which reduces profitability. Moreover, scaling enterprise devices to meet evolving customer demands can be difficult to accomplish, which may result in a disruption of services that customers expect to be highly available.
In an attempt to mitigate the foregoing problems associated with enterprise storage devices, many VM service providers have turned to renting storage space from data storage services provided over a network by a cloud services provider, such as the Simple Storage Service (S3) provided by Amazon.com. These storage services are desirable since they provide reliable access to a virtually unlimited amount of storage space with little or no upfront cost and eliminate the complexities of managing enterprise storage devices. One drawback, however, of using such storage services is that the data stored therein typically can only be accessed over an Internet connection. As a result, data transfers between the VM service provider and the storage service provider are relatively slow compared with transfers to locally accessible storage devices.
One attempt to alleviate this problem involves caching data on a local storage device maintained by the VM service provider and periodically "flushing" that data out to the storage service for backup. Unfortunately, most storage services require all data flushes to be atomic, i.e., all data involved in a data flush must be transmitted to and stored by the storage service, or the data flush is voided. This requirement is problematic when attempting to flush large files, such as snapshots of VMs, which can be gigabytes or even terabytes in size, since the slow and unreliable nature of Internet connections results in a significantly lower rate at which the snapshots can be flushed out to the storage service than the rate achievable with locally accessible enterprise storage devices. Because each flush takes longer to complete, and a flush interrupted partway through must be restarted from the beginning, more data accumulates in the local cache without yet being durably stored by the storage service. Consequently, the potential amount of data that will be lost in the event of a crash or failure of the local storage device serving as the cache is increased.