Datacenter-scale storage systems have generally been developed and refined to work with “big data” applications, e.g., web search applications, genomic databases, or other massively data-intensive applications. These big data applications tend to issue very large, sequential input/output (I/O) operations to storage, e.g., on the order of 16 megabytes per I/O. Furthermore, big data applications tend to be relatively tolerant of data loss and data inconsistency. As a consequence, cloud storage techniques tend to handle large sequential I/O operations efficiently, at the cost of accepting some data loss and an inconsistent state upon recovery from a crash.
On the other hand, traditional desktop/laptop applications such as Windows® or Unix® applications tend to issue relatively small I/Os, e.g., on the order of a few kilobytes, often to random physical storage locations. Furthermore, these traditional applications are often less tolerant of data loss and rely on stronger consistency guarantees in the event of a crash. To protect against data loss and ensure data consistency, these applications often need to flush their data from memory to storage in a specific order; this order guarantees that, in the event of a crash, the application can recover its persistent storage to a consistent state. Applications can flush data either by explicit application flush calls, or via a file system (e.g., the New Technology File System or “NTFS”) that flushes the writes on behalf of the application.
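By way of illustration only, the following minimal Python sketch shows one way an application might enforce a flush order using explicit flush calls. The function name `ordered_commit`, the two-file layout (a log file and a data file), and the `BEGIN`/`COMMIT` record format are hypothetical and are introduced solely for this example; they are not drawn from any particular application or file system.

```python
import os


def _append_and_flush(path, payload):
    """Append payload to path and block until the OS reports it flushed."""
    with open(path, "ab") as f:
        f.write(payload)
        f.flush()              # move data from the Python buffer to the OS
        os.fsync(f.fileno())   # ask the OS to flush the data to storage


def ordered_commit(log_path, data_path, record):
    """Flush an intent record, then the data, then a commit marker, in that order.

    Hypothetical recovery rule for this sketch: a BEGIN entry without a
    matching COMMIT entry indicates that the data write may be incomplete.
    """
    _append_and_flush(log_path, b"BEGIN " + record + b"\n")
    _append_and_flush(data_path, record + b"\n")
    _append_and_flush(log_path, b"COMMIT\n")
```

Because each step returns only after its flush completes, the commit marker in this sketch is not written until the data write has been flushed, which is the kind of ordering on which crash recovery can rely.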
Generally, data flushes are performed synchronously, i.e., the application must wait until the data has been flushed to storage before continuing with processing. In other words, the application blocks (waits) until a given data flush is complete. When a traditional application is deployed in an environment with high-performance storage resources (e.g., in the cloud), the expectation is often that the application will exhibit substantial improvements in performance. However, because a blocked application cannot issue further I/O while it waits, synchronous data flushes can significantly impede the ability of the application to leverage high-performance storage resources in parallel; in turn, this reduces application performance.
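The blocking behavior described above may be sketched, again purely as an illustration, as follows. The function name `write_with_synchronous_flushes` is hypothetical. The sketch appends each record, blocks on a flush, and records how long the caller waited, so that the serialization of small writes behind their flushes is visible.

```python
import os
import time


def write_with_synchronous_flushes(path, records):
    """Append each record and block on a flush before writing the next one.

    Returns the time, in seconds, spent waiting on each flush.
    """
    waits = []
    with open(path, "ab") as f:
        for record in records:
            f.write(record)
            f.flush()                  # hand the data to the OS
            start = time.perf_counter()
            os.fsync(f.fileno())       # caller blocks here until the flush returns
            waits.append(time.perf_counter() - start)
    return waits
```

In this sketch only one write is outstanding at any time, so the total elapsed time grows with the sum of the individual flush waits regardless of how many operations the underlying storage could service concurrently.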