Virtualized computing systems may comprise an arrangement of clustered compute nodes (servers) and data nodes for efficient, simultaneous execution of multiple applications. In particular, compute nodes, having direct or remote storage, may be used to implement virtual machines that execute various applications simultaneously. For example, an application, such as a database or a business application, can span one or more virtual machines. Networked data nodes within a data plane, including data node storage, may couple to the compute nodes, cooperating with a storage virtualizer to provide other storage options. Periodically, data can be flushed to backend data storage, which may be implemented in at least two places: local to the compute nodes or remote, in the data plane. The virtualized computing system may include Hard Disk Drives (HDDs) and Solid State Drives (SSDs) local to the compute nodes. These drives may be organized into volumes, which contain the entire content of a virtual disk, wherein the virtual disk represents a unit of the virtualized storage associated with an application.
During execution of an application, data may be asynchronously flushed to one of two places. First, it may be written to the local HDDs and SSDs, which is slower than writing to the cache. Alternatively, it may be written to the data node, which comprises longer-term storage used primarily for analytics and various other needs. While application writes to the cache may be very fast, the application is limited by the performance of the one or more HDDs and SSDs making up the back end. In particular, since the compute nodes generally require involvement of a processor in order to perform data flushing, the performance of applications can be severely limited by data flushing. That is, since only a certain amount of cache space is available, older data must be removed to make room when the cache fills up. When this older data is sent to the slower storage at the back end, the application's performance is limited by the speed of that slower storage. Thus, when the application pushes a large number of writes from virtual machines to data storage, the application is limited by the speed of these backend HDDs and SSDs, rather than by the speed of the cache. Accordingly, although the storage cache unit may comprise high-performance flash drives, the slower performance of the HDDs and SSDs may impede the processing of applications during a data flush. Further, since application writes are random, the writes captured in the log are also randomly distributed. This random distribution further impedes the performance of an application during a data flush, since flushing results in random writes to the backend HDDs, which are much faster at sequential writes than at random ones. It is within this context that the embodiments arise.