Embodiments of the invention relate to buffering and data replication and, in particular, to buffering and replicating data in a distributed file system.
Numerous workloads, such as virtual machines (VMs), databases, and accesses to user home directories, send small, synchronous write operations to storage. In addition, small writes to a file system translate into many more small writes to the storage layer, which must also update the recovery log and various metadata structures. Storage controllers typically use non-volatile random-access memory (NVRAM) to buffer these small writes and reduce their latency, but many systems, for example, systems based on a software-defined storage architecture, cannot install such expensive storage devices in every node. This is a particular problem for systems based solely on spinning disks because of their poor performance for such operations. While storing data on large numbers of solid-state drives (SSDs) in every server can improve the situation, it is very costly given the typical capacity requirements in modern data centers. In addition, naively using SSDs to buffer random writes can severely shorten their lifetime.