Many applications executing in computing clusters, including cloud computing clusters, require a high level of availability, redundancy, or other measures of robustness. In such applications, state data is typically propagated throughout the computing cluster so that no single node becomes a point of failure. For example, business-critical applications such as sales and customer billing systems typically must not have a single point of failure. A node in a computing cluster may be brought down by any combination of hardware failure, software failure, network failure, power failure, or other unplanned outage. However, software failures (including software bugs, software misconfigurations, crashes due to transient hardware errors or power failures, and all other software failures) are typically more common than any other source of failure.
In some high-availability systems, application state may be propagated through a computing cluster by synchronous update messages sent between all of the nodes of the cluster. Additionally or alternatively, in some systems the application state may be synchronously logged to global or shared storage such as a storage area network or network attached storage volume. In such systems, the application must wait for each synchronization with other nodes and/or shared storage to complete, which may limit application performance.
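The cost of synchronous propagation can be illustrated with a minimal sketch. The `Node` class, the `synchronous_update` function, the node names, and the round-trip figures below are all hypothetical and are not drawn from any particular system. The sketch assumes that update messages are sent to all peers in parallel, so that the caller waits for the slowest acknowledgement.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node holding a replica of the application state."""
    name: str
    round_trip_ms: float  # assumed network round trip to this node
    state: dict = field(default_factory=dict)

    def apply(self, key, value):
        # Apply the update to this replica and report the simulated
        # delay before its acknowledgement reaches the sender.
        self.state[key] = value
        return self.round_trip_ms


def synchronous_update(local, peers, key, value):
    """Apply an update locally, then wait for every peer to acknowledge it.

    Returns the simulated wait in milliseconds. Messages are assumed to be
    sent in parallel, so the wait is bounded by the slowest peer.
    """
    local.state[key] = value
    ack_delays = [peer.apply(key, value) for peer in peers]
    return max(ack_delays, default=0.0)


local = Node("node-a", round_trip_ms=0.0)
peers = [Node("node-b", 0.4), Node("node-c", 0.5), Node("node-d", 12.0)]
wait_ms = synchronous_update(local, peers, "invoice:1001", "paid")
print(wait_ms)  # 12.0
```

In this sketch, a single slow or distant peer determines the latency of every update, which is one way in which synchronization may limit application performance.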
Some computing systems include persistent memory, which may be byte-addressable, high-performance, non-volatile memory. Persistent memory may provide performance comparable to traditional volatile random access memory (RAM) while also providing data persistence. In some applications, persistent memory may allow for durable data updates within a node without waiting for storage input/output (I/O) actions against local storage devices and without converting data from in-memory formats to formats suitable for on-disk storage. However, high-availability applications using persistent memory may still require synchronous updates to other nodes and/or shared storage.
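The programming model described above may be approximated, for illustration only, with a memory-mapped file. This is an assumption made so that the sketch runs on ordinary hardware: a mapped regular file still relies on storage I/O when flushed, whereas actual persistent memory would be accessed through a platform-specific interface. The record layout and file name are hypothetical.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-layout record: a 64-bit counter followed by a double.
RECORD = struct.Struct("<qd")

# A regular file stands in for a persistent memory region (assumption).
path = os.path.join(tempfile.mkdtemp(), "state.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * RECORD.size)

with open(path, "r+b") as f:
    region = mmap.mmap(f.fileno(), RECORD.size)
    # Update the record in place at a byte offset, with no separate
    # serialization step into an on-disk format.
    RECORD.pack_into(region, 0, 7, 99.5)
    # Stands in for the step that makes the update durable.
    region.flush()
    region.close()

# Reopen the region to confirm that the update survived the unmapping.
with open(path, "r+b") as f:
    region = mmap.mmap(f.fileno(), RECORD.size)
    counter, balance = RECORD.unpack_from(region, 0)
    region.close()

print(counter, balance)
```

The update in this sketch is durable only within the node that holds the region, so a high-availability application would still need to propagate it to other nodes or to shared storage.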