1. Field of the Invention
This invention relates generally to the field of computer processors. More particularly, the invention relates to an apparatus and method for store durability and ordering in a persistent memory architecture.
2. Description of the Related Art
Big data analytics and cloud computing require applications to process large amounts of data to service a growing number of users expecting real-time responses. Evidence of these trends is common at most vendors operating Internet data centers such as Google, Facebook, Amazon, Microsoft, etc. However, the capacity limitations of memory (e.g., DRAM) and the large random-access overheads and latencies of storage (e.g., HDD, SSD) make it difficult to meet these new application requirements. Emerging “persistent memory” technologies such as Phase Change Memory offer desirable capabilities that can help address these application challenges. For example:
        Higher capacities compared to DRAM, with the same order of magnitude of performance; and
        Byte-addressability (as opposed to the page/block addressability of Flash memory), allowing them to be attached to the processor memory bus.
With such a persistent-memory architecture, system software (e.g., the operating system) and applications can access nonvolatile storage using regular load/store instructions, without incurring the overheads of traditional storage stacks (file systems, block storage, I/O stack, etc.). However, stores to persistent memory pose a new challenge: software must be able to enforce and reason about the “persistence” of stores, a concern that did not arise with volatile main memory. Specifically, there are a number of intermediate volatile buffers between the processor core and persistent memory (such as WB buffers, caches, fill-buffers, uncore/interconnect queues, memory controller write pending buffers, etc.), and a store operation is not truly persistent until the store data has reached some power-fail safe point at the persistent memory controller.
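By way of illustration only, the gap between a retired store and a persistent store may be sketched with a toy software model. The sketch below is not part of any described embodiment. The names `pmem_model`, `pmem_store`, `pmem_load`, `pmem_drain`, and `pmem_power_fail` are hypothetical, and the model collapses all of the intermediate volatile buffers into a single staging array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PMEM_WORDS 8   /* size of the toy persistent region (assumption) */

/* Toy model: a store lands in volatile staging first and only becomes
 * durable once it is drained to the power-fail safe point. */
typedef struct {
    uint64_t durable[PMEM_WORDS];   /* contents that survive power loss  */
    uint64_t pending[PMEM_WORDS];   /* data still in volatile buffers    */
    bool     dirty[PMEM_WORDS];     /* true if pending[i] is not durable */
} pmem_model;

void pmem_init(pmem_model *m) { memset(m, 0, sizeof *m); }

/* A retired store: visible to later loads, but not yet durable. */
void pmem_store(pmem_model *m, size_t idx, uint64_t value) {
    assert(idx < PMEM_WORDS);
    m->pending[idx] = value;
    m->dirty[idx] = true;
}

/* A load observes the newest value, whether or not it is durable. */
uint64_t pmem_load(const pmem_model *m, size_t idx) {
    assert(idx < PMEM_WORDS);
    return m->dirty[idx] ? m->pending[idx] : m->durable[idx];
}

/* Drain one word from the volatile buffers to the power-fail safe point. */
void pmem_drain(pmem_model *m, size_t idx) {
    assert(idx < PMEM_WORDS);
    if (m->dirty[idx]) {
        m->durable[idx] = m->pending[idx];
        m->dirty[idx] = false;
    }
}

/* Power failure: anything still held in volatile buffers is lost. */
void pmem_power_fail(pmem_model *m) {
    memset(m->pending, 0, sizeof m->pending);
    memset(m->dirty, 0, sizeof m->dirty);
}
```

In this model, a word that was stored but never drained reads back correctly until `pmem_power_fail` is called, after which only drained words retain their values. Real hardware drains buffers asynchronously and in an order software cannot observe, which is why this simplified model understates the problem.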
Existing processor ISAs and memory ordering allow software to enforce or reason about store visibility only at the following levels:
        Local visibility: Stores retired by a thread are visible to the thread itself. This happens immediately.
        Global visibility: Stores retired by a thread are visible to all cores in the system. This is enforced through the cache coherency states for WB stores and through memory/store FENCE instructions for special (WC, non-temporal) stores.
        Non-Coherency visibility: Stores retired by a thread are visible to the non-coherent domain (such as I/O). This is enforced by the use of CLFLUSH+FENCE instructions.
In other words, software can only ensure that a set of stores is visible at the global ordering point (typically at the memory controller). At this point, we say that these stores are “accepted to memory.” However, in a system with persistent memory, software also needs the ability to guarantee and reason about the persistence and ordering of stores to persistent memory (e.g., database log updates, data or metadata updates in file systems, etc.). This may be referred to as “persistence visibility,” where stores retired by a thread have reached a power-fail protected domain (i.e., have become durable). That domain may be the persistent device itself or some adjacent power-fail safe buffer that has enough residual energy to write to the persistent device even in the case of a power failure.
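For illustration, the CLFLUSH+FENCE sequence referred to above may be sketched in C as follows. This is a minimal sketch under stated assumptions, not a definitive implementation. The function name `flush_range` is hypothetical, the 64-byte line size is assumed rather than queried from the processor, and MFENCE is chosen as the fence because the text does not name a specific one. On non-x86-64 targets the sketch falls back to a C11 fence and flushes nothing. Consistent with the discussion above, this sequence only ensures that the stores are accepted to memory; it does not by itself guarantee that they have reached a power-fail protected domain.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>      /* _mm_clflush, _mm_mfence (SSE2) */
#define HAVE_X86_FLUSH 1
#else
#define HAVE_X86_FLUSH 0    /* no cache-line flush in this sketch */
#endif

#define CACHE_LINE 64u      /* assumed line size; real code should query it */

/* Flush every cache line covering [addr, addr + len), then fence.
 * Returns the number of cache lines visited. */
size_t flush_range(const void *addr, size_t len) {
    assert(addr != NULL);
    if (len == 0) {
        return 0;
    }
    /* Round the start address down to a cache-line boundary. */
    uintptr_t start = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t lines = 0;
    for (uintptr_t p = start; p < end; p += CACHE_LINE) {
#if HAVE_X86_FLUSH
        _mm_clflush((const void *)p);   /* write back and invalidate line */
#endif
        lines++;
    }
#if HAVE_X86_FLUSH
    _mm_mfence();                       /* order the flushes with later ops */
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
    return lines;
}
```

A caller would typically write a record (e.g., a database log entry) with ordinary stores and then call `flush_range` on the record's address and length. Whether the flushed data is durable when the call returns depends on the platform, which is the gap that “persistence visibility” describes.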