The background description provided herein is for the purpose of generally presenting the context of the disclosure. Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
A Non-Volatile Dual In-line Memory Module (NVDIMM) is a computer-readable memory having a series of dynamic random access memory integrated circuits that is byte addressable and can retain data even when electrical power is removed, whether from an unexpected power loss, a system crash, or a normal system shutdown. With recent advances, it is expected that NVDIMM will provide large capacity, high speed, byte addressable non-volatile memory for computing platforms, hereafter referred to as persistent memory (PMEM). It is expected that PMEM modules may soon have capacities of 128 gigabytes (GB) to 512 GB, and that a platform with multiple PMEM modules can provide a few terabytes (TB) of non-volatile memory in aggregate. Thus, in a future PMEM based virtualized computing platform, each guest virtual machine (VM) may have a large portion of the computing platform's PMEM, e.g., a TB. A guest VM with a virtualization of PMEM (vPMEM) may have large capacity, non-volatile, fast access memory, suitable for performance critical usages, such as databases, or mission critical usages, such as mail servers.
However, such PMEM based systems may also suffer from 1) hardware failure, including NVDIMM failure, 2) eventual wear-out of the PMEM once its endurance limit is reached, or 3) a virus or malicious software deleting and/or overwriting the PMEM data the guest VM has. Accordingly, regular backup of the massive vPMEM data, such as to tape and/or additional hard disks, and ensuring its availability to restore the system to a previous snapshot under one of the above situations, is important to mitigate the impact of data loss for mission critical usage. Moreover, for many performance or mission critical applications, the backup needs to be a live backup, with the applications remaining available, since a full backup can take hours or even longer.
Existing data backup mechanisms are typically based on block input/output (I/O) disk devices, and/or require application specific solutions. Often, a built-in mechanism is employed to maintain enough knowledge to provide an integrated snapshot of the internal disk data, so that the backup data can be used to rebuild the system. For example, Microsoft® Exchange Server has a built-in mechanism to export its internal server disk data to external disk storage. As a further example, shadow copy is another technology, used by a native operating system (OS) to take a snapshot of its internal files or volumes while the OS is running; it requires OS built-in features. (Note that an integrated data snapshot is important because the backup process may take a couple of hours or even longer, and the data may change substantially during the backup process, leading to a situation where the backup data, i.e., the data about to be backed up (which are new) plus the data already backed up to tapes/disks (which are old), form an invalid combination.) An alternative solution is to use multiple storage disks (from multiple nodes) to form a redundant array of inexpensive disks (RAID) system, so that a failed node can be restored based on the redundancy the RAID system uses. However, these existing solutions either require application-specific functionality support (which is not widely available), or require the system to use slow disks with central processing unit (CPU) intervention in the I/O path to construct the redundancy. None of them is application-agnostic, or desirable for future high capacity, high speed, byte addressable PMEM based systems.