1. Technical Field
The present disclosure relates to storage of data and, more specifically, to efficient storage of small random changes to data on one or more disks coupled to a host computer in a network environment.
2. Background Information
Many modern computing algorithms are page-based and implemented in a kernel of an operating system executing on a host computer. Paging is a memory management function that facilitates storage and retrieval of data in blocks or “pages” to and from primary storage, such as disk. For example, assume that an application executing on the host computer utilizes a page-based algorithm to, e.g., insert a new node into a doubly-linked list. Execution of the algorithm may result in a first modified (“dirtied”) page, i.e., the page with a previous pointer, a second dirtied page, i.e., the page with a next pointer, and a third dirtied page containing the newly inserted node. Modification of the pages requires a number of (e.g., three) random seek operations to retrieve the pages from the disk, as well as the same number of additional seek operations to write the modified pages back to the disk. It is thus desirable to utilize data structures on disk-based systems that avoid such random and expensive operations.
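By way of illustration only, the linked-list example above can be sketched as follows. The sketch is not taken from any particular kernel or application; the names `PagedList`, `page_of`, and `seek_count`, the 4096-byte page size, and the node addresses are all assumptions chosen to show how one logical insertion can dirty three separate pages.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (illustrative)


def page_of(addr):
    """Return the number of the page that holds byte address `addr`."""
    return addr // PAGE_SIZE


class PagedList:
    """Hypothetical doubly-linked list whose nodes are identified by byte address."""

    def __init__(self):
        self.prev = {}      # node address -> address of previous node
        self.next = {}      # node address -> address of next node
        self.dirty = set()  # page numbers modified since the last write-back

    def link(self, a, b):
        """Set up a <-> b as neighbors without tracking dirtied pages."""
        self.next[a] = b
        self.prev[b] = a

    def insert_between(self, new, before, after):
        """Insert `new` between neighbors `before` and `after`; return dirtied pages."""
        # The predecessor's next pointer changes, dirtying its page.
        self.next[before] = new
        self.dirty.add(page_of(before))
        # The successor's previous pointer changes, dirtying its page.
        self.prev[after] = new
        self.dirty.add(page_of(after))
        # The new node itself is written, dirtying a third page.
        self.prev[new] = before
        self.next[new] = after
        self.dirty.add(page_of(new))
        return sorted(self.dirty)


def seek_count(dirty_pages):
    """One seek to read each page plus one seek to write it back."""
    return 2 * len(dirty_pages)


# Assumed layout: the three nodes happen to reside on three different pages.
lst = PagedList()
a, b, n = 64, 5 * PAGE_SIZE + 128, 9 * PAGE_SIZE
lst.link(a, b)
pages = lst.insert_between(n, a, b)
print(pages, seek_count(pages))  # prints: [0, 5, 9] 6
```

Under these assumptions, a single insertion touches pages 0, 5, and 9, which corresponds to three seeks to retrieve the pages and three more to write them back.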
The advent of byte-addressable persistent memory, such as storage class memory, may accelerate the adoption of primary storage that resides on a memory bus of the host computer, as well as acceptance of “in-memory” computing. Applications written for persistent (non-volatile) byte-addressable storage incur no penalty for random access and thus behave differently, e.g., they may persist data structures, such as the linked list described above, in directly byte-addressable form. The persistent memory may be configured to enable applications executing on the host computer to safely and consistently modify (change) their data at a byte-addressable granularity to, e.g., survive failures. That is, the applications may perform high-frequency, small random accesses to change the data in the persistent memory. Yet, even safe and consistent data stored in the persistent memory may be vulnerable in the event of a disaster because there is only a single copy of the data on the host computer.
Therefore, it is economically advantageous to replicate the changed data on one or more storage devices, such as disks, of remote machines connected to the host computer over a network to thereby allow recovery from a disaster. However, disks generally provide good streaming bandwidth performance (e.g., reading and writing of a large number of sequential blocks or “track reads”) but do not perform well on small random accesses (i.e., reading and writing a single disk sector preceded by a disk seek). In other words, disks operate most efficiently in sequential or streaming bandwidth mode, whereas small random accesses (such as the random seek operations described above) can substantially slow the performance of disks. Accordingly, there is a need to match the random access, byte-addressable capability of persistent memory on the host computer with the block-based, streaming bandwidth capability of disks.
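The contrast between streaming and small random accesses may be made concrete with a simple cost model. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the seek time, sector size, and bandwidth figures are illustrative placeholders rather than measurements of any particular disk.

```python
def random_write_time(n_changes, seek_s, sector_bytes, bandwidth_bps):
    """Time to write n sectors when each write is preceded by its own seek."""
    return n_changes * (seek_s + sector_bytes / bandwidth_bps)


def streaming_write_time(n_changes, seek_s, sector_bytes, bandwidth_bps):
    """Time to write n sectors contiguously after a single initial seek."""
    return seek_s + n_changes * sector_bytes / bandwidth_bps


# Illustrative values only (assumptions, not measured figures).
SEEK_S = 0.01          # assumed 10 ms per seek
SECTOR_BYTES = 512     # assumed sector size
BANDWIDTH_BPS = 100e6  # assumed 100 MB/s sequential transfer rate
N = 1000               # number of small changes to write

t_random = random_write_time(N, SEEK_S, SECTOR_BYTES, BANDWIDTH_BPS)
t_stream = streaming_write_time(N, SEEK_S, SECTOR_BYTES, BANDWIDTH_BPS)
print(f"random: {t_random:.3f} s, streaming: {t_stream:.3f} s")
```

In this model the seek term dominates the random case, so the total time grows with the number of seeks rather than with the amount of data transferred. The streaming figure assumes that the changes can be laid out contiguously on disk, which is precisely the mismatch identified above.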