A critical component of computer systems is data storage. Data storage can be divided conceptually into an individual user's data storage, which is attached directly to the individual's computer, and network-based data storage, which is typically intended for multiple users.
One type of network-based storage device is a disk array. Typically, the disk array includes at least one controller, memory (e.g., non-volatile memory), and an array of disks. The memory acts as a cache for data that is to be written to the array of disks. The data is held in the memory until the controller has an opportunity to write the data to disk. Typically, components (e.g., the controller and the disks) of the disk array are hot swappable, which allows components to be replaced without turning off the disk array.
As an alternative to the disk array, researchers have been exploring data storage within a distributed storage system that includes an array of independent computing devices coupled together by a network. Each of the independent computing devices includes a processor, memory (e.g., non-volatile memory), and one or more disks. An advantage of the array of independent computing devices is lower cost. The lower cost can result from mass production of the independent computing devices as commodity items and from elimination of hot swappable features of the disk array. Another advantage is better scalability. The user can buy a few devices initially and add more devices as demand grows.
Replication and erasure coding have been explored as techniques for enhancing reliability for an array of independent computing devices. A replication technique employed by the array of independent computing devices replicates data blocks across a set of storage devices (e.g., three storage devices). This set is called the replica set for the data blocks. Erasure coding stores m data blocks and p parity blocks across a set of n storage devices, where n=m+p. For each set of m data blocks that is striped across a set of m storage devices, a set of p parity blocks is stored on a set of p storage devices.
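The two techniques can be illustrated with a minimal sketch. It assumes the simplest erasure code, a single bytewise XOR parity block (p=1), purely for illustration; the description above does not name a particular code, and codes with p>1 require different arithmetic. The function names and device labels are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_stripe(data_blocks):
    """Return n = m + 1 blocks: the m data blocks followed by one parity block.

    Assumption: single XOR parity (p = 1), chosen only to keep the sketch short.
    """
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover_block(stripe, lost):
    """Rebuild the block at index `lost` from the n - 1 surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost]
    return xor_blocks(survivors)


def replicate(block, replica_set):
    """Replication: every device in the replica set holds a full copy."""
    return {device: block for device in replica_set}


# Erasure coding: m = 3 data blocks striped with p = 1 parity block, so n = 4.
stripe = encode_stripe([b"AAAA", b"BBBB", b"CCCC"])
rebuilt = recover_block(stripe, 1)  # the device holding block 1 is lost

# Replication: one block copied to a replica set of three devices.
copies = replicate(b"AAAA", ["dev0", "dev1", "dev2"])
```

In this sketch, replication stores three full copies of a block, while the erasure-coded stripe stores four blocks for three blocks of data and can still rebuild any one lost block.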
The memory of each independent computing device may be employed to cache write data that is to be written to the disks of the independent computing device. For both replication and erasure coding, this means that write caching must use the memory of the same independent computing devices that will store the data. It would be desirable to also be able to reliably use memory of other independent computing devices to cache the write data for replication and erasure coding.
For erasure coded data, there are additional problems. To make efficient use of the memory, a full stripe of data must be received. If less than the full stripe is received, one or more missing data blocks must be read from disk in order to determine the new parity blocks, and reading the missing data blocks takes significantly more time than placing data in the memory. Moreover, the data blocks of a stripe are sometimes not received together but instead arrive over a relatively short period of time. It would be desirable to be able to efficiently cache such write data without having to read missing data blocks from disk.
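The cost of a partial stripe can be made concrete with a short sketch. It again assumes a single XOR parity block and models the disk as an in-memory dictionary; the function names and the `read_from_disk` callback are hypothetical and only count how many disk reads the parity computation would need.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def missing_blocks(m, cached):
    """Indices of data blocks in the stripe that are not present in the cache."""
    return [i for i in range(m) if i not in cached]


def compute_parity(m, cached, read_from_disk):
    """Return (parity, disk_reads) for a stripe of m data blocks.

    `cached` maps block index to newly written data held in memory.
    Any block not in the cache is fetched through `read_from_disk`.
    """
    blocks = dict(cached)
    disk_reads = 0
    for i in missing_blocks(m, cached):
        blocks[i] = read_from_disk(i)  # slow path: old data comes from disk
        disk_reads += 1
    return xor_blocks([blocks[i] for i in range(m)]), disk_reads


# Old contents of the stripe on disk (m = 3).
disk = {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"}

# Full stripe received: parity is computed from memory alone.
full = {0: b"aaaa", 1: b"bbbb", 2: b"cccc"}
full_parity, full_reads = compute_parity(3, full, disk.__getitem__)

# Only one block received: the other two must be read from disk first.
partial = {0: b"aaaa"}
partial_parity, partial_reads = compute_parity(3, partial, disk.__getitem__)
```

Under these assumptions the full-stripe write needs no disk reads, while the partial write needs two, which is the inefficiency described above.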