A critical component of computer systems is data storage. Data storage can be divided conceptually into an individual user's data storage, which is attached to that individual's computer, and network-based data storage, which is typically intended for multiple users.
One type of network-based storage device is a disk array. The disk array includes at least one controller coupled to an array of disks. Typically, each disk of the disk array is hot-swappable, which allows a disk to be replaced without turning off the disk array.
Network-based storage often must meet various performance requirements, such as data reliability. One way of providing high reliability is data replication. In a disk array employing data replication, one or more additional copies of the data are stored on one or more separate disks. If one of the disks holding a copy of the data fails, the data remains accessible on at least one other disk. Further, because the disks of the disk array are hot-swappable, a failed disk can be replaced without turning off the disk array. Once the failed disk has been replaced, the lost copy of the data can be restored.
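The replication behavior described above can be sketched in Python. This is a minimal, hypothetical illustration, not a description of any particular disk array: the names `Disk`, `MirroredArray`, and `replace_disk` are assumptions, and each disk is modeled as an in-memory dictionary of blocks rather than a real device.

```python
from typing import Dict, List, Optional


class Disk:
    """Hypothetical disk: maps block ids to data and may fail."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}
        self.failed = False


class MirroredArray:
    """Sketch of a disk array that keeps a copy of every block on each disk."""

    def __init__(self, num_disks: int) -> None:
        self.disks: List[Disk] = [Disk() for _ in range(num_disks)]

    def write(self, block_id: int, data: bytes) -> None:
        # Store one copy of the data on every working disk.
        for disk in self.disks:
            if not disk.failed:
                disk.blocks[block_id] = data

    def read(self, block_id: int) -> Optional[bytes]:
        # Any surviving copy is enough to serve the read.
        for disk in self.disks:
            if not disk.failed and block_id in disk.blocks:
                return disk.blocks[block_id]
        return None

    def replace_disk(self, index: int) -> None:
        # Hot swap: install a fresh disk, then restore the lost copies
        # from the surviving disks.
        fresh = Disk()
        self.disks[index] = fresh
        for i, disk in enumerate(self.disks):
            if i != index and not disk.failed:
                fresh.blocks.update(disk.blocks)
```

For example, with two disks, failing `disks[0]` still lets `read` return the block from `disks[1]`, and `replace_disk(0)` copies that block back onto the new disk.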
As an alternative to the disk array, researchers have been exploring replicated data storage across a plurality of independent storage devices. Each of the independent storage devices includes a CPU and one or more disks. One potential advantage of the plurality of independent storage devices is the ability to place each device in a separate physical location. Another potential advantage of the plurality of independent storage devices is lower cost. The lower cost can result from mass production of the independent storage devices as commodity devices and from elimination of the hot-swappable feature of the disk array.
In “FAB: Enterprise storage systems on a shoestring,” Proc. of HotOS IX: The 9th Workshop on Hot Topics in Operating Systems, May 18, 2003, Frolund et al. teach methods of writing and reading replicated data stored across a plurality of independent storage devices. The method of writing the data includes two phases of communication (i.e., two rounds of communication) between a coordinator and a plurality of storage devices. In a pre-write phase (i.e., the first phase), the storage devices recognize a new ongoing write and promise not to accept an earlier write request. In a write phase (i.e., the second phase), the storage devices actually write the data. The method of reading the data takes place in a single phase provided that a majority of the storage devices indicate that they hold a consistent version of the data.
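The two-phase write and single-phase read described above can be sketched as follows. This is a simplified, hypothetical model, not the protocol as specified by Frolund et al.: the names `Replica`, `ord_ts`, `val_ts`, `coordinate_write`, and `coordinate_read` are assumptions, timestamps are assumed to be unique integers issued by the coordinator, messages are modeled as direct method calls, and the recovery path taken when a fast read fails is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    """Hypothetical per-device state (assumed timestamp scheme)."""
    ord_ts: int = 0                # highest timestamp promised in a pre-write
    val_ts: int = 0                # timestamp of the value currently stored
    value: Optional[bytes] = None

    def pre_write(self, ts: int) -> bool:
        # Phase 1: recognize a newer write and promise to reject older ones.
        if ts > self.ord_ts and ts > self.val_ts:
            self.ord_ts = ts
            return True
        return False

    def write(self, ts: int, value: bytes) -> bool:
        # Phase 2: store the value unless a newer write was promised or stored.
        if ts >= self.ord_ts and ts > self.val_ts:
            self.val_ts = ts
            self.value = value
            return True
        return False


def majority(n: int) -> int:
    return n // 2 + 1


def coordinate_write(replicas: List[Replica], ts: int, value: bytes) -> bool:
    """Two rounds of communication between the coordinator and the devices."""
    promised = sum(r.pre_write(ts) for r in replicas)
    if promised < majority(len(replicas)):
        return False  # a write with an equal or newer timestamp was seen
    written = sum(r.write(ts, value) for r in replicas)
    return written >= majority(len(replicas))


def coordinate_read(replicas: List[Replica]) -> Optional[bytes]:
    """Single-phase read: succeeds only if a majority hold a settled version."""
    newest = max(r.val_ts for r in replicas)
    # A replica is consistent if it holds the newest value and has not
    # promised a later write that has not yet completed.
    consistent = [r for r in replicas
                  if r.val_ts == newest and r.ord_ts <= r.val_ts]
    if len(consistent) >= majority(len(replicas)):
        return consistent[0].value
    return None  # would fall back to a recovery procedure (not shown)
```

In this sketch a completed write leaves every replica with `ord_ts == val_ts`, so a subsequent read finishes in one round. If a coordinator stops after the pre-write phase on a majority of devices, those devices report a pending write (`ord_ts > val_ts`), and the fast read conservatively returns `None` instead of possibly stale data.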
Since the method of reading the data takes place in a single phase, it operates efficiently when a workload is read-intensive. When a workload for the plurality of storage devices is write-intensive, it would be desirable to write the data in a single phase of communication while maintaining consistency of the data stored across the storage devices. Further, it would be desirable to read data written in a single phase of communication in a way that also maintains the consistency of the data stored across the storage devices.