A critical component of computer systems is data storage. Data storage can be divided conceptually into an individual user's data storage, which is attached to that user's computer, and network-based data storage, which is typically intended for multiple users.
One type of network-based storage device is a disk array. The disk array includes a controller coupled to an array of disks. Typically, each of the disks of the disk array is hot swappable, which allows a disk to be replaced without turning off the disk array.
Often, network-based storage must meet various performance requirements, such as data access speed and data reliability. One way of providing high data access speed is to store data in stripes across multiple disks of the disk array. Each stripe includes multiple data blocks, each of which is stored on a separate disk. When the data is stored or read, the disks operate in parallel: each disk that holds a data block stores or retrieves its own block. In this way, the stripe of data is stored or retrieved in about the time it takes to store or retrieve one data block.
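The striping layout described above can be illustrated with a minimal Python sketch. It models each disk as an in-memory list and places blocks round-robin; the function names `stripe_blocks` and `read_striped`, the round-robin placement, and the block size are assumptions made for illustration and are not taken from any particular disk array. The sketch shows only the layout, not the parallel I/O that yields the speedup.

```python
from typing import List


def stripe_blocks(data: bytes, num_disks: int, block_size: int) -> List[List[bytes]]:
    """Split data into fixed-size blocks and place them round-robin on the disks.

    Each "disk" is modeled as a list of blocks (a hypothetical stand-in
    for a physical disk).
    """
    if num_disks < 1 or block_size < 1:
        raise ValueError("num_disks and block_size must be positive")
    disks: List[List[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        disks[block_index % num_disks].append(data[offset:offset + block_size])
    return disks


def read_striped(disks: List[List[bytes]]) -> bytes:
    """Reassemble the original data from the round-robin layout."""
    num_disks = len(disks)
    total_blocks = sum(len(disk) for disk in disks)
    # Block k lives on disk (k % num_disks) at position (k // num_disks).
    return b"".join(
        disks[k % num_disks][k // num_disks] for k in range(total_blocks)
    )
```

With three disks and two-byte blocks, the ten bytes `b"abcdefghij"` form five blocks: the first three make up one stripe, one block per disk, and the remaining two begin a second stripe.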
One way of providing high reliability is data replication. For the disk array, data replication stores one or more additional copies of the data on one or more separate disks. If one of the disks holding a copy of the data fails, the data is still accessible on at least one other disk. Further, because the disks of the disk array are hot swappable, a failed disk can be replaced without turning off the disk array. Once the failed disk has been replaced, the lost copy of the data can be restored.
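The replication behavior described above (write several copies, read from a surviving copy, restore after a disk is replaced) can be sketched as follows. This is a simplified in-memory model under stated assumptions: the class `ReplicatedStore`, its method names, and the placement rule that picks consecutive disks from a key-derived starting index are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional


class ReplicatedStore:
    """Toy model of a disk array that keeps `copies` replicas of each item."""

    def __init__(self, num_disks: int, copies: int) -> None:
        if not 1 <= copies <= num_disks:
            raise ValueError("copies must be between 1 and num_disks")
        # A failed disk is represented by None.
        self.disks: List[Optional[Dict[str, bytes]]] = [{} for _ in range(num_disks)]
        self.copies = copies

    def replica_disks(self, key: str) -> List[int]:
        """Hypothetical placement rule: consecutive disks from a key-derived start."""
        start = sum(key.encode()) % len(self.disks)
        return [(start + i) % len(self.disks) for i in range(self.copies)]

    def write(self, key: str, value: bytes) -> None:
        for index in self.replica_disks(key):
            disk = self.disks[index]
            if disk is not None:
                disk[key] = value

    def read(self, key: str) -> bytes:
        # Any surviving replica can serve the read.
        for index in self.replica_disks(key):
            disk = self.disks[index]
            if disk is not None and key in disk:
                return disk[key]
        raise KeyError(key)

    def fail_disk(self, index: int) -> None:
        self.disks[index] = None

    def replace_disk(self, index: int) -> None:
        """Install an empty disk and restore its copies from surviving replicas."""
        fresh: Dict[str, bytes] = {}
        self.disks[index] = fresh
        for other in self.disks:
            if other is None or other is fresh:
                continue
            for key, value in other.items():
                if index in self.replica_disks(key):
                    fresh[key] = value
```

In this model, an item written with two copies stays readable after either of its disks fails, and `replace_disk` repopulates the new disk from the remaining copy.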
Another way of providing high reliability is erasure coding. Typically for the disk array, erasure coding encodes one or more redundancy blocks for at least two data blocks stored on the device. Each of the data blocks and each of the redundancy blocks is stored on a separate disk of the disk array. If one of the data blocks is lost, it can be recovered by decoding it from a redundancy block and the remaining data blocks.
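As an illustration of the encode and decode steps, the following sketch uses single-block XOR parity, which is the simplest erasure code and tolerates the loss of exactly one block. The choice of XOR parity is an assumption made for illustration; the text above does not name a specific code, and codes with more than one redundancy block (for example, Reed-Solomon codes) need different arithmetic. The function names are hypothetical.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Return the bytewise XOR of a list of equal-length blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def encode_parity(data_blocks: List[bytes]) -> bytes:
    """Encode one redundancy (parity) block for the given data blocks."""
    return xor_blocks(data_blocks)


def recover_block(surviving_blocks: List[bytes], parity: bytes) -> bytes:
    """Decode the single lost data block from the parity and the survivors."""
    return xor_blocks(surviving_blocks + [parity])
```

Recovery works because XOR is its own inverse: combining the parity block with every surviving data block cancels their contributions and leaves the lost block.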
As an alternative to the disk array, researchers have been exploring data storage on an array of independent storage devices that form a distributed storage system. Each of the independent storage devices includes a disk and a network adapter. One potential advantage of the array of independent storage devices is the ability to place the independent storage devices in separate physical locations. Another potential advantage is lower cost, which can result from using commodity components to construct the array of independent storage devices.
What is needed is a method of writing erasure coded data in a distributed storage system.