As shown in FIG. 1, large-scale distributed storage systems provide networked online storage and allow multiple computing devices to store, access, and share files in the online storage. Distributed storage systems may use a client/server architecture in which one or more central servers store files and provide file access to network clients.
Storage devices may include servers in a datacenter with flash, disk drives, and/or RAM and may provide varying levels of access to files. For example, some storage devices may serve files faster or may be more available than others.
Data files stored in large-scale distributed storage systems can be very large, potentially multiple gigabytes in size. In order to manage these files, files are commonly split into fixed-size chunks, as illustrated in FIGS. 2a and 2b. Multiple data chunks may make up one data file. Code chunks are additional chunks computed from the data chunks using a mathematical formula. Normally, if the user reads data stored in a large-scale distributed storage system, the data is read from the data chunks, also called systematic chunks. However, if some data chunks are unavailable, the storage system may read the code chunks and recover the missing data chunks by a mathematical computation over the available data chunks and code chunks. The number of failed or unavailable data chunks that can be recovered depends on the number of code chunks available and the encoding system used. Although FIG. 2b depicts a file that is striped across chunks one byte or character at a time, a chunk is usually a contiguous portion of a file. For example, a contiguous portion may be the first 1 MB. Using a 4+2 encoding similar to that shown in FIG. 2b, a stripe might have 4 MB of file data spread across chunks 0-3, with each chunk containing one megabyte of data. Two code chunks may be constructed by a mathematical operation on chunks 0-3. If the file is larger than 4 MB, a second stripe may be created, which may also hold up to 4 MB of file data with two code chunks.
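The division of a file into contiguous fixed-size chunks, and the grouping of those chunks into stripes, may be sketched as follows. This is a minimal illustration only, not an implementation taken from any particular storage system. The function names `split_into_chunks` and `group_into_stripes` are hypothetical, and the example uses 2-byte chunks in place of 1 MB chunks so that the result is easy to inspect. Code chunks are not computed here.

```python
from typing import List


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Split data into contiguous fixed-size chunks; the last chunk may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def group_into_stripes(chunks: List[bytes], data_chunks_per_stripe: int) -> List[List[bytes]]:
    """Group consecutive data chunks into stripes of at most data_chunks_per_stripe chunks."""
    if data_chunks_per_stripe <= 0:
        raise ValueError("data_chunks_per_stripe must be positive")
    return [
        chunks[i:i + data_chunks_per_stripe]
        for i in range(0, len(chunks), data_chunks_per_stripe)
    ]


# Scaled-down stand-in for the 4+2 layout: a 10-byte "file", 2-byte chunks,
# and 4 data chunks per stripe (assumed values chosen for illustration).
file_data = bytes(range(10))
chunks = split_into_chunks(file_data, chunk_size=2)
stripes = group_into_stripes(chunks, data_chunks_per_stripe=4)
```

With these assumed sizes, the first stripe holds four data chunks and the second stripe holds the single remaining chunk, mirroring the case of a file larger than one full stripe.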
When storing files in large-scale distributed storage systems, chunks may be striped across multiple storage devices. Data striping is a technique of segmenting logically sequential files so that consecutive segments are stored on different physical storage devices. A stripe is one segment of a file written to one storage device. The size of each stripe, or chunk, may be configurable in order to allow each storage device to provide a maximum amount of data in the shortest amount of time. Stripe width is the number of parallel stripes that can be written to or read from concurrently. Striping is useful when a client device requests access to a file more quickly than a single storage device can provide the file. By striping a file across multiple storage devices, a client device may be able to access multiple file segments simultaneously.
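One simple way to place consecutive segments on different storage devices is round-robin assignment, sketched below. Round-robin is an assumption made for illustration; a real system may use a different placement policy. The function name `place_round_robin` and the device names are hypothetical.

```python
from typing import Dict, List


def place_round_robin(num_chunks: int, devices: List[str]) -> Dict[str, List[int]]:
    """Assign chunk indices to devices so that consecutive chunks land on different devices."""
    if not devices:
        raise ValueError("at least one storage device is required")
    placement: Dict[str, List[int]] = {device: [] for device in devices}
    for index in range(num_chunks):
        # Chunk i goes to device i mod (number of devices).
        placement[devices[index % len(devices)]].append(index)
    return placement


# Six chunks striped across three hypothetical devices.
placement = place_round_robin(6, ["dev-a", "dev-b", "dev-c"])
```

In this sketch, chunks 0, 1, and 2 reside on three different devices, so a client could in principle request all three at the same time.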
In addition to being chunked and striped across multiple storage devices, files in a large-scale distributed storage system may be encoded to correct errors that occur during file transmission or storage. For example, Reed-Solomon encoding may add extra “redundant” bits to files that can be used to recover the original file in the event that bits are lost during file transmission or storage.
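The idea of recovering lost data from redundant data may be illustrated with a single XOR parity chunk. This is a deliberately simpler technique than Reed-Solomon encoding and can recover only one missing chunk per group; it is shown only to make the recovery principle concrete. The function names are hypothetical, and the sketch assumes all chunks in a group have the same length.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(data_chunks: List[bytes]) -> bytes:
    """Compute one parity chunk as the XOR of all data chunks."""
    if not data_chunks:
        raise ValueError("at least one data chunk is required")
    return reduce(xor_bytes, data_chunks)


def recover_missing(chunks: List[Optional[bytes]], parity: bytes) -> List[Optional[bytes]]:
    """Rebuild at most one missing chunk (marked None) from the survivors and the parity."""
    missing = [i for i, chunk in enumerate(chunks) if chunk is None]
    if len(missing) > 1:
        raise ValueError("a single parity chunk can recover only one missing chunk")
    result = list(chunks)
    if missing:
        survivors = [chunk for chunk in chunks if chunk is not None]
        # XOR of the parity with every surviving chunk yields the missing chunk.
        result[missing[0]] = reduce(xor_bytes, survivors, parity)
    return result


data = [b"abcd", b"efgh", b"ijkl", b"mnop"]
parity = make_parity(data)
damaged = [data[0], None, data[2], data[3]]  # chunk 1 is unavailable
restored = recover_missing(damaged, parity)
```

An encoding with more code chunks, such as a Reed-Solomon code, extends the same principle so that more than one missing chunk can be recovered.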
Once a file is stored, clients can request the file from the large-scale distributed storage system. When a client requests access to a data file or part of a data file, a large-scale distributed storage system server may respond with the location of the appropriate file. In some large-scale systems, the client may acquire a lease on the file and hold the lease until the client releases the file. Although multiple reads may proceed simultaneously during the lease time, no other clients may be able to write. Furthermore, if many clients want access to the same file at the same time, it may be desirable to replicate data in order to provide sufficient performance for more than one client to access a particular file chunk at a given time. Additionally or alternatively, files may be replicated across multiple storage devices in order to allow the file chunks to be available even if a storage device fails or becomes unavailable. The replication level of a particular chunk is the number of storage devices on which duplicate file data is stored. For example, a three-way replicated stripe would involve putting three chunks that are exact copies of each other on three different storage devices. It may be costly to store files in multiple systems across multiple datacenters. However, storing files in a single system or datacenter may cause problems if there is a system or datacenter outage and the file becomes unavailable.
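The relationship between replication level and availability may be sketched as follows. The placement rule (copies on consecutive devices in a list) is an assumption chosen for simplicity, and the function names `replicate` and `readable_chunks` and the device names are hypothetical.

```python
from typing import Dict, List, Set


def replicate(chunk_ids: List[int], devices: List[str], replication_level: int) -> Dict[int, List[str]]:
    """Place replication_level copies of each chunk on distinct devices."""
    if not 1 <= replication_level <= len(devices):
        raise ValueError("replication level must be between 1 and the number of devices")
    placement: Dict[int, List[str]] = {}
    for n, chunk_id in enumerate(chunk_ids):
        # Copies go to consecutive devices, starting at a different device per chunk.
        placement[chunk_id] = [
            devices[(n + k) % len(devices)] for k in range(replication_level)
        ]
    return placement


def readable_chunks(placement: Dict[int, List[str]], failed: Set[str]) -> List[int]:
    """Return the chunks that still have at least one copy on a working device."""
    return [
        chunk_id
        for chunk_id, holders in placement.items()
        if any(device not in failed for device in holders)
    ]


# Three-way replication of two chunks over four hypothetical devices.
placement = replicate([0, 1], ["d0", "d1", "d2", "d3"], replication_level=3)
still_readable = readable_chunks(placement, failed={"d0", "d1"})
```

In this sketch, both chunks remain readable after two devices fail, because each chunk has a third copy on a surviving device. If all four devices were in one datacenter, a datacenter outage would still make every copy unavailable.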