Electronic data is stored in different types of systems. For example, files may be stored by file systems and objects may be stored by object storage systems. Different approaches may be used to protect files, objects, and other computerized or electronic information. To ensure data protection, one approach includes storing redundant copies of a file, portions of a file, an object, or other information. Erasure codes are one way to produce such redundancy.
An erasure code is a forward error correction (FEC) code for the binary erasure channel. The FEC facilitates transforming a message of k symbols into a longer message with n symbols such that the original message can be recovered from a subset of the n symbols, k and n being integers. The original message may be, for example, a file. The fraction r=k/n is called the code rate, and the fraction k′/k, where k′ denotes the number of symbols required for recovery, is called the reception efficiency. Optimal erasure codes have the property that any k out of the n code word symbols suffice to recover the original message. Optimal codes may require extensive memory usage, CPU time, or other resources when n is large. Parallel processing may be employed when n is large.
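The k-of-n recovery property may be illustrated with a minimal sketch. The example below assumes the simplest optimal erasure code, a single XOR parity symbol (n = k + 1), chosen only because it is short; the function names and the sample message are hypothetical and are not drawn from any particular system.

```python
from functools import reduce
from itertools import combinations


def encode(message: list[int]) -> list[int]:
    """Append one XOR parity symbol, producing n = k + 1 code word symbols."""
    parity = reduce(lambda a, b: a ^ b, message, 0)
    return message + [parity]


def decode(received: dict[int, int], k: int) -> list[int]:
    """Recover the k message symbols from any k of the n = k + 1 symbols.

    `received` maps a symbol position (0..k) to the symbol value.
    """
    if len(received) < k:
        raise ValueError("need at least k symbols")
    missing = [i for i in range(k + 1) if i not in received]
    symbols = dict(received)
    if missing:
        # The XOR of all n symbols is zero, so the single missing symbol
        # equals the XOR of the k symbols that were received.
        symbols[missing[0]] = reduce(lambda a, b: a ^ b, received.values(), 0)
    return [symbols[i] for i in range(k)]


message = [0x12, 0x34, 0x56, 0x78]      # k = 4 symbols (illustrative values)
codeword = encode(message)               # n = 5 symbols
k, n = len(message), len(codeword)
code_rate = k / n                        # r = k/n

# Every choice of k out of n symbols recovers the original message.
all_subsets_recover = all(
    decode({i: codeword[i] for i in kept}, k) == message
    for kept in combinations(range(n), k)
)
```

Because any k symbols suffice here, k′ = k and the reception efficiency k′/k is 1. Codes that tolerate more than one erasure may require considerably more computation, which is consistent with the resource concerns noted above.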
Erasure codes are described in coding theory. Coding theory is the study of the properties of codes and their fitness for a certain purpose (e.g., backing up files). Codes may be used for applications including, for example, data compression, cryptography, error-correction, and network coding. Coding theory involves data compression, which may also be referred to as source coding, and error correction, which may also be referred to as channel coding. Fountain codes are one type of erasure code.
Fountain codes have the property that a potentially limitless sequence of encoding symbols may be generated from a given set of source symbols in a manner that supports ideally recovering the original source symbols from any subset of the encoding symbols of size equal to or larger than the number of source symbols. Recovering the original source symbols may require acquiring the encoding symbols from where they are stored. When a large number of encoding symbols needs to be acquired, parallel processing may be employed. A fountain code may be optimal if the original k source symbols can be recovered from any k encoding symbols, k being an integer. Fountain codes may have efficient encoding and decoding algorithms that support recovering the original k source symbols from any k′ of the encoding symbols with high probability, where k′ is just slightly larger than k. Because the number of encoding symbols is not fixed in advance, a fountain code is rateless, which distinguishes it from a code that exhibits a fixed code rate.
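The rateless behavior may be sketched as follows. The example assumes a random linear fountain over GF(2), in which each encoding symbol is the XOR of a pseudo-random subset of the source symbols and decoding uses Gaussian elimination. This is an illustrative stand-in, not a practical fountain code such as an LT or Raptor code, and all names, seeds, and sample values are hypothetical.

```python
import random


def encode_symbol(source: list[int], seed: int) -> tuple[int, int]:
    """Produce one encoding symbol from a seed.

    Returns (mask, value): bit i of mask is set if source[i] was XORed in.
    Any integer seed yields a symbol, so the sequence has no fixed length.
    """
    k = len(source)
    rng = random.Random(seed)
    mask = 0
    while mask == 0:                      # skip the useless empty subset
        mask = rng.getrandbits(k)
    value = 0
    for i in range(k):
        if (mask >> i) & 1:
            value ^= source[i]
    return mask, value


def decode(symbols: list[tuple[int, int]], k: int):
    """Gaussian elimination over GF(2); returns the source list, or None
    if the received symbols do not yet determine all k source symbols."""
    pivots: dict[int, tuple[int, int]] = {}   # pivot bit -> (mask, value)
    for mask, value in symbols:
        # Clear every known pivot bit from the incoming row.
        for bit, (pmask, pvalue) in pivots.items():
            if (mask >> bit) & 1:
                mask ^= pmask
                value ^= pvalue
        if mask == 0:
            continue                      # linearly dependent, adds nothing
        bit = mask.bit_length() - 1       # new pivot: highest remaining bit
        # Clear the new pivot bit from the rows already stored.
        for b, (pmask, pvalue) in list(pivots.items()):
            if (pmask >> bit) & 1:
                pivots[b] = (pmask ^ mask, pvalue ^ value)
        pivots[bit] = (mask, value)
        if len(pivots) == k:
            break
    if len(pivots) < k:
        return None
    return [pivots[i][1] for i in range(k)]


source = [17, 42, 99, 7, 250, 3, 128, 64]    # k = 8 source symbols
k = len(source)
received, recovered = [], None
for seed in range(200):
    if seed % 3 == 0:
        continue                          # simulate an unavailable symbol
    received.append(encode_symbol(source, seed))
    recovered = decode(received, k)
    if recovered is not None:
        break
k_prime = len(received)                   # symbols actually needed
```

In this sketch k′ is at least k and is typically only slightly larger, and which particular encoding symbols were lost does not matter; the decoder simply keeps consuming symbols until enough independent ones have arrived.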
Object based storage systems may employ rateless erasure code technology (e.g., fountain codes) to provide a flexible level of data redundancy. The appropriate or even optimal level of data redundancy produced using a rateless erasure code system may depend, for example, on the value of the data. The actual level of redundancy achieved using a rateless erasure code system may depend, for example, on the difference between the number of readable redundancy blocks (e.g., erasure codes) written by the system and the number of redundancy blocks needed to reconstruct the original data. For example, if twenty redundancy blocks are written and only eleven redundancy blocks are needed to reconstruct the original data that was protected by writing the redundancy blocks, then the original data may be reconstructed even if nine of the redundancy blocks are damaged or otherwise unavailable.
When a component in a data storage system stops providing the desired storage (e.g., a hard disk drive (HDD) fails), then an object storage system based on erasure codes may be tasked with providing erasure codes from which items that had been stored on the failed drive can be rebuilt. When a component (e.g., HDD) in a storage system fails, a file(s) stored by the storage system may become unavailable. When erasure codes were generated for the file(s) and stored in an object store, it may be possible to rebuild the file(s). Ideally, the file(s) would be rebuilt as fast as possible to minimize disruptions due to the component failure. Rebuilding a file may involve collecting a number of erasure codes from an object store. An object store may have a number of servers and a number of storage devices (e.g., HDD, SSD). When a component like an HDD fails, it is likely that a large number of files may become unavailable and thus a large number of erasure codes may need to be acquired to rebuild the large number of files. Thus, once again, parallel processing may be performed.
A rebuild agent may be tasked with rebuilding unavailable files. The rebuild agent may need to acquire X out of Y erasure codes to recreate a file, X and Y being integers. If a file was broken into sub-blocks and erasure codes were generated for the sub-blocks, then the rebuild agent may need to acquire X out of Y erasure codes to recreate each sub-block for the file. Thus, if 10,000 files became unavailable, and if the files were sub-divided into 100 sub-blocks on average, then 1,000,000 sub-blocks may need to be rebuilt. If each sub-block was protected by a 20/11 policy, under which twenty erasure codes are written and eleven are needed for reconstruction, then 20,000,000 erasure codes may have been generated and at least 11,000,000 of those erasure codes may need to be acquired to rebuild the files. Clearly, parallel processing may mitigate time issues associated with acquiring at least 11,000,000 erasure codes and then regenerating messages from those erasure codes.
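The scale of such a rebuild may be worked through directly. The sketch below assumes that a 20/11 policy means twenty erasure codes are written per sub-block and eleven are needed to reconstruct it, as in the twenty-and-eleven example given earlier; the function name and dictionary keys are hypothetical.

```python
def rebuild_counts(files: int, sub_blocks_per_file: int,
                   written: int, needed: int) -> dict[str, int]:
    """Count sub-blocks and erasure codes involved in a rebuild."""
    sub_blocks = files * sub_blocks_per_file
    return {
        "sub_blocks": sub_blocks,
        "codes_generated": sub_blocks * written,
        # Minimum number of codes that must be read to rebuild everything.
        "codes_to_acquire": sub_blocks * needed,
        # Codes that could be unavailable without preventing the rebuild.
        "codes_that_may_be_lost": sub_blocks * (written - needed),
    }


counts = rebuild_counts(files=10_000, sub_blocks_per_file=100,
                        written=20, needed=11)
```

Under these assumptions there are 1,000,000 sub-blocks and 20,000,000 generated erasure codes, of which at least 11,000,000 must be acquired, while up to 9,000,000 may be damaged or unavailable without preventing reconstruction.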
Conventional object based storage systems based on rateless erasure codes (e.g., fountain codes) may have used a static parallel rebuild in an attempt to reduce the amount of time required to perform a rebuild. Unfortunately, the number of rebuild agents employed in conventional approaches may have been fixed. Additionally, the resources allocated for the fixed number of rebuild agents may also have been fixed. Thus, the value of parallel processing in conventional rebuilds may have been unnecessarily limited.
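A static parallel rebuild of this kind may be pictured with a minimal sketch. The example assumes a thread pool whose size is chosen once and never adjusted; `FIXED_AGENT_COUNT`, `rebuild_sub_block`, and `static_parallel_rebuild` are hypothetical names, and the per-sub-block work is a placeholder for acquiring erasure codes and decoding them.

```python
from concurrent.futures import ThreadPoolExecutor

FIXED_AGENT_COUNT = 4   # assumption: set once, regardless of rebuild size


def rebuild_sub_block(sub_block_id: int) -> tuple[int, str]:
    """Hypothetical stand-in for acquiring erasure codes for one sub-block
    and regenerating its contents."""
    return sub_block_id, f"rebuilt-{sub_block_id}"


def static_parallel_rebuild(sub_block_ids, agents: int = FIXED_AGENT_COUNT):
    """Rebuild all sub-blocks using a fixed number of rebuild agents."""
    with ThreadPoolExecutor(max_workers=agents) as pool:
        # map() spreads the work over the pool; the pool never grows or
        # shrinks, however many sub-blocks are waiting.
        return dict(pool.map(rebuild_sub_block, sub_block_ids))


results = static_parallel_rebuild(range(10))
```

Whether ten sub-blocks or 1,000,000 sub-blocks are pending, this sketch uses the same number of agents, which illustrates the limitation described above.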