Object stores may store objects using erasure codes. An object store, which may also be referred to as an object based storage system, may have multiple devices (e.g., disks) in multiple apparatus (e.g., servers) positioned at multiple locations (e.g., sites). An object store may be controlled with respect to where any given erasure code is stored or with respect to where any given collection of erasure codes is stored. The control may be exercised when an erasure code is stored. Conventional object stores may use pre-defined or static configuration settings to determine erasure code locations and distributions. For example, conventional object storage systems may use static, pre-defined settings to distribute erasure codes across a collection of disks, servers, racks, or sites when a store operation is performed. Unfortunately, by relying on static, pre-defined configuration values that are only consulted at store time, these conventional systems may not consider the actual conditions that are present at the time an erasure code is to be stored. Additionally, such systems may not consider relationships between erasure codes, usage patterns of erasure codes, capacity considerations, or other factors that may affect the efficiency of an object storage system.
Thus, conventional systems may be unable to dynamically recognize a distribution pattern or plan that could optimize or improve performance of an object storage system. For example, usage patterns that result in hot spots of high load on certain disks, servers, or sites may go unnoticed and thus may not be addressed by conventional systems, leading to sub-optimal load balancing between devices in the object store. Load balancing is just one characteristic that may affect the operation of an object storage system. Other characteristics include capacity balancing, fault tolerance, locality of usage affecting read performance, associations between objects affecting read performance, and other factors.
An erasure code is a forward error correction (FEC) code for the binary erasure channel. The FEC facilitates transforming a message of k symbols into a longer message with n symbols such that the original message can be recovered from a subset of the n symbols, k and n being integers, n>k. The original message may be, for example, a file. The fraction r=k/n is called the code rate, and the fraction k′/k, where k′ denotes the number of symbols required for recovery, is called the reception efficiency. Optimal erasure codes have the property that any k out of the n code word symbols are sufficient to recover the original message. Optimal codes may require extensive memory usage, CPU time, or other resources when n is large.
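The definitions above may be illustrated with a minimal sketch, assuming the simplest optimal erasure code: a single XOR parity symbol appended to k message symbols, so that n = k + 1. The function names `encode` and `recover` are hypothetical and are chosen only for this illustration; practical object stores may use considerably more elaborate codes.

```python
from fractions import Fraction
from functools import reduce
from operator import xor


def encode(message):
    """Append one XOR parity symbol: k symbols in, n = k + 1 symbols out."""
    return list(message) + [reduce(xor, message, 0)]


def recover(codeword):
    """Recover the k message symbols when at most one symbol is erased (None)."""
    missing = [i for i, s in enumerate(codeword) if s is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code tolerates only one erasure")
    symbols = list(codeword)
    if missing:
        # The XOR of all n symbols is zero, so the erased symbol equals
        # the XOR of the n - 1 surviving symbols.
        survivors = (s for s in symbols if s is not None)
        symbols[missing[0]] = reduce(xor, survivors, 0)
    return symbols[:-1]  # drop the parity symbol


message = [3, 5, 6, 9]                           # k = 4 symbols
codeword = encode(message)                       # n = 5 symbols
code_rate = Fraction(len(message), len(codeword))  # r = k/n

# Any k of the n symbols suffice, so k' = k and the reception efficiency is 1.
for erased in range(len(codeword)):
    damaged = list(codeword)
    damaged[erased] = None
    assert recover(damaged) == message
```

In this sketch the code rate is 4/5, and because any four of the five symbols recover the message, the code is optimal in the sense described above.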
Erasure codes are described in coding theory. Coding theory is the study of the properties of codes and their fitness for a certain purpose (e.g., backing up files). Codes may be used for applications including, for example, data compression, cryptography, error correction, and network coding. Coding theory involves data compression, which may also be referred to as source coding, and error correction, which may also be referred to as channel coding. Fountain codes are one type of erasure code.
Fountain codes have the property that a potentially limitless sequence of encoding symbols may be generated from a given set of source symbols in a manner that supports ideally recovering the original source symbols from any subset of the encoding symbols having a size equal to or larger than the number of source symbols. A fountain code may be optimal if the original k source symbols can be recovered from any k encoding symbols, k being an integer. Fountain codes may have efficient encoding and decoding algorithms that support recovering the original k source symbols from any k′ of the encoding symbols with high probability, where k′ is just slightly larger than k. Because the number of encoding symbols is not fixed in advance, a fountain code is a rateless erasure code. A rateless erasure code is distinguished from an erasure code that exhibits a fixed code rate.
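The rateless property may be illustrated with a minimal sketch, assuming a random linear fountain code over GF(2), in which each encoding symbol is the XOR of a random non-empty subset of the source symbols. This particular construction is chosen here only for brevity; it is not asserted to be the code used by any particular object store, and the names `encode_symbol` and `decode` are hypothetical.

```python
import random


def encode_symbol(source, rng):
    """Return (mask, value): value is the XOR of the source symbols selected by mask."""
    k = len(source)
    mask = rng.randrange(1, 1 << k)  # random non-empty subset of the k symbols
    value = 0
    for i in range(k):
        if (mask >> i) & 1:
            value ^= source[i]
    return mask, value


def decode(symbols, k):
    """Gaussian elimination over GF(2); returns the source symbols, or None if rank < k."""
    basis = {}  # highest set bit of a row -> (mask, value)
    for mask, value in symbols:
        for bit in range(k - 1, -1, -1):
            if not (mask >> bit) & 1:
                continue
            if bit in basis:
                bmask, bvalue = basis[bit]
                mask ^= bmask
                value ^= bvalue
            else:
                basis[bit] = (mask, value)
                break
    if len(basis) < k:
        return None  # not enough independent symbols received yet
    source = [0] * k
    for bit in range(k):  # back-substitute from the lowest pivot upward
        mask, value = basis[bit]
        for j in range(bit):
            if (mask >> j) & 1:
                value ^= source[j]
        source[bit] = value
    return source


rng = random.Random(0)
source = [rng.randrange(256) for _ in range(8)]  # k = 8 source symbols
received = []
recovered = None
while recovered is None:  # keep drawing symbols from the "fountain"
    received.append(encode_symbol(source, rng))
    recovered = decode(received, len(source))
assert recovered == source
```

In this sketch the encoder can produce as many encoding symbols as are requested, and the decoder succeeds as soon as k linearly independent symbols have arrived, which for random subsets typically happens after k′ symbols with k′ only slightly larger than k.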
Object based storage systems may employ rateless erasure code technology (e.g., fountain codes) to provide a flexible level of data redundancy. The appropriate or even optimal level of data redundancy produced using a rateless erasure code system may depend, for example, on the number and type of devices available to the object based storage system. The actual level of redundancy achieved using a rateless erasure code system may depend, for example, on the difference between the number of readable redundancy blocks (e.g., erasure codes) written by the system and the number of redundancy blocks needed to reconstruct the original data. For example, if twenty redundancy blocks are written for an object and only eleven redundancy blocks are needed to reconstruct the object that was protected by generating and writing the redundancy blocks, then the object may be reconstructed even if nine of the redundancy blocks are damaged or otherwise unavailable.
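The redundancy arithmetic in the example above may be sketched as follows. The helper names `tolerated_losses` and `can_reconstruct` are hypothetical and are introduced only for illustration.

```python
def tolerated_losses(written: int, needed: int) -> int:
    """Number of redundancy blocks that may be lost while the object stays recoverable."""
    if needed <= 0 or written < needed:
        raise ValueError("expected 0 < needed <= written")
    return written - needed


def can_reconstruct(readable: int, needed: int) -> bool:
    """True if enough readable redundancy blocks remain to rebuild the object."""
    return readable >= needed


# Twenty blocks written, eleven needed: nine blocks may be damaged or unavailable.
assert tolerated_losses(20, 11) == 9
assert can_reconstruct(readable=20 - 9, needed=11)
assert not can_reconstruct(readable=20 - 10, needed=11)
```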
Object based storage systems using rateless erasure code technology may facilitate storing erasure codes generated according to different redundancy policies (e.g., 7/3, 20/9, 20/2). A first type of redundancy policy may be referred to as an N/M redundancy policy where N total erasure codes are generated and the message can be regenerated using any N-M of the N total erasure codes, M and N being integers, M<N.
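An N/M redundancy policy may be represented as in the following minimal sketch. The class name `RedundancyPolicy`, its `parse` method, and the assumption that M is non-negative are hypothetical details introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedundancyPolicy:
    n: int  # N: total erasure codes generated
    m: int  # M: erasure codes that may be lost

    def __post_init__(self):
        # The text requires M < N; M >= 0 is an assumption of this sketch.
        if not 0 <= self.m < self.n:
            raise ValueError("expected 0 <= M < N")

    @property
    def required(self) -> int:
        """Number of erasure codes needed to regenerate the message (N - M)."""
        return self.n - self.m

    @classmethod
    def parse(cls, text: str) -> "RedundancyPolicy":
        """Parse a policy written as 'N/M', e.g. '20/9'."""
        n_text, m_text = text.split("/")
        return cls(int(n_text), int(m_text))


for spec in ("7/3", "20/9", "20/2"):
    policy = RedundancyPolicy.parse(spec)
    print(spec, "-> any", policy.required, "of", policy.n, "erasure codes suffice")
```

Under these assumptions, a 7/3 policy requires any four of seven erasure codes, a 20/9 policy requires any eleven of twenty, and a 20/2 policy requires any eighteen of twenty.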
Since conditions in an object store can vary dynamically as, for example, devices experience heavier or lighter write loads, devices experience heavier or lighter read loads, erasure codes are added or deleted, erasure codes are accessed in patterns, or for other reasons, the operation of an object store may also vary dynamically. However, conventional systems may use pre-defined configuration settings to statically determine distribution, which may lead to sub-optimal or undesirable operations.