Traditionally, storage provided in a storage cluster (such as by using a redundant array of independent nodes, or RAIN) is made reliable against hardware failure either through replication of stored objects or through erasure coding of stored objects. Replication has the advantage that the same unique identifier can access the multiple replicas (using a journal and RAM-based indexing scheme, for example), but has the disadvantage of high bandwidth and storage overhead (depending upon the number of replicas desired, large objects can take up a significant amount of space). Erasure coding enjoys the benefit of a smaller storage footprint and less overhead for a similar level of protection against media failures, but suffers from the drawback that each segment of an erasure set is different content that must be separately identified in order to read the object or to reconstruct any lost segments. This identification can be especially problematic when a storage cluster is restarted. Erasure coding also incurs a higher processing overhead and loses its footprint advantage when storing small objects.
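By way of illustration only, the footprint difference described above can be approximated with simple arithmetic. The following is a minimal sketch, assuming a hypothetical k:p encoding (k data segments plus p parity segments of equal size) and ignoring any per-segment metadata; the function names and parameters are chosen for this example and are not drawn from any particular implementation.

```python
def replication_footprint(object_size: int, replicas: int) -> int:
    """Bytes stored when keeping `replicas` full copies of an object."""
    return object_size * replicas


def erasure_footprint(object_size: int, k: int, p: int) -> int:
    """Bytes stored for an assumed k:p encoding.

    The object is split into k data segments of equal size (rounded up),
    and p parity segments of that same size are added.
    """
    segment_size = -(-object_size // k)  # ceiling division
    return segment_size * (k + p)


if __name__ == "__main__":
    size = 1_000_000  # a 1 MB object, chosen arbitrarily
    # Three replicas survive the loss of any two copies.
    print("replication x3:", replication_footprint(size, 3))
    # A 5:2 encoding survives the loss of any two segments.
    print("erasure 5:2:  ", erasure_footprint(size, 5, 2))
```

Under these assumptions, both configurations tolerate two failures, yet the replicated object occupies three times its original size while the erasure-coded object occupies 1.4 times its original size.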
Thus, both techniques have disadvantages. Further, some prior art approaches applicable to erasure coding use a control database separate from the storage cluster in order to identify and track the segments of a particular object; this approach is problematic because it introduces more overhead and raises questions about the availability of the control database and whether it must itself be replicated. Also, even though under erasure coding an object can be reconstructed using a subset of the segments used to encode that object (e.g., after a disk failure), it can be time-consuming not only to identify which segments are no longer present, but also to locate the remaining segments.
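The reconstruction of an object from a subset of its segments can be illustrated with the simplest possible case, a single XOR parity segment. The sketch below is an assumption made for illustration: practical erasure codes typically use more than one parity segment and a different code, and the function names `encode` and `reconstruct` are hypothetical. Note that the sketch presumes the surviving segments have already been identified and located, which is precisely the step described above as time-consuming.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k zero-padded segments and append one XOR parity segment."""
    seg_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(seg_len * k, b"\0")
    segments = [padded[i * seg_len:(i + 1) * seg_len] for i in range(k)]
    parity = reduce(xor_bytes, segments)
    return segments + [parity]


def reconstruct(segments: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the object when at most one segment (marked None) is lost."""
    missing = [i for i, s in enumerate(segments) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can recover only one lost segment")
    restored = list(segments)
    if missing:
        survivors = [s for s in segments if s is not None]
        # XOR of all surviving segments equals the lost segment.
        restored[missing[0]] = reduce(xor_bytes, survivors)
    # Drop the parity segment and the padding.
    return b"".join(restored[:-1])[:original_len]


if __name__ == "__main__":
    original = b"hello erasure coding"
    stored = encode(original, 4)
    stored[2] = None  # simulate a segment lost to a disk failure
    print(reconstruct(stored, len(original)) == original)
```

Each of the five stored segments in this example holds different content, so each must be identified separately before reconstruction can begin.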
Accordingly, improved techniques are desired for use with storage clusters in order to take advantage of the benefits of replication and erasure coding as well as to limit exposure after a hardware failure.