A copyset is a set of machines used to store replicas of the same chunk of data. At least one copy of the data chunk will always be available so long as at least one machine of the copyset remains available. Current storage systems assign each stored data chunk to a random copyset. Consequently, if a certain number of machines become simultaneously unavailable, it is likely that all the machines of at least one copyset will be unavailable, and therefore the data chunks stored on that copyset will be unavailable.
It has been observed that limiting the number of copysets in use can reduce the probability that all machines of any one copyset become unavailable at any given time. Accordingly, in some systems, each data chunk is stored in one of a limited number of copysets instead of in a random copyset of machines. Because each of the limited copysets stores many data chunks, many data chunks will become unavailable if all machines of a copyset become unavailable. This consequence is considered acceptable as compared to the above-described systems, because the reduction in episodes of data unavailability typically outweighs the negligible increase in recovery costs in a case that a copyset becomes unavailable.
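The effect of limiting the number of copysets can be illustrated with a short calculation. The following is a minimal sketch, not a description of any particular system: it assumes that the set of failed machines is chosen uniformly at random, and the function name and parameters are hypothetical. It computes the expected number of in-use copysets whose machines have all failed, which is also an upper bound on the probability that at least one copyset is lost.

```python
from math import comb


def expected_lost_copysets(num_machines, replication, copysets_in_use, failed):
    """Expected number of in-use copysets whose machines are all failed.

    Assumption (not from the source): exactly `failed` machines fail and
    every such set of machines is equally likely.
    """
    if failed < replication:
        # Fewer failures than replicas: no copyset can be fully failed.
        return 0.0
    # Probability that one particular copyset lies entirely inside the
    # failed set: choose the remaining failed machines from the others.
    p_one = comb(num_machines - replication, failed - replication) / comb(
        num_machines, failed
    )
    # Linearity of expectation over all copysets in use.
    return copysets_in_use * p_one


# 9 machines, 3 replicas, 3 simultaneous failures.
all_copysets = comb(9, 3)  # every possible copyset in use (random placement)
random_placement = expected_lost_copysets(9, 3, all_copysets, 3)
limited_placement = expected_lost_copysets(9, 3, 3, 3)  # only 3 copysets in use
```

Under these assumptions, the expectation grows linearly with the number of copysets in use, which is why restricting placement to a few copysets lowers the chance that any one of them is entirely unavailable.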
When using limited copysets, it may be costly to migrate data off a single machine of a copyset. Such migration may be necessary if the machine becomes unavailable or nears its storage capacity. However, the option to migrate data off a single machine to another machine within the same copyset is not usually available.
Storage using limited copysets also assumes replication of data chunks across members of the copyset. Some storage systems implement erasure coding in order to use storage more efficiently. Data may become unavailable in such systems whenever several machines are unavailable at the same time. Specifically, a replicated data chunk is unavailable if all replicas of the data chunk are stored on failed machines. A data chunk that is X+Y erasure-coded will be unavailable if machines storing at least Y+1 of its fragments are unavailable.
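The two availability conditions above can be expressed as simple predicates. The following is a minimal sketch under stated assumptions: machines are identified by strings, each erasure-coded fragment is stored on a distinct machine, and the function names are hypothetical rather than taken from any existing system.

```python
def replicated_chunk_available(replica_machines, failed_machines):
    """A replicated chunk is available if any replica is on a live machine."""
    return any(m not in failed_machines for m in replica_machines)


def erasure_coded_chunk_available(fragment_machines, failed_machines, x, y):
    """An X+Y erasure-coded chunk is available if at most Y fragments are lost.

    Assumption: fragment_machines lists one distinct machine per fragment,
    so it must contain exactly x + y entries.
    """
    if len(fragment_machines) != x + y:
        raise ValueError("expected exactly x + y fragment locations")
    lost = sum(1 for m in fragment_machines if m in failed_machines)
    # Losing Y+1 or more fragments makes the chunk unavailable.
    return lost <= y


machines = ["m1", "m2", "m3", "m4", "m5", "m6"]

# Three-way replication on m1..m3 survives two failures among them.
replica_ok = replicated_chunk_available(machines[:3], {"m1", "m2"})
replica_lost = replicated_chunk_available(machines[:2], {"m1", "m2"})

# 4+2 erasure coding on m1..m6 tolerates two failures but not three.
coded_ok = erasure_coded_chunk_available(machines, {"m1", "m2"}, 4, 2)
coded_lost = erasure_coded_chunk_available(machines, {"m1", "m2", "m3"}, 4, 2)
```

In this sketch, the replicated chunk tolerates the loss of all but one of its machines, while the erasure-coded chunk tolerates the loss of at most Y of its X+Y machines.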
Systems are desired to address the shortcomings of limited copysets and to support the storage of erasure-coded data.