Secondary storage for the enterprise market needs to offer several features simultaneously to be successful. Large capacity, high performance, and high reliability form the core functionality required by backup and archival appliances. Since backup involves storing multiple versions of similar data, deduplication is a logical new key feature of such systems. With deduplication, the logical storage capacity is far larger than the physical space available, resulting in substantial savings. If deduplication is performed on backup streams on the fly, writing duplicate blocks to a storage device can be avoided, which contributes to high performance. Further, because the computation required for deduplication can be scaled out by distributed processing, higher performance can be achieved, resulting in shortened backup windows, which are of primary importance to enterprise customers.
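The effect of on-the-fly deduplication on logical versus physical capacity can be illustrated with a minimal Python sketch. It is not the implementation of any particular product: the class name `DedupStore`, the helper `backup`, the use of SHA-256 as the fingerprint, and the fixed 4-byte block size are all assumptions chosen for brevity.

```python
import hashlib


class DedupStore:
    """Toy in-line deduplicating block store (illustrative assumption)."""

    def __init__(self):
        self.blocks = {}         # fingerprint -> block bytes ("physical" storage)
        self.logical_bytes = 0   # bytes the backup clients asked to store
        self.physical_bytes = 0  # bytes actually written

    def write(self, block: bytes) -> str:
        # The fingerprint is computed before anything is written, so a
        # duplicate block never reaches the storage device.
        digest = hashlib.sha256(block).hexdigest()
        self.logical_bytes += len(block)
        if digest not in self.blocks:
            self.blocks[digest] = block
            self.physical_bytes += len(block)
        return digest


def backup(store: DedupStore, stream: bytes, block_size: int = 4) -> list:
    """Split a stream into fixed-size blocks and write each one.

    Fixed-size splitting is an assumption made to keep the sketch short.
    Returns the list of fingerprints needed to reconstruct the stream.
    """
    return [
        store.write(stream[i:i + block_size])
        for i in range(0, len(stream), block_size)
    ]
```

Backing up two versions of similar data, for example `b"AAAABBBBCCCC"` followed by `b"AAAABBBBDDDD"`, adds 24 bytes to the logical total but only 16 bytes to the physical total, because the two leading blocks of the second version are already stored.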
The storage system disclosed in NPL 1 is a commercial distributed storage system that delivers all of the features mentioned above. In brief, this system can be seen as a distributed store that keeps a collection of variable-length blocks, each capable of referring to other blocks. The system uses content-derived block addresses and provides global in-line block deduplication. It is built on a DHT (Distributed Hash Table), supports self-recovery from failures, uses erasure codes to generate redundancy codes (parity) for data, and provides multiple user-selectable data resiliency levels.
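The combination of content-derived addresses, blocks that refer to other blocks, and DHT-based placement can be sketched as follows. This is a hedged illustration and does not reproduce the design in NPL 1: the names `Block` and `ToyDHT`, the SHA-256 address, and the modulo-based node selection are all assumptions. Erasure coding, failure recovery, and resiliency levels are omitted.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Variable-length block that may point to other blocks (hypothetical type)."""
    data: bytes
    pointers: tuple = ()  # addresses (hex digests) of the referenced blocks

    def address(self) -> str:
        # The address is derived from the content: the data length, the data,
        # and the addresses of all referenced blocks.
        h = hashlib.sha256()
        h.update(len(self.data).to_bytes(8, "big"))
        h.update(self.data)
        for pointer in self.pointers:
            h.update(bytes.fromhex(pointer))
        return h.hexdigest()


class ToyDHT:
    """Maps each block address to one of several in-memory nodes (assumption)."""

    def __init__(self, num_nodes: int):
        self.nodes = [dict() for _ in range(num_nodes)]

    def node_for(self, address: str) -> int:
        # Use the leading 32 bits of the address to choose a node.
        return int(address[:8], 16) % len(self.nodes)

    def put(self, block: Block) -> str:
        addr = block.address()
        # Identical content always maps to the same node and the same key,
        # so deduplication is global across all writers.
        self.nodes[self.node_for(addr)].setdefault(addr, block)
        return addr

    def get(self, address: str) -> Block:
        return self.nodes[self.node_for(address)][address]
```

In this sketch, writing the same data twice returns the same address and stores a single copy, and a block whose `pointers` hold the addresses of other blocks can serve as the root of a tree describing a whole backup stream.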