There is an ever-increasing demand for larger storage systems, driven by primary data growth and by the advent of new workloads such as disk-based backup. Backups that were traditionally stored on tape are now being stored on disk-based storage systems for better performance and cost effectiveness. Such backup systems have huge footprints, often several times larger than those of traditional primary storage systems, and yet are unable to meet the requirements of the biggest enterprise customers.
To ease the management of a large storage system, data may be stored in the system as a single large collection, such as a file system, rather than split into multiple small disjoint sets. Similarly, for deduplicated storage, a larger deduplication domain provides better compression. If the storage is divided into multiple small deduplication domains, duplicate data that lands in different domains is not detected, so deduplication rates are low and more space is used. Further, with multiple small deduplication domains, it becomes difficult to decide how to assign data to the different domains.
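The effect of domain size on deduplication can be illustrated with a minimal sketch. It assumes chunk-level deduplication keyed on a SHA-256 fingerprint, and the function name `stored_chunks` and the sample chunks are hypothetical, chosen only for illustration; they do not describe any particular system.

```python
import hashlib

def stored_chunks(domains):
    """Count chunks physically stored when each domain deduplicates on its own.

    `domains` is a list of domains; each domain is a list of byte-string chunks.
    Duplicates are detected only within a domain, never across domains.
    """
    total = 0
    for chunks in domains:
        # One fingerprint per distinct chunk in this domain.
        fingerprints = {hashlib.sha256(c).hexdigest() for c in chunks}
        total += len(fingerprints)
    return total

# Illustrative data: six chunks, three of them distinct.
chunks = [b"A", b"B", b"A", b"C", b"B", b"A"]

single_domain = stored_chunks([chunks])                  # one large domain
split_domains = stored_chunks([chunks[:3], chunks[3:]])  # two small domains
```

With this sample data the single domain stores three chunks, while the two smaller domains together store five, because the copies of `A` and `B` that fall in different domains are each stored again.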
Unfortunately, a single large collection is difficult to back up and restore if there is a failure in the storage system. It presents a potential loss of the entire collection whenever a part of the storage is damaged or compromised, and the time required to recover the collection can become long because of its size. In many systems, the collection must be taken offline while even a small portion of the data is being recovered, so a long recovery time can render the system unavailable for a long period.
In addition, a part of a large storage system is more likely to fail because the system contains many more devices. Further, with the data and metadata of the single large collection spread across this large number of storage devices, a partial hardware failure can corrupt the entire collection. In such a case, the entire collection may be lost or may have to be taken offline to repair the data or metadata.
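The relationship between device count and failure likelihood can be sketched as follows. This is an illustration only: it assumes that devices fail independently and with the same probability, which real systems may not satisfy, and the function name and the parameter values are hypothetical.

```python
def prob_any_failure(per_device_failure_prob, device_count):
    """Probability that at least one of `device_count` devices fails.

    Assumes independent failures with identical per-device probability,
    so the result is 1 - (1 - p) ** n.
    """
    return 1.0 - (1.0 - per_device_failure_prob) ** device_count

# Hypothetical per-device failure probability over some fixed period.
p = 0.01
small_system = prob_any_failure(p, 10)
large_system = prob_any_failure(p, 1000)
```

Under these assumptions the probability of at least one failure grows with the number of devices and approaches one for very large systems, even when each individual device is reliable.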