An organization typically protects its data by making one or more copies of it. The data to be protected is typically termed primary data, which generally includes production data. The copies of the primary data are typically called secondary copies, tertiary copies, etc. Primary data may include numerous data objects. A data object is any collection or grouping of bytes of data that can be viewed as one or more logical units (data objects include, e.g., files, email messages, database entries, etc.). A data object may be unique (i.e., there is only one instance of the data object in the primary data) or non-unique (i.e., there is more than one instance of the data object in the primary data). Conventional data protection techniques typically involve making a secondary copy of each data object in the primary data, regardless of whether it is unique or not. For example, if the primary data includes N instances of a certain data object, conventional data protection techniques would result in creating a secondary copy that also includes N instances of the data object. Conventional data protection techniques thus minimize the risk of data loss of non-unique data objects (because another instance of a non-unique data object can likely be restored), at the expense of creating secondary copies that are as large as the primary data.
In contrast, single instance storage techniques typically provide for secondary storage of a single instance of a given data object included in primary data. Such single instance storage techniques typically operate by comparing signatures or hashes of data objects in primary data against signatures or hashes of data objects already stored in secondary storage. If a signature or hash of a data object matches that of a previously stored data object, then the data object is not stored again, and only a pointer or other reference to the previously stored data object is stored in its place. Such single instance storage techniques result in creating a secondary copy of the primary data that includes only a single instance of each data object in the primary data.
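The signature comparison described above can be illustrated with a minimal sketch. It assumes SHA-256 as the signature function and an in-memory dictionary standing in for secondary storage. The class name `SingleInstanceStore`, its methods, and the sample object names are hypothetical and are not drawn from any particular system.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical in-memory stand-in for secondary storage."""

    def __init__(self):
        # signature -> bytes: one stored instance per distinct data object
        self.blobs = {}
        # (name, signature): one reference per data object in primary data
        self.references = []

    def store(self, name, data):
        """Store a data object; return True only if its bytes were newly stored."""
        # Assumption: SHA-256 serves as the signature of the data object.
        signature = hashlib.sha256(data).hexdigest()
        is_new = signature not in self.blobs
        if is_new:
            self.blobs[signature] = data
        # A reference is always recorded, whether or not the bytes were stored.
        self.references.append((name, signature))
        return is_new

    def restore(self, name):
        """Follow the reference for a name back to the single stored instance."""
        for ref_name, signature in self.references:
            if ref_name == name:
                return self.blobs[signature]
        raise KeyError(name)


# Illustrative primary data: two instances of one object and one unique object.
primary_data = [
    ("a/report.doc", b"quarterly report"),
    ("b/report.doc", b"quarterly report"),
    ("c/notes.txt", b"meeting notes"),
]

store = SingleInstanceStore()
results = [store.store(name, data) for name, data in primary_data]
```

In this sketch the second `report.doc` is recorded only as a reference, so three data objects in primary data occupy two stored instances, and every reference to the report resolves to the same stored bytes.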
While such single instance storage techniques may be efficient in terms of minimizing the storage space used to store data objects, they introduce some risk to overall data protection. For example, if there is a problem with the media on which the data object in secondary storage is stored, then it may be difficult (if not impossible) to recover the data object from the media. If an organization implementing such single instance storage techniques stores only a single instance of the data object in secondary storage, then the data object may also be unrecoverable from secondary storage across the organization.
The organization may attempt to mitigate the consequences associated with this risk by making other secondary and/or tertiary copies of single instanced data objects, such as copies on tape. However, it may be slow and/or difficult to recover such secondary and/or tertiary copies from tape.
The need exists for systems and methods that overcome the above problems, as well as systems and methods that provide additional benefits. Overall, the examples herein of some prior or related systems and methods and their associated limitations are intended to be illustrative and not exclusive. Other limitations of existing or prior systems and methods will become apparent to those of skill in the art upon reading the following Detailed Description.