In current information technology (IT) environments, duplicate copies of data may be made through data backup and data archiving. Both data backup and data archiving may be performed by moving and/or copying data from an online storage tier (e.g., hard drives of client machines) to near-line or off-line storage. However, combining the two tasks may be difficult because data backup and data archiving serve distinct but necessary functions.
The purpose of backup may be to provide protection and operational recoverability for client machines. For example, a backup application may periodically take snapshots of active data to generate backup images. Data from backup images may be used to restore a computer to an operational state following a disaster, or to restore individual files after they have been accidentally deleted or corrupted. Backup operations therefore protect active data that may change frequently. However, most backup images are retained only for a short period of time (e.g., a few days or a few weeks), as later backup images supersede earlier ones. Thus, backup may be designed as a short-term insurance policy to facilitate disaster recovery.
The purpose of archiving may be to reduce storage usage on client machines by moving stale but historically important data to archives. Archives may also be created to comply with legislation and good corporate governance practices. Archived data may be stored for a long period of time (e.g., years or decades). For example, an archive may be designed to provide ongoing access to decades of business information. Therefore, archived data may need to be maintained for a much longer period of time than backup data.
Another difficulty in combining data backup and data archiving may be the difference in the volume of data involved. Generally, data needing to be archived may be only a small percentage of data needing to be backed up. For example, a backup image may be a snapshot of a hard drive or of some folders of a hard drive, which may contain a large amount of active data in addition to stale data. The active data may become obsolete in days or weeks, and only the stale data may need to be archived. Therefore, using backup data for archival purposes is generally not suitable due to the tremendous storage requirement of maintaining backup images for long periods of time. However, performing archiving and backup as separate operations results in double data movement and double storage requirements, because the stale data is copied and stored once for backup and again for archive.
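By way of illustration only, the storage trade-off described above may be sketched with simple arithmetic. The following Python sketch assumes that every backup image is a full snapshot (ignoring incremental backup, compression, and deduplication). The class name, function names, and data sizes are hypothetical and are not taken from any particular backup or archiving product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of the data on one client machine."""
    active_gb: float  # frequently changing data (assumed size)
    stale_gb: float   # stale but historically important data (assumed size)


def backup_storage_gb(w: Workload, images_retained: int) -> float:
    """Storage used when each retained backup image is a full snapshot."""
    return images_retained * (w.active_gb + w.stale_gb)


def separate_storage_gb(w: Workload, images_retained: int) -> float:
    """Storage used when backup and archiving run as separate operations.

    The stale data is stored in every backup image and again in the archive.
    """
    return backup_storage_gb(w, images_retained) + w.stale_gb


def separate_data_moved_gb(w: Workload) -> float:
    """Data moved in one cycle when backup and archiving run separately.

    The stale data is moved twice: once into the backup image and once
    into the archive.
    """
    return (w.active_gb + w.stale_gb) + w.stale_gb


if __name__ == "__main__":
    w = Workload(active_gb=900, stale_gb=100)
    # Using backup images as the archive: weekly images kept for ten years.
    print(backup_storage_gb(w, images_retained=520))
    # Short backup retention plus a separate archive.
    print(separate_storage_gb(w, images_retained=4))
    # Data moved per cycle when the two operations are separate.
    print(separate_data_moved_gb(w))
```

Under these assumed sizes, retaining 520 weekly images for ten years would occupy 520,000 GB, whereas retaining four images plus a separate archive would occupy 4,100 GB, at the cost of moving the 100 GB of stale data twice.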
In view of the foregoing, it may be understood that there are significant problems and shortcomings associated with current data backup and data archiving technologies.