Backup data may contain data that may be harmful to a computer system, such as viruses, worms, or other malware. Backup data may also contain sensitive data, such as personnel or human resources data, trade secret data, company proprietary data, medical data, and/or attorney-client privileged data. Bad data, such as malware, and sensitive data may be intermixed with ordinary data on a single backup. Backups may also be handled and/or classified on a per-unit basis, such as per backup image. Additionally, sensitive or bad data may make up only a small portion of a backup and may be difficult to detect. As a result, an entire backup may have to be restricted if it contains sensitive data, or quarantined if it contains malware.
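The effect of per-unit classification may be illustrated with a minimal sketch. The names `BackupItem` and `classify_image`, the label values, and the file paths are hypothetical and chosen only for illustration; they do not describe any particular backup product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupItem:
    path: str
    label: str  # hypothetical labels: "ordinary", "sensitive", or "malware"


def classify_image(items):
    """Return the single status applied to a whole image under per-unit handling."""
    labels = {item.label for item in items}
    if "malware" in labels:
        return "quarantined"
    if "sensitive" in labels:
        return "restricted"
    return "unrestricted"


# 999 ordinary files plus one sensitive file on the same backup image.
image = [BackupItem(f"/data/report_{i}.txt", "ordinary") for i in range(999)]
image.append(BackupItem("/hr/salaries.xlsx", "sensitive"))

image_status = classify_image(image)
affected = sum(1 for item in image if item.label != "ordinary")
print(f"{affected} of {len(image)} items is sensitive; image is {image_status}")
```

In this sketch, a single sensitive item out of one thousand may cause the entire image to be restricted, because the status is assigned to the image rather than to individual items.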
Furthermore, data may need to be retained for different periods depending on its content. A myriad of policies may require data to be retained for personnel reasons, tax reasons, Sarbanes-Oxley requirements, compliance with legal discovery requests, and/or other legal retention policies. Data with different retention policies may be located on the same server or the same storage unit targeted for a single backup. A small portion of data legally required to be retained for a long period of time may be difficult to detect when interspersed with a large amount of data with a shorter retention period. Certain portions of data may be subject to multiple retention periods. For example, data associated with a contract may be retained according to one policy for financial reasons and according to a second policy for litigation associated with the contract. Because backups may be handled as a unit, this may result in an entire backup being duplicated and/or being retained for the longer of the two retention periods. In some cases, a backup may be classified for a first purpose, such as litigation-related retention, and may be overlooked for a second purpose, such as human-resources-related retention.
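The retention problem described above may be sketched as follows. The names `RetainedItem`, `item_retention`, and `image_retention`, the paths and sizes, and the retention periods (90, 2555, and 3650 days) are hypothetical values chosen for illustration and do not reflect any actual legal requirement.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RetainedItem:
    path: str
    size_bytes: int
    retention_days: Tuple[int, ...]  # one entry per applicable policy


def item_retention(item):
    """An item subject to several policies is kept for the longest of them."""
    return max(item.retention_days)


def image_retention(items):
    """Under per-unit handling, the whole image inherits the longest period."""
    return max(item_retention(item) for item in items)


items = [
    # Ordinary data with a short retention period (assumed: 90 days).
    RetainedItem("/share/scratch.bin", 990_000, (90,)),
    # Contract data under two policies (assumed: financial and litigation).
    RetainedItem("/legal/contract.pdf", 10_000, (2555, 3650)),
]

total_bytes = sum(item.size_bytes for item in items)
retention = image_retention(items)
long_bytes = sum(i.size_bytes for i in items if item_retention(i) == retention)
long_fraction = long_bytes / total_bytes
print(f"image kept {retention} days; {long_fraction:.0%} of bytes require it")
```

Under these assumed values, the entire image may be retained for the longest period even though only a small fraction of its bytes is subject to that period.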
Moreover, backup capacity is increasing. The ability to store a large amount of data on a single backup not only increases the possibility of bad data and/or sensitive data being interspersed with ordinary data, but also increases the possibility that more backups will contain garbage data, such as music, pictures, games, or other data that may be installed by a user but is of no value to the entity performing the backup. Similarly, shareware and unlicensed or expired software may reside on a volume slated for backup and may be interspersed with valuable organization data.
Current backup technologies and procedures may enable the classification and handling of backup data only at the backup-unit level, such as for an entire backup image. This may result in the propagation of bad data, the loss of valuable data, the misclassification of data, the use of excess storage space for bad data, and/or the duplication of entire backup images for the sake of a small portion of the backup data. Additionally, when bad data is copied, archived, restored, and/or replicated, storage costs may quickly increase.
In view of the foregoing, it may be understood that there may be significant problems and shortcomings associated with current backup processing and handling technologies.