The present invention generally relates to data storage systems and, but not by way of limitation, to data storage systems that store information on removable media.
Conventional backup involves a series of full, incremental, or differential backups that save multiple copies of identical or slowly changing data. This approach to backup leads to a high level of data redundancy.
For years, there has been a considerable disparity between the prices of tape-based and disk-based storage systems, with tape-based storage being less expensive. Therefore, conventional data storage solutions have been tape-based storage systems that compress data using conventional algorithms for an average compression ratio of about 2:1. Advantageously, tape-based storage systems use removable tape cartridges that can be taken to an off-site location for disaster recovery. However, the process of recovering data in a tape-based storage system is slow, complex, and unreliable.
Data de-duplication, also known as commonality factoring, is a process of reducing storage needs by eliminating redundant data. Data de-duplication is used in disk-based data storage systems to greatly reduce disk space requirements. However, disk-based data storage systems that include de-duplication methods are not easily exported to removable media. In order to export de-duplicated data to removable media, the de-duplicated data must first be reformulated to its original format and then recorded on removable tape cartridges, thereby requiring more storage space than the de-duplicated version.
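By way of illustration only, the following Python sketch shows one simple way redundant data might be eliminated, and why reformulating the data to its original format requires more space than the de-duplicated version. It is a minimal sketch under stated assumptions, not a description of any particular commonality factoring product: the fixed chunk size, the use of SHA-256 digests as chunk identifiers, and the names deduplicate, rehydrate, and stored_bytes are all hypothetical choices made for this example.

```python
import hashlib

# Assumed fixed chunk size; illustrative only. Practical systems may use
# variable-size chunks chosen from the data content.
CHUNK_SIZE = 4096


def deduplicate(data: bytes, store: dict) -> list:
    """Split data into chunks, keep one copy of each unique chunk in
    `store`, and return the ordered list of digests (the "recipe")."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        # A chunk that was already seen is not stored a second time.
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return recipe


def rehydrate(recipe: list, store: dict) -> bytes:
    """Reformulate the original data from its recipe and the chunk store."""
    return b"".join(store[digest] for digest in recipe)


def stored_bytes(store: dict) -> int:
    """Total size of the unique chunks actually kept."""
    return sum(len(chunk) for chunk in store.values())


# Two hypothetical full backups of slowly changing data: the second backup
# repeats the first and appends one new chunk.
block_a = b"A" * CHUNK_SIZE
block_b = b"B" * CHUNK_SIZE
block_c = b"C" * CHUNK_SIZE
full_1 = block_a + block_b
full_2 = block_a + block_b + block_c

store = {}
recipe_1 = deduplicate(full_1, store)
recipe_2 = deduplicate(full_2, store)

raw_size = len(full_1) + len(full_2)   # 5 chunks if both backups are exported
dedup_size = stored_bytes(store)       # 3 unique chunks kept on disk
print(raw_size, dedup_size)
```

In this sketch the two backups occupy five chunks when written out in their original format but only three chunks in the de-duplicated store, which mirrors the storage penalty incurred when de-duplicated data is reformulated before being recorded on removable media.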
Data de-duplication is a resource-intensive process that is implemented in software as part of commonality factoring solutions. Because the process is computationally intensive, top-of-the-line multi-core/multi-processor servers are used to provide adequate de-duplication performance. The performance gained from multi-core/multi-processor servers depends on the algorithms used and their implementation in software. However, the overall cost and power consumption of these multi-core/multi-processor servers are high.