1. The Field of the Invention
The present invention relates to data storage and backup solutions for archiving data. More particularly, embodiments of the invention relate to software, hardware, systems, and methods for providing an integrated archive and backup solution in a data storage system.
2. The Relevant Technology
The need for reliable backup and archiving of information is well known. Businesses devote large amounts of time and money to information system resources that provide backup and archiving of the information resident in the computers and servers within their organizations that produce and rely upon digital information. Customers of the data storage industry increasingly demand not only that their data be properly backed up and archived, but also that such data protection be performed in a cost-effective manner, with a reduced cost per bit for stored data sets.
To address these demands, Content Addressed Storage (“CAS”) has been developed to provide a more cost-effective approach to data backup and archiving in data storage and protection systems. CAS assigns an identifier to the data so that it can be accessed no matter where it is located. For example, a hash value may be assigned to each portion or subset of a data set that is to be protected or backed up. Presently, CAS applications are provided in distributed or networked storage systems designed for CAS, and storage applications use a CAS application programming interface (API) or the like to store and locate CAS-based files in the distributed system or network.
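By way of illustration only, the assignment of a hash value to each portion of a data set may be sketched as follows. The sketch assumes fixed-size portions and the SHA-256 hash function; neither is specified above, and the names `CHUNK_SIZE` and `chunk_identifiers` are hypothetical.

```python
import hashlib

# Hypothetical portion size, kept tiny so the example is easy to follow.
CHUNK_SIZE = 4


def chunk_identifiers(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size portions and return one hash identifier per portion.

    SHA-256 is an assumption made for this sketch; any suitable hash could be used.
    """
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


# The first two portions are both b"abcd", so they receive the same identifier.
ids = chunk_identifiers(b"abcdabcdxyz")
```

In this sketch, portions with identical content receive identical identifiers regardless of where they occur in the data set, which is the property that allows the data to be located by its content.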
The use of CAS enables data storage and protection systems to store, online, multi-year archives of data by eliminating the storage of redundant data: complete copies of data sets do not have to be stored as long as their content is already stored and available. When used for backup, CAS removes the challenges of maintaining a centralized backup index and also provides a high level of data integrity. CAS-based backup and archive applications have also improved the usage of network and data storage resources through better distribution of data throughout a multi-node data storage system.
With CAS, the storage address for any data element or content is generated by an analysis of the contents of the data set itself. Because an exclusive storage address is generated for each unique data element (which is matched with a unique identifier), and the storage address points to the location of the data element, each unique data object is stored only once within the data storage system. CAS-based architectures have found favor in the storage industry because this reduces the volume of data stored.
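As a minimal sketch of this behavior, and not a description of any particular CAS product, the following example derives a storage address from the content itself and stores each unique object only once. The class `ContentAddressedStore`, the use of SHA-256, and the in-memory dictionary standing in for a distributed storage system are all assumptions made for illustration.

```python
import hashlib


class ContentAddressedStore:
    """Hypothetical in-memory store keyed by a hash of the stored content."""

    def __init__(self) -> None:
        # Maps storage address (hex digest) to the stored bytes.
        self._objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        """Store content and return its address; identical content is kept once."""
        address = hashlib.sha256(content).hexdigest()  # assumed hash function
        self._objects.setdefault(address, content)     # no-op if already present
        return address

    def get(self, address: str) -> bytes:
        """Retrieve content by the address returned from put()."""
        return self._objects[address]

    def __len__(self) -> int:
        return len(self._objects)


store = ContentAddressedStore()
a1 = store.put(b"quarterly report")
a2 = store.put(b"quarterly report")  # duplicate content, same address
a3 = store.put(b"annual report")
```

Here the second `put` of identical content returns the same address and adds nothing to the store, so the store holds two objects after three writes.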
While providing higher-efficiency data storage, current CAS-based data storage systems are typically implemented in conjunction with separate backup and archive applications. Consequently, separate infrastructures are typically provided for the backup solution and the archiving solution, which may include a backup server backing up files to tape or other storage and a separate archive for long-term storage, as well as separate backup and archiving applications. Further, while backup data can be re-used for subsequent backups and archive data can be re-used for subsequent archiving, backup data cannot be re-used for subsequent archiving, often resulting in redundant data between the backups and archives. Accordingly, current data storage systems can be improved by integrating backup and archive solutions and re-using data between backups and archives.