1. The Field of the Invention
The present invention relates to data storage and backup solutions for archiving data and recovering data. More particularly, embodiments of the invention relate to software, hardware, systems, and methods for providing data protection in a manner that allows a user or customer to obtain a copy of information or stored data from a selected point in time and that provides for high efficiency data archiving and data portability.
2. The Relevant Technology
The need for reliable backup and archiving of information is well known. Businesses devote large amounts of time and money to information system (IS) resources that back up and archive the information resident in the computers and servers within their organizations that produce and rely upon digital information. Customers of the data storage industry increasingly demand not only that their data be properly backed up but also that, when needed, such as after a system failure that causes a loss of data, the backed up data be accessible as of a particular point in time. In other words, there is an increasing demand for almost continuous data protection that allows data to be restored to its state at a particular moment in time, most commonly a point in time just before the computer or data storage system crashed or was lost. The demand for point-in-time data protection, though, must be balanced against the demand for low data storage costs, and typically, high speed optical and disk storage systems are more expensive to use as archive storage than tape-based data storage systems.
Driven by this demand for point-in-time archives and the growth of data storage, new technologies have emerged that store multiple versions or points in time of the primary data on disk storage using high efficiency techniques. These techniques allow multiple copies of a data set “N” having a particular size, taken at different points in time (e.g., data sets N1, N2, N3, where the numbers 1, 2, 3 represent different points in time at which changes may have been made to data set N), to be stored in a way that consumes far less capacity in a disk or optical data storage device or system than simply storing the data in its native state. For example, a high efficiency disk storage system might store the data sets N, N1, N2, and N3 in less than the total size of the original data set N or, at least, using less storage capacity than the sum of the sizes of the data sets N+N1+N2+N3.
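By way of illustration only, the capacity savings described above can be sketched with a small content-addressed chunk store. The sketch below is not taken from any particular product; the class name ChunkStore, the fixed chunk size of 4 bytes, and the sample data sets are all hypothetical and chosen only to keep the arithmetic easy to follow. Each version is stored as an ordered list of chunk digests, and a chunk shared by several versions is kept once. The byte count ignores the per-version digest lists, which a real system would also have to store.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical; tiny so the example is easy to follow


class ChunkStore:
    """Illustrative store that keeps each distinct chunk only once."""

    def __init__(self):
        self.chunks = {}    # digest -> chunk bytes, stored once
        self.versions = {}  # version label -> ordered list of digests

    def put(self, label, data):
        """Split data into fixed-size chunks and record the version."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # skip chunks already held
            recipe.append(digest)
        self.versions[label] = recipe

    def get(self, label):
        """Rebuild a version from its list of chunk digests."""
        return b"".join(self.chunks[d] for d in self.versions[label])

    def stored_bytes(self):
        """Chunk payload held by the store (digest lists not counted)."""
        return sum(len(c) for c in self.chunks.values())


# Hypothetical data set N and three later points in time.
n = b"AAAABBBBCCCCDDDD"
n1 = b"AAAABBBBCCCCEEEE"      # last chunk changed
n2 = b"AAAAFFFFCCCCEEEE"      # second chunk changed
n3 = b"AAAAFFFFCCCCEEEEGGGG"  # one chunk appended

store = ChunkStore()
for label, data in [("N", n), ("N1", n1), ("N2", n2), ("N3", n3)]:
    store.put(label, data)

native_bytes = len(n) + len(n1) + len(n2) + len(n3)
print(native_bytes, store.stored_bytes())
```

In this sketch the four versions occupy less chunk storage than the sum N+N1+N2+N3 because unchanged chunks are shared, while any single version can still be rebuilt with get. Whether a given system can go further and store all versions in less than the size of N alone depends on additional techniques, such as compression, that are not shown here.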
Today, there are multiple software- or application-based approaches to storing copies of data in a highly efficient manner in order to provide point-in-time copies of data for backup, restore, and disaster recovery. These technologies include, but are not limited to, snapshots, file differencing techniques, content addressed storage systems, and systems that eliminate redundant data components that may be variable in size. While providing a more efficient method of archiving data, all of these systems use disk storage as their primary storage mechanism rather than less expensive tape media or tape storage systems.
Also, despite the existence of these high efficiency storage technologies, businesses still often need to store or move data onto alternate archive systems that may utilize removable tape media, optical storage, or other disk storage systems that may be less expensive or have different management attributes. In some cases, these archives are required to meet regulatory or other requirements. A problem with such archives is that they are often highly inefficient, e.g., archiving may involve expanding the data back into its original state (N+N1+N2+N3). Another problem is that the data is stored in such a fashion that it cannot be easily ported, restored, or managed in the future due to the proprietary nature of the implemented high efficiency storage methodology. For example, copying all of the volumes of a primary storage system using block-based snapshots to tape will yield a high efficiency dataset, but one that cannot be independently read or utilized without restoring the data to a system that matches the physical characteristics of the original hardware platform.
As a result, existing backup and archiving techniques do not meet the needs of data storage customers, and there is a continuing need for enhanced techniques for providing continuous or near continuous data protection. Such techniques preferably can be implemented using existing data storage hardware in a highly efficient manner but with enhanced portability.