It is a problem both to safeguard data that is stored on a computer system and to restore all or portions of this data that are lost or corrupted. Many computer systems have no protection systems in place, and the loss of data from these computer systems is irrevocable. Other computer systems make use of attached data backup systems to store a copy of the data that is stored in the computer memory and updates thereto for eventual retrieval to restore data that is lost from or corrupted in the computer system memory. However, the use of these existing data backup systems is laborious and can be confusing to the casual user.
In information technology, backup refers to making copies of data so that these additional copies may be used to restore the original after a data loss event. These additional copies are typically called “backups.” Backups are useful primarily for two purposes. The first is to restore a computer to an operational state following a disaster (called “disaster recovery”). The second is to restore one or more files after they have been accidentally deleted or corrupted. Backups are typically the last line of defense against data loss and, consequently, the least granular and the least convenient to use.
Because a data backup system contains at least one copy of all data worth saving, its data storage requirements are considerable. These requirements can be exacerbated by the method used to perform the data backup, particularly where change tracking is wasteful of memory. Organizing this storage space and managing the backup process is a complicated undertaking. A data repository model can be used to provide structure to the data storage device for the management of the data that is backed up. In the modern era of computing, there are many different types of data storage devices that are useful for making backups. There are also many different ways in which these data backup devices can be arranged to provide geographic redundancy, data security, and portability.
Before data is ever sent to its data backup storage location, it is selected, extracted, and manipulated. Many different techniques have been developed to optimize the backup procedure. These include optimizations for dealing with open files and live data sources as well as compression, encryption, and de-duplication, among others. Many organizations and individuals want some confidence that the backup process is working as expected, and they define measurements and validation techniques to confirm the integrity of the backup process. It is also important to recognize the limitations and human factors involved in any backup scheme.
Due to a considerable overlap in technology, backups and data backup systems frequently are confused with archives and fault-tolerant systems. Backups differ from archives in the sense that archives are the primary copy of data and backups are a secondary copy of data. Data backup systems differ from fault-tolerant systems in the sense that data backup systems assume that a fault will cause a data loss event, and fault-tolerant systems assume a fault will not cause a data loss event.
Data Repository Models
Any backup strategy starts with the concept of a data repository. The backup data needs to be stored somehow and probably should be organized to a degree. It can be as simple as a manual process which uses a sheet of paper with a list of all backup tapes and the dates they were written or a more sophisticated automated setup with a computerized index, catalog, or relational database. Different repository models have different advantages. This is closely related to choosing a backup rotation scheme. The following paragraphs summarize the various existing backup models presently in use.
Unstructured
An unstructured repository may simply be writeable media consisting of, for example, a stack of floppy disks or CD-R media with minimal information about what data from the computer system was backed up onto this writeable media and when the backup(s) occurred. This is the easiest backup method to implement but probably the least likely to achieve a high level of recoverability due to the dearth of indexing information that is associated with the data that is backed up.
Full + Incremental
A Full + Incremental data backup model aims to make storing several copies of the source data more feasible. At first, a full backup of all files from the computer system is taken. After that full backup is completed, an incremental backup of only the files that have changed since the previous full or incremental backup is taken. Restoring the whole computer system to a certain point in time requires locating not only the full backup taken before that point in time but also all the incremental backups taken between that full backup and the point in time to which the system is to be restored. The full backup version of the data then is processed, using the set of incremental changes, to create a view of the data as of that designated point in time. This data backup model offers a high level of assurance that selected data can be restored, and this data backup model can be used with removable media such as tapes and optical disks. The downside of this data backup process is dealing with a long series of incremental changes and the high storage requirements entailed in this data backup process, since a copy of every changed file in each incremental backup is stored.
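The restore procedure for a Full + Incremental model can be illustrated with a minimal sketch. It assumes, purely for illustration, that each backup is an in-memory record of the form (time, kind, files), that the records are sorted by time, and that a value of None marks a deleted file. The name restore_full_incremental and this record layout are hypothetical and are not taken from any particular backup product.

```python
def restore_full_incremental(backups, target_time):
    """Rebuild the file set as of target_time.

    backups: list of (time, kind, files) sorted by time, where kind is
    "full" or "incr" and files maps path -> content (None = deleted).
    This record layout is an assumption made for illustration only.
    """
    # Locate the most recent full backup taken at or before the target time.
    base_index = None
    for i, (time, kind, _files) in enumerate(backups):
        if kind == "full" and time <= target_time:
            base_index = i
    if base_index is None:
        raise ValueError("no full backup at or before target_time")

    state = dict(backups[base_index][2])

    # Replay every later incremental, in order, up to the target time.
    for time, kind, files in backups[base_index + 1:]:
        if time > target_time:
            break
        for path, content in files.items():
            if content is None:
                state.pop(path, None)   # file was deleted in this increment
            else:
                state[path] = content   # file was created or changed
    return state


backups = [
    (0, "full", {"a.txt": "v1", "b.txt": "v1"}),
    (1, "incr", {"a.txt": "v2"}),
    (2, "incr", {"b.txt": None, "c.txt": "v1"}),
    (3, "incr", {"a.txt": "v3"}),
]
state_at_2 = restore_full_incremental(backups, 2)
# state_at_2 == {"a.txt": "v2", "c.txt": "v1"}
```

Every incremental between the full backup and the target time must be available and applied in order, which is why a long chain of increments makes restoration laborious.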
Full + Differential
A Full + Differential data backup model differs from a Full + Incremental data backup model in that, after the full backup is taken of all files on the computer system, each differential backup captures all files created or changed since the full backup, even though some may have been included in a previous differential backup. The advantage of this data backup model is that restoring the whole computer system to a certain point in time involves recovering only the last full backup and then overlaying it with the last differential backup taken at or before that point in time.
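The distinction between the two partial-backup styles comes down to the reference time used when selecting files, and the differential restore needs only two backup sets. The following is a minimal sketch under stated assumptions: modification times are plain integers, each differential is a (time, files) record, and None marks a deleted file. The helper names changed_since and restore_differential are hypothetical.

```python
def changed_since(mtimes, reference_time):
    """Return the paths whose modification time is later than reference_time."""
    return sorted(p for p, m in mtimes.items() if m > reference_time)


def restore_differential(full, differentials, target_time):
    """Overlay the last differential taken at or before target_time on the full backup.

    differentials: list of (time, files); a content of None marks a deletion.
    This layout is an assumption made for illustration only.
    """
    state = dict(full)
    eligible = [rec for rec in differentials if rec[0] <= target_time]
    if eligible:
        # Only the most recent eligible differential is needed.
        _time, files = max(eligible, key=lambda rec: rec[0])
        for path, content in files.items():
            if content is None:
                state.pop(path, None)
            else:
                state[path] = content
    return state


# Selection: same file set, different reference times.
mtimes = {"a.txt": 5, "b.txt": 12, "c.txt": 25}
full_time, previous_partial_time = 10, 20
incremental_set = changed_since(mtimes, previous_partial_time)  # ['c.txt']
differential_set = changed_since(mtimes, full_time)             # ['b.txt', 'c.txt']

# Restore: the full backup plus one differential.
full = {"a.txt": "v1", "b.txt": "v1"}
differentials = [
    (20, {"b.txt": "v2"}),
    (30, {"b.txt": "v2", "c.txt": "v1"}),
]
restored = restore_differential(full, differentials, 30)
```

In this example the second differential stores b.txt again even though the first one already captured it, which illustrates the redundancy the model trades for a simpler restore.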
Mirror + Reverse Incremental
A Mirror + Reverse Incremental data backup model is similar to a Full + Incremental data backup model. The difference is that, instead of an aging full data backup followed by a series of incremental data backups, this model offers a mirror that reflects the state of the computer system as of the last data backup and a history of reverse incremental data backups. One benefit of this data backup method is that it only requires an initial full data backup. Each incremental data backup is immediately applied to the mirror, and the files it replaces are moved to a reverse incremental backup. This data backup model is not suited to the use of removable media, since every data backup must be done in comparison to the data backup mirror version of the data. This process, when used to restore the whole computer system to a certain point in time, is also intensive in its use of memory.
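A minimal sketch of the mirror and its reverse increments follows. It assumes an in-memory dictionary stands in for the mirror, that each backup supplies a mapping of changed paths to new contents, and that None marks a file that is deleted or that did not previously exist. The class MirrorRepository and its methods are hypothetical names used only for illustration.

```python
class MirrorRepository:
    """Toy mirror + reverse incremental repository (illustrative only)."""

    def __init__(self, initial_files):
        # The initial full backup becomes the mirror.
        self.mirror = dict(initial_files)
        # Each entry is (time, {path: displaced content or None}).
        self.reverse_increments = []

    def backup(self, time, changes):
        """Apply changes to the mirror and keep what they displaced."""
        displaced = {}
        for path, content in changes.items():
            # None records that the file did not exist before this backup.
            displaced[path] = self.mirror.get(path)
            if content is None:
                self.mirror.pop(path, None)
            else:
                self.mirror[path] = content
        self.reverse_increments.append((time, displaced))

    def restore(self, target_time):
        """Walk backward from the mirror, undoing backups newer than target_time."""
        state = dict(self.mirror)
        for time, displaced in reversed(self.reverse_increments):
            if time <= target_time:
                break
            for path, old_content in displaced.items():
                if old_content is None:
                    state.pop(path, None)
                else:
                    state[path] = old_content
        return state


repo = MirrorRepository({"a.txt": "v1"})
repo.backup(1, {"a.txt": "v2", "b.txt": "v1"})
repo.backup(2, {"a.txt": "v3"})
latest = repo.restore(2)    # the mirror itself, no increments undone
earliest = repo.restore(0)  # both reverse increments undone
```

Restoring the most recent state is immediate because it is simply the mirror, while older states require undoing each reverse increment in turn.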
Continuous Data Protection
This data backup model takes the data backup process a step further: instead of scheduling periodic data backups, the data backup system immediately logs every change made on the computer system. This generally is done by saving byte-level or block-level differences rather than file-level differences. It differs from simple disk mirroring in that it enables a roll-back of the log and, thus, can restore an old image of data. Restoring the whole computer system to a certain point in time using this method requires that the original version of the data be processed to incorporate every logged change up to that point in time to recreate the desired version of the data.
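The logging and roll-back behavior can be sketched as follows. This is a simplified illustration that assumes a single in-memory disk image, integer timestamps, changes recorded in time order, and writes that fall within the bounds of the original image. The class ChangeLog and its methods are hypothetical names, not part of any real continuous data protection product.

```python
class ChangeLog:
    """Toy byte-level change log over a single image (illustrative only)."""

    def __init__(self, base_image):
        self.base_image = bytes(base_image)
        # Each entry is (time, offset, data), appended in time order.
        self.entries = []

    def record(self, time, offset, data):
        """Log one write as it happens instead of waiting for a scheduled backup."""
        self.entries.append((time, offset, bytes(data)))

    def image_at(self, target_time):
        """Recreate the image by replaying logged writes up to target_time."""
        image = bytearray(self.base_image)
        for time, offset, data in self.entries:
            if time > target_time:
                break
            image[offset:offset + len(data)] = data
        return bytes(image)


log = ChangeLog(b"AAAAAAAA")
log.record(1, 0, b"BB")
log.record(2, 4, b"CC")
log.record(3, 1, b"D")
current = log.image_at(3)   # b"BDAACCAA"
rolled_back = log.image_at(1)  # b"BBAAAAAA"
```

Because every logged write must be replayed from the original image, the cost of a restore grows with the length of the log, which matches the processing burden described above.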
Problems
In spite of all of these various methods of data backup, existing data backup systems (including both hardware and software) fail to ensure that the user can simply plug in to the computer system to back up the data stored therein, recover a revision of a file from a point in time, and restore all of the hard disk(s) in the computer system to a point in time. Existing data backup systems fail to efficiently track and store the state of multiple file systems over time while allowing for correct disk-level and file-level restoration to a point in time without storing a significant amount of redundant data. These data backup systems require the user to learn new technology, understand the file system of the computer system, learn how to schedule data backup sessions, and learn new controls that must be used for this new functionality. Furthermore, the restoration of lost files is difficult using these data backup systems.