As the number of computer applications and services continues to increase, the amount of electronic content, applications, and services used by individuals, enterprises, and the like also continues to rise significantly. Moreover, continuing advances in storage technology allow significant amounts of digital data to be stored cheaply and efficiently. However, this means that significant amounts of data can be lost in the event of a failure or catastrophe. Accordingly, backup of original data is a critical component of computer-based systems. The original data typically resides on a hard drive, or on an array of hard drives, but may also reside on other forms of storage media, such as solid state memory. Data backups are critical for several reasons, including disaster recovery, restoring data lost due to storage media failure, recovering accidentally deleted data, and repairing corrupted data resulting from malfunctioning or malicious software.
Typically, it can take a significant amount of time, from several hours to several days, for a data backup system to perform an initial “full backup”. This is because the amount of information to be copied can range from hundreds of gigabytes to several terabytes. Therefore, many data backup systems perform incremental backups between full backups of the target data. Typically, an incremental backup copies only the data that has changed since the last backup, whether that backup was full or incremental.
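The selection of changed data for an incremental backup can be illustrated with a minimal sketch. It assumes, purely for illustration, that the source is a mapping of file names to contents and that changes are detected by comparing SHA-256 digests against those recorded at the previous backup; the names `digest` and `incremental_backup` are hypothetical, and deletions on the source are not tracked here.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content fingerprint used to detect changes (an assumed detection method)."""
    return hashlib.sha256(data).hexdigest()


def incremental_backup(source, last_digests):
    """Return (changed_items, new_digests) relative to the previous backup.

    source       -- dict mapping name -> bytes (hypothetical in-memory source)
    last_digests -- dict mapping name -> digest recorded at the last backup
    """
    changed = {}
    new_digests = {}
    for name, data in source.items():
        d = digest(data)
        new_digests[name] = d
        # Copy only items that are new or whose content differs.
        if last_digests.get(name) != d:
            changed[name] = data
    return changed, new_digests


source = {"a.txt": b"alpha", "b.txt": b"beta"}

# With an empty baseline every item counts as changed, i.e. a full backup.
full, digests = incremental_backup(source, {})

# Modify one item and add another, then back up again.
source["b.txt"] = b"beta v2"
source["c.txt"] = b"gamma"
inc, digests = incremental_backup(source, digests)
```

In this sketch the second call copies only `b.txt` and `c.txt`, while the unchanged `a.txt` is skipped.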
FIG. 1 illustrates a conventional data archiving methodology in which the backup scheme alternates between a full backup and one or more incremental backups. As shown, the system can initially perform a full backup (e.g., the “1st day”) and subsequently perform a number of incremental backups (e.g., “2nd day” through “7th day”) before again performing another full backup (e.g., the “8th day”). The scheme shown in FIG. 1 provides highly reliable storage for two reasons. First, the scheme provides immutability of data in the archive, since the full backup is written once and its data is not changed until the next full backup, while changed data is added incrementally. Second, the scheme periodically performs full data backups (e.g., on the 8th day), which effectively minimizes the likelihood of errors. Moreover, even if an error occurs during creation of one of the incremental backups, the next full backup will check the validity of the data on the source, and a corrected version of the data will be recorded in the archive.
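The alternation described for FIG. 1 can be sketched as a simple schedule function. The seven-day cycle is taken from the example days above; the function name `backup_type` and its `cycle` parameter are hypothetical and serve only as an illustration.

```python
def backup_type(day: int, cycle: int = 7) -> str:
    """Return the kind of backup performed on a 1-based day number.

    A full backup opens each cycle (day 1, day 8, ...); every other day
    in the cycle gets an incremental backup.
    """
    if day < 1:
        raise ValueError("day numbers start at 1")
    return "full" if (day - 1) % cycle == 0 else "incremental"


# Days 1 through 8, matching the example days of FIG. 1.
schedule = [backup_type(day) for day in range(1, 9)]
```

Here `schedule` starts and ends with a full backup, with six incremental backups in between.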
However, the data backup scheme shown in FIG. 1 also has disadvantages. For example, this scheme is very time consuming, since unchanged data must be written to the archive again and again, i.e., every time a full backup is performed, regardless of whether the data has changed. Moreover, when such schemes are used in large organizations, for example, the data backup is scheduled for a so-called backup window, usually during low-load (off-peak) hours for equipment and networks. However, because the process can be very time consuming, a full backup usually does not fit within the allocated backup window. On the other hand, incremental backups complete relatively quickly (usually within a few minutes), and therefore utilization of the dedicated backup window is very low, perhaps a few percent. Thus, the rest of the window is wasted, even though the company is forced to suspend some processes, temporarily stop services, and the like during that window.
To address some of the issues encountered by the scheme shown in FIG. 1, many organizations may use a full incremental backup scheme as shown in FIGS. 2A through 2C. For example, using this scheme as shown in FIG. 2A, a full backup is performed only once and all subsequent backups are made incrementally. Thus, to restore the data from such an archive, the system must access the initial full backup as well as the whole chain of incremental backups.
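The dependence of a restore on the entire chain can be shown with a minimal sketch. It assumes, for illustration only, that the archive holds a full backup and a list of incremental backups, each a mapping of block number to block contents; the function name `restore` and the sample data are hypothetical.

```python
def restore(full, increments):
    """Rebuild the latest state by replaying increments over the full backup.

    full       -- dict mapping block number -> bytes from the initial full backup
    increments -- list of dicts, oldest first, each holding only changed blocks
    """
    state = dict(full)  # copy so the archived full backup is left untouched
    for inc in increments:
        state.update(inc)  # later increments override earlier versions
    return state


full = {0: b"A0", 1: b"B0", 2: b"C0"}
increments = [{1: b"B1"}, {2: b"C2"}, {1: b"B3"}]

latest = restore(full, increments)

# Dropping the last link of the chain yields an older state, not the latest one.
stale = restore(full, increments[:2])
```

In this sketch `latest` contains block 1 as `b"B3"`, whereas `stale` still holds `b"B1"`, which is why every increment in the chain has to be available and intact.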
In practice, the full incremental data backup scheme shown in FIG. 2A often gradually develops to a point where a portion of the archived data (e.g., data from the initial full backup) becomes unnecessary, since such data may have been removed from the source medium. As a result, certain sectors or spaces belonging to the initial full backup may be freed, so that in subsequent incremental backups part of the newly added data is written “inside” the initial full backup. For example, as shown in FIG. 2B, the third incremental data backup may occupy space in the full backup that has become vacant because it contained unused and/or removed data.
FIG. 2C illustrates an even more complicated variation of the incremental data backup scheme shown in FIG. 2B. For example, in certain circumstances, data from the initial or prior incremental backups may also become partly unnecessary, so the newly copied data can occupy the space of an unused incremental backup in the archive (e.g., the 4th incremental backup shown in FIG. 2C). As a result, this type of data archive scheme results in a data archive that becomes less and less reliable, since the data continues to be partially deleted and overwritten (i.e., immutability is broken) and regular full re-backups are not performed.