1. Field
At least one embodiment of the present invention pertains to saving storage space in a backup device by writing only blocks of data that have been updated or are new when backing up data from backup images.
2. Background
Electrically operated machines, such as general-purpose and special-purpose computing devices (e.g., “computers”), data storage devices or systems, network servers, file servers, and Internet servers, typically include computer processors and other devices (often operating under the control of a processor) that frequently need to store information or data in, or retrieve information or data from, a computer memory. For example, computing devices may include network servers, file servers, Internet servers, and/or client devices that store data in various types of memory (e.g., in mass storage devices, such as disk drives). These computing devices may include a data storage device having or providing access to a “storage subsystem” having a number of mass storage devices, such as disk drives, which may be in a disk array.
Data stored in computing devices may be harmed or become corrupt as a result of various factors or events, including computer viruses, security breaches, user or operator errors, hacker attacks, and the like. Any of these can destroy data and halt or “crash” the entire computing device. This may result in great loss of data, time, and/or money for the user.
To maximize protection and minimize such data harm and corruption, data backup and recovery technology for computing device storage and/or disk memories (e.g., for mass storage devices, physical memory, disk storage, and/or other data storage of computing devices) is often used. Using this technology, important data, some of the data, or all of the data stored in storage of a computing device (e.g., a “source device”) can be copied to a protected area and “recovered” (e.g., recalled and/or restored to the storage memory) when necessary. For example, backup data can be copied from a source device to a “backup device”, such as a data storage device, a network server, a file server, or an Internet server, to “back up” the data, so that the backed-up data can later be recovered by the source device. Such backup and recovery may involve other entities, such as other computing devices, which may manipulate, select, or otherwise process the data transmitted between the source device and the backup device.
Like the source device, the data storage device backing up the data may include or provide access to various types of storage space and/or disk space “memory” to which the backup data may be written and from which it may be recovered. The data storage device may include or provide access to a “storage subsystem” having a number of mass storage devices, such as disk drives, which may be in a disk array. The storage subsystem may be located locally or remotely from the data storage device. Reading data from the drives and writing data to the drives can be controlled by an operating system and may use a random access memory (RAM) type “main memory”.
For example, a “full” backup may be performed by sending an original backup image from a source device (e.g., a client, a server, or one or more other computing devices), the image including all of the data stored on the source device (e.g., all of the directories and files) that is to be backed up at a first point in time. Subsequently, such as after a selected period of time or at a second point in time after the first point in time, an “incremental” backup may be performed by sending an incremental backup image from the source device, the image including all of the same data as the full backup, or a portion thereof. According to some embodiments, however, an incremental backup image may include less than all of the data of the full backup image. In this case, a “full backup” and an “incremental backup” are two different types of backups. First, a full backup of all the directories and files to be backed up for a source device is performed; then one or more incremental (or differential) backups are performed, based on the directories and files of the full backup, to update the backup data with changes to all or a portion of those directories and files. However, this approach may not optimize disk space. For example, storing subsequent backup increments on a backup device may require almost as much additional disk space as was used to save the original full backup of a set of data. In some cases, such incremental backups may require an additional amount of disk space equal to more than 90% of the disk space required for the original full backup image. Similarly, a second full backup performed after the first full backup may require an additional amount of disk space equal to more than 90% of the disk space required for the original full backup image.
Thus, almost twice the disk space of the original backup may be required to save a full backup and an incremental backup from the source device (e.g., even in cases where the incremental backup stores only files updated or added since the first point in time).
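The disk-space arithmetic above can be illustrated with a minimal Python sketch. It is not taken from any described embodiment: the function names, the 4-byte block size, and the sample images are hypothetical and chosen only to keep the numbers small. The sketch compares storing every backup image whole against writing only the blocks that are new or differ from the block at the same offset in the previous image.

```python
BLOCK_SIZE = 4  # hypothetical block size in bytes, tiny so the example is easy to follow


def split_blocks(image: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split a backup image into fixed-size blocks (the last block may be shorter)."""
    return [image[i:i + block_size] for i in range(0, len(image), block_size)]


def whole_image_cost(images: list) -> int:
    """Bytes used when every backup image is stored in full."""
    return sum(len(image) for image in images)


def changed_block_cost(images: list, block_size: int = BLOCK_SIZE) -> int:
    """Bytes used when a block is written only if it is new or differs
    from the block at the same offset in the previous image."""
    stored = 0
    previous = []
    for image in images:
        blocks = split_blocks(image, block_size)
        for index, block in enumerate(blocks):
            if index >= len(previous) or previous[index] != block:
                stored += len(block)
        previous = blocks
    return stored


# Hypothetical images: the second changes one block and appends one new block.
full_image = b"AAAABBBBCCCCDDDD"
incremental_image = b"AAAABBBBXXXXDDDDEEEE"

whole = whole_image_cost([full_image, incremental_image])
changed_only = changed_block_cost([full_image, incremental_image])
print(whole, changed_only)
```

With these sample images, storing both images whole takes 36 bytes, whereas writing only the changed and new blocks takes 24 bytes (16 for the first image plus 8 for the second). Real backup images and block sizes would differ; the sketch only shows why whole-image storage can approach twice the space of the original backup.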
It may be desirable to minimize or reduce the amount of disk space required to store backup data. It may also be desirable to minimize or reduce the amount of time, processing time, and computing resources required to store backup data.