If a software error corrupts a data object, or if erroneous data updates the data object, a data protection administrator may restore the data object to a previous state that does not include the corrupted or erroneous data. To enable this restoration, a backup/restore application executes a backup operation either occasionally or continuously, storing a copy of each desired data object state (such as the data values and how those values are embedded in a database's data structures) within dedicated backup files. When the data protection administrator decides to return the data object to a previous state, the administrator specifies the desired previous state by identifying a point in time when the data object was in that state, and instructs the backup/restore application to execute a restore operation that restores a copy of the corresponding backup files for that state to the data object. When the backup/restore application creates an incremental backup file for a data object, it backs up only the data that is new or changed in the data object since it created the most recent previous backup file. The backup/restore application identifies the most recently created backup file so that the incremental backup file can be combined with that file, possibly along with other backup files created for the data object, into a full copy of the backed up data object.
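The incremental backup and point-in-time restore flow described above can be sketched in Python. This is a minimal, illustrative sketch rather than the application's actual implementation: it assumes a data object is modeled as a dictionary of numbered blocks, and the names BackupFile, create_incremental, and restore, as well as the parent_id link from each incremental backup file to the previous backup file, are hypothetical. It does not track deleted blocks.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Blocks = Dict[int, bytes]  # hypothetical model: block index -> block contents


@dataclass
class BackupFile:
    """Hypothetical backup file: when it was taken and which blocks it holds."""
    backup_id: str
    timestamp: int
    blocks: Blocks
    parent_id: Optional[str] = None  # None marks a full backup


def create_incremental(backup_id: str, timestamp: int, current: Blocks,
                       previous: Blocks, parent_id: str) -> BackupFile:
    """Keep only blocks that are new or changed since the previous backup."""
    changed = {i: data for i, data in current.items() if previous.get(i) != data}
    return BackupFile(backup_id, timestamp, changed, parent_id)


def restore(backups: List[BackupFile], point_in_time: int) -> Blocks:
    """Rebuild a full copy of the data object as of point_in_time."""
    by_id = {b.backup_id: b for b in backups}
    # Select the most recent backup taken at or before the requested time.
    candidates = [b for b in backups if b.timestamp <= point_in_time]
    if not candidates:
        raise ValueError("no backup exists at or before the requested time")
    node: Optional[BackupFile] = max(candidates, key=lambda b: b.timestamp)
    # Follow the hypothetical parent links back to the full backup.
    chain: List[BackupFile] = []
    while node is not None:
        chain.append(node)
        node = by_id[node.parent_id] if node.parent_id is not None else None
    # Apply the full backup first, then each incremental in creation order,
    # so later changes overwrite earlier versions of the same block.
    state: Blocks = {}
    for backup in reversed(chain):
        state.update(backup.blocks)
    return state
```

In this sketch, each incremental backup file stores only the changed blocks, and restoring to a chosen point in time replays the chain from the full backup forward, which mirrors the combination of backup files described in the text.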
Deduplication can be a specialized data compression process for eliminating redundant copies of repeating data. In the deduplication process, unique chunks of data are identified and stored during analysis. As the analysis continues, other data chunks are compared to the already stored data chunks, and whenever a match occurs, the redundant data chunk is replaced with a small reference that points to the matching data chunk that is already stored. Because the deduplication process may identify the same unique data chunk dozens, hundreds, or even thousands of times, the amount of data that needs to be stored can be greatly reduced. In some systems, data chunks are defined by physical layer constraints, while in other systems only complete files are compared, which is called single-instance storage. A data chunking routine can be an algorithm that passes a sliding window along data to identify more naturally occurring internal data boundaries. A sliding window can be a fixed-length queue in which the oldest, or first, data entering the queue is processed first; such a queue may be referred to as a first-in first-out (FIFO) queue.
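A minimal Python sketch can illustrate both ideas in the preceding paragraph: a chunking routine that slides a fixed-length FIFO window (a collections.deque) along the data and uses a rolling polynomial hash of the window to choose content-defined boundaries, and a store that keeps one copy of each unique chunk and hands back small references for repeats. The window length, hash base and modulus, boundary mask, minimum and maximum chunk sizes, the use of a SHA-256 digest as the reference, and the names chunk and DedupStore are all assumptions chosen for illustration; real systems often use other rolling hashes, such as Rabin fingerprints, and other parameters.

```python
import hashlib
from collections import deque
from typing import Deque, Dict, List

# All parameters below are illustrative assumptions, not values from the text.
WINDOW = 16                  # sliding-window length in bytes
BASE = 257                   # polynomial rolling-hash base
MOD = (1 << 61) - 1          # Mersenne prime modulus for the rolling hash
MASK = (1 << 6) - 1          # boundary when the low 6 bits of the hash are zero
MIN_CHUNK, MAX_CHUNK = 16, 256


def chunk(data: bytes) -> List[bytes]:
    """Split data at content-defined boundaries found by a sliding window."""
    chunks: List[bytes] = []
    window: Deque[int] = deque()              # fixed-length FIFO of recent bytes
    out_weight = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    h, start = 0, 0
    for i, byte in enumerate(data):
        if len(window) == WINDOW:
            oldest = window.popleft()         # first in, first out
            h = (h - oldest * out_weight) % MOD
        window.append(byte)
        h = (h * BASE + byte) % MOD
        size = i + 1 - start
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])           # trailing partial chunk
    return chunks


class DedupStore:
    """Keeps one copy per unique chunk; callers hold references (digests)."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> List[str]:
        refs: List[str] = []
        for piece in chunk(data):
            ref = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(ref, piece)  # store only the first copy
            refs.append(ref)
        return refs

    def get(self, refs: List[str]) -> bytes:
        return b"".join(self.chunks[r] for r in refs)
```

Storing the same data twice adds no new chunks to the store, only a second list of references. Because each boundary decision depends on the bytes currently inside the window, an edit near the start of a file tends to disturb only nearby boundaries, so later chunks can still match chunks that are already stored.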
A data object can be a collection or a group of information that is backed up as a unit, such as the information for a computer or a network of computers. A data object may be stored on a storage array, which is a disk storage system that includes multiple disk drives. Unlike a disk enclosure, a storage array has cache memory and advanced functionality, such as virtualization and Redundant Array of Independent Disks (RAID). A data protection administrator may manage a backup/restore application to create backup files of data objects and store those backup files on multiple storage arrays.