If a software error corrupts a computer's data set, or if erroneous data updates the data set, a data protection administrator may restore the data set to a previous state that does not include the corrupted or erroneous data. A backup application executes a backup operation either occasionally or continuously to enable this restoration, storing a copy of each desired data set state (such as the values of its data and these values' embedding in data structures) within dedicated backup files. When the data protection administrator decides to return the data set to a previous state, the data protection administrator specifies the desired previous state by identifying a desired point in time when the data set was in this state, and instructs the backup application to execute a restore operation to restore a copy of the corresponding backup file(s) for that state to the data set.
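As an illustration of how a desired point in time can be mapped to a stored state, the following minimal sketch picks the most recent backup created at or before the requested time. The function name `backup_for_point_in_time` and the representation of backups as a sorted list of creation times are assumptions made for this example, not part of any particular backup application.

```python
from bisect import bisect_right
from datetime import datetime

def backup_for_point_in_time(backup_times, requested):
    """Return the latest backup time at or before `requested`, or None.

    `backup_times` is assumed to be sorted in ascending order.
    """
    i = bisect_right(backup_times, requested)  # count of backups created at or before `requested`
    return backup_times[i - 1] if i else None

# Hypothetical backup history: one backup per day at midnight.
backup_times = [datetime(2024, 1, 1), datetime(2024, 1, 2), datetime(2024, 1, 3)]
chosen = backup_for_point_in_time(backup_times, datetime(2024, 1, 2, 12, 0))
```

Here `chosen` is the January 2 backup, because it is the newest state that existed at the requested time. A request earlier than the first backup yields `None`.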
A backup file can be a copy of part or all of a data set, and can be used to restore part or all of the data set to the condition of the data set at the point in time that the copy was created. A full backup file represents an entire data set at the point in time that the full backup file was created. As a data set increases in size, a full backup file requires more time to be created and requires more storage space. Therefore, a database administrator can supplement a full backup file of a large data set with a series of incremental backup files, or differential backup files, each of which can be a copy of the modifications to a data set since the most recent copy of the entire data set or since the most recent copy of modifications to the data set. If a backup/restore application identifies that the most recently created backup file for a data set is not a full backup file, the backup/restore application can combine the most recently created backup file for the data set with other backup files created for the data set into a synthetic full copy of the backed-up data set. For example, after a backup/restore application creates a full backup file of a hard disk's entire data set, a user modifies data blocks in only one of the hard disk's sectors, and the backup/restore application subsequently creates an incremental backup file of the hard disk's modified sector. Then the backup/restore application combines the current incremental backup file of the hard disk's modified sector with the previous full backup file of the hard disk to create a synthetic full backup file that includes the hard disk's recently modified sector as a replacement for the previous version of the modified sector and also includes the rest of the hard disk's previous data set.
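The hard disk example above can be sketched in a few lines. This is a simplified model under stated assumptions: each backup file is represented as a dictionary that maps a sector number to that sector's contents, and the function name `synthesize_full` is hypothetical. Real backup files carry far more metadata.

```python
from typing import Dict, List

# Assumed model: a backup maps a sector number to that sector's bytes.
Backup = Dict[int, bytes]

def synthesize_full(full: Backup, incrementals: List[Backup]) -> Backup:
    """Overlay incremental backups (oldest first) on a full backup."""
    synthetic = dict(full)      # copy, so the original full backup is left unchanged
    for incremental in incrementals:
        synthetic.update(incremental)  # modified sectors replace their previous versions
    return synthetic

full_backup = {0: b"boot", 1: b"old data", 2: b"logs"}
incremental_backup = {1: b"new data"}  # only sector 1 was modified
synthetic_full = synthesize_full(full_backup, [incremental_backup])
```

The resulting `synthetic_full` holds the new contents of sector 1 together with the unmodified sectors 0 and 2 from the previous full backup.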
Deduplication can be a specialized data compression process used by a backup/restore application for eliminating most identical copies of repeating data. In a deduplication process, unique blocks of data are identified and stored during analysis. As the analysis continues, other data blocks are compared to the already stored data blocks, and whenever a match occurs, the redundant data block is replaced in the backup file with a small reference that points to the matching data block that is already stored. Given that the deduplication process may identify the same unique data block dozens, hundreds, or even thousands of times, the amount of data that needs to be stored can be greatly reduced.
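A minimal sketch of this idea follows. It assumes that blocks are matched by comparing SHA-256 digests rather than by comparing block contents directly, which is one common way to implement the comparison. The names `deduplicate` and `rehydrate` are hypothetical.

```python
import hashlib

def deduplicate(blocks):
    """Store each unique block once; represent the backup as a list of references."""
    store = {}       # digest -> unique block contents
    references = []  # one small reference (digest) per input block, in order
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()  # assumption: the digest identifies the block
        store.setdefault(digest, block)             # keep only the first copy of each block
        references.append(digest)
    return store, references

def rehydrate(store, references):
    """Rebuild the original sequence of blocks from the stored references."""
    return [store[digest] for digest in references]

blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]
store, references = deduplicate(blocks)
```

In this example four input blocks reduce to two stored blocks, while the four references preserve the original order so that `rehydrate(store, references)` reproduces the input.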
A data set can be a collection or a group of information that is backed up as a unit, such as the information for a computer or a network of computers. A data set may be stored on a storage array, which is a disk storage system that includes multiple disk drives. Unlike a disk enclosure, a storage array has cache memory and advanced functionality, such as virtualization and Redundant Array of Independent Disks (RAID). A data protection administrator may manage a backup/restore application to create backup files of data sets and store the backup files of data sets on one or more storage arrays.
A virtual machine can be a software implementation of a computer, and executes programs like a physical computer. A system virtual machine provides a complete system platform which supports the execution of a complete operating system, and usually emulates an existing architecture. Running multiple instances of virtual machines, known as hardware virtualization, leads to more efficient use of computing resources, in terms of both energy consumption and cost effectiveness, and is the key to a cloud computing environment. A virtual machine typically includes a virtual disk, which may be stored in file formats for virtual disks such as VHD, VHDX, and VMDK. A virtual disk can be a software component that emulates a physical storage device. A disk can be a data storage device. A volume can be a single accessible storage area with a file system, typically resident on a single partition of a disk. A cluster is the smallest logical unit of disk space, such as one or more disk sectors, that can be allocated for storing files and/or directories. To reduce the overhead of managing on-disk data structures, a file system by default allocates an extent of contiguous clusters, each of which is a group of one or more disk sectors, instead of allocating individual disk sectors. An extent can be any number of consecutive logical units of disk space. A file extent can be a number of consecutive logical units of disk space used to store a collection of information under a single identifying name. As with other data sets, backup/restore applications make copies of a virtual machine's data set and store these copies as backup files that enable the backup/restore application to restore the virtual machine's data set in the event of corruption or an erroneous update to the virtual machine's data set.
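To make the relationship between clusters and extents concrete, the following sketch groups a file's allocated cluster numbers into extents, each recorded as a starting cluster and a length. The function name `clusters_to_extents` and the `(start, length)` representation are assumptions for illustration. Actual file systems record extents in their own on-disk formats.

```python
def clusters_to_extents(clusters):
    """Group cluster numbers into (start, length) runs of consecutive clusters."""
    extents = []
    for cluster in sorted(set(clusters)):
        if extents and extents[-1][0] + extents[-1][1] == cluster:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)  # cluster extends the current extent
        else:
            extents.append((cluster, 1))       # gap found, so start a new extent
    return extents

# Hypothetical allocation: six clusters that fall into three contiguous runs.
extents = clusters_to_extents([10, 11, 12, 20, 21, 40])
```

The six clusters are described by three extents, (10, 3), (20, 2), and (40, 1), which shows how tracking extents requires fewer records than tracking each cluster individually.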