Currently, backup of physical servers and virtual machines is typically volume-based (that is, disk-based). Volume-based backup includes full backup and incremental backup.
When a volume on a to-be-backed-up virtual machine is backed up to a server for the first time, full backup is performed: all data in the target volume is backed up to the server. In the second and each subsequent backup, incremental backup is performed: only the data that has changed in the target volume is backed up to the server.
For convenience of data restoration, to-be-backed-up data is usually stored as fixed-length raw volume block data, that is, the data is divided into multiple blocks of equal length. Both the virtual machine and its volumes are created on a virtualization platform. The server interacts with the virtualization platform to obtain an original differential bitmap of the target volume on the to-be-backed-up virtual machine, and then obtains volume data from the virtualization platform according to that bitmap. According to the volume data, the server generates a corresponding volume mapping file and saves the volume data, as multiple fixed-length volume files, to the locations indicated by the volume mapping file. The virtualization platform calculates the original differential bitmap using a technology such as changed block tracking (CBT).
One implementation of backing up data on the to-be-backed-up virtual machine to the server based on fixed-length volume data files is as follows.
Assume that a volume on the to-be-backed-up virtual machine has a fixed length of 16 megabytes (MB), that the volume data consists of four fixed-length data blocks of 4 MB each, and that each fixed-length data block consists of four pieces of data at a granularity of 1 MB. The volume data of a first target volume (storage addresses 0 to 15) on the to-be-backed-up virtual machine is shown in FIG. 1.
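To make the address arithmetic of this example concrete, the geometry can be modelled as below. This is a minimal sketch that only restates the sizes given in the text (16 MB volume, 4 MB blocks, 1 MB addresses); the constant and function names are hypothetical.

```python
# Example geometry from the text: a 16 MB volume, 4 MB fixed-length
# data blocks, and a 1 MB addressing granularity. Names are hypothetical.
VOLUME_SIZE_MB = 16
BLOCK_SIZE_MB = 4
GRANULARITY_MB = 1

ADDRESSES_PER_BLOCK = BLOCK_SIZE_MB // GRANULARITY_MB  # 4 addresses per block
NUM_ADDRESSES = VOLUME_SIZE_MB // GRANULARITY_MB       # 16, i.e. addresses 0..15
NUM_BLOCKS = VOLUME_SIZE_MB // BLOCK_SIZE_MB           # 4 blocks


def block_of(address: int) -> int:
    """Return the index of the fixed-length data block containing a 1 MB address."""
    if not 0 <= address < NUM_ADDRESSES:
        raise ValueError(f"address {address} outside 0..{NUM_ADDRESSES - 1}")
    return address // ADDRESSES_PER_BLOCK


def block_range(block: int) -> range:
    """Return the 1 MB addresses covered by one fixed-length data block."""
    start = block * ADDRESSES_PER_BLOCK
    return range(start, start + ADDRESSES_PER_BLOCK)


if __name__ == "__main__":
    for b in range(NUM_BLOCKS):
        r = block_range(b)
        print(f"block {b}: addresses {r.start}..{r.stop - 1}")
```

Block 0 covers addresses 0 to 3 and block 3 covers addresses 12 to 15, which is why the last valid address is 15.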
The first time the server backs up the first target volume on the to-be-backed-up virtual machine, a full backup is performed.
The server interacts with the virtualization platform and obtains from it the original differential bitmap of the first target volume on the to-be-backed-up virtual machine, shown in FIG. 2. In the original differential bitmap, “1” marks the data in the storage area at an address of the first target volume as valid data, and “0” marks it as invalid data. Data is valid if the storage area at the address stores data or if the data stored there has changed. Data is invalid if no data is stored in the storage area at the address or if the stored data has not changed.
After obtaining the original differential bitmap of the first target volume, the server re-calculates it to generate a 4-bit bitmap in which each bit represents 4 MB of storage space. The re-generated bitmap is shown in FIG. 3. In the re-generated bitmap, “1” indicates that at least one address in the corresponding address segment has the value 1 in the original differential bitmap, so the data stored in that address segment needs to be obtained. “0” indicates that every address in the corresponding address segment has the value 0 in the original differential bitmap, so the data stored in that address segment does not need to be obtained.
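The re-calculation described above amounts to a logical OR over each group of four 1 MB bits. The sketch below illustrates it under the example geometry; the function name and the sample bitmap are hypothetical and do not reproduce the contents of FIG. 2.

```python
from typing import List

ADDRESSES_PER_BLOCK = 4  # 4 MB block / 1 MB granularity, as in the example


def coarsen_bitmap(original: List[int], per_block: int = ADDRESSES_PER_BLOCK) -> List[int]:
    """Collapse a 1 MB-granularity differential bitmap into one bit per block.

    A block bit is 1 if at least one address in the block is 1 (the block's
    data must be fetched) and 0 if every address in the block is 0 (skip it).
    """
    if len(original) % per_block != 0:
        raise ValueError("bitmap length must be a multiple of the block size")
    return [
        1 if any(original[i:i + per_block]) else 0
        for i in range(0, len(original), per_block)
    ]


if __name__ == "__main__":
    # Hypothetical 16-address bitmap, not the contents of FIG. 2.
    original = [1, 0, 1, 0,  0, 0, 0, 0,  0, 1, 0, 0,  1, 1, 1, 1]
    print(coarsen_bitmap(original))  # [1, 0, 1, 1]
```

In the full-backup example every block is backed up (16 MB in total), so the re-generated bitmap there consists of four 1s.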
According to the re-calculated bitmap, the server sequentially obtains from the virtualization platform, in ascending order of addresses, the volume backup data at the locations corresponding to the address segments whose value is 1 in FIG. 3. It divides the data of the first target volume into four fixed-length volume files, saves them, and generates a volume mapping file for the first target volume. In this case, 16 MB of data is backed up. In one backup, each fixed-length data block corresponds to one fixed-length volume file; each fixed-length volume file has a fixed length of 4 MB (the size of one fixed-length data block) and holds the data that needs to be stored in the current backup. Each target volume corresponds to one volume mapping file, and each fixed-length volume file corresponds to one element of the volume mapping file, which in turn corresponds to one address segment. A fixed-length volume file is associated with the volume mapping file by the file's name. The saved fixed-length volume files are shown in FIG. 4, where F_1_Snap_1 identifies the first fixed-length volume file in the first backup. The volume mapping file is shown in FIG. 5, where the entry F_1_Snap_1 indicates that the first fixed-length volume file in the first backup covers storage addresses 0 to 3.
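The full-backup step can be sketched as follows, scaled down so that 1 byte stands in for 1 MB. This is an illustration only: the in-memory dictionaries replace real files, and the naming rule (the first number in F_n_Snap_k is the 1-based block position, k is the backup number) is an assumption that is consistent with F_1_Snap_1 and F_2_Snap_1 in the text.

```python
from typing import Dict, List, Tuple

ADDRESSES_PER_BLOCK = 4  # 1 MB addresses per 4 MB block (example geometry)

# Hypothetical in-memory stand-ins: a "volume file" is bytes keyed by name,
# and a volume mapping file maps a block index to the file holding that block.
VolumeFiles = Dict[str, bytes]
MappingFile = Dict[int, str]


def file_name(block: int, snap: int) -> str:
    """Build a name such as F_1_Snap_1 (assumes n = 1-based block position)."""
    return f"F_{block + 1}_Snap_{snap}"


def backup(volume: bytes, block_bitmap: List[int], snap: int,
           block_bytes: int) -> Tuple[VolumeFiles, MappingFile]:
    """Save every block whose bit is 1 as one fixed-length volume file."""
    files: VolumeFiles = {}
    mapping: MappingFile = {}
    for block, bit in enumerate(block_bitmap):  # ascending address order
        if bit:
            name = file_name(block, snap)
            start = block * block_bytes
            files[name] = volume[start:start + block_bytes]  # one file per block
            mapping[block] = name  # mapping element linked by file name
    return files, mapping


def address_range(block: int) -> Tuple[int, int]:
    """Inclusive 1 MB address range of a block, e.g. block 0 -> (0, 3)."""
    start = block * ADDRESSES_PER_BLOCK
    return start, start + ADDRESSES_PER_BLOCK - 1


if __name__ == "__main__":
    volume = bytes(range(16))  # scaled-down 16-unit volume
    files, mapping = backup(volume, [1, 1, 1, 1], snap=1, block_bytes=4)
    for block, name in sorted(mapping.items()):
        print(name, address_range(block))
    print(sum(len(d) for d in files.values()), "units backed up")  # 16 units backed up
```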
The second and subsequent times the server backs up the first target volume on the to-be-backed-up virtual machine, incremental backup may be performed.
Assume that the volume data of the first target volume has since changed and that the server performs incremental backup on the first target volume. The volume data of the first target volume is now as shown in FIG. 6, where shaded addresses indicate that the data stored at those addresses has changed.
The server interacts with the virtualization platform and obtains from it the original differential bitmap of the first target volume relative to the latest backup, shown in FIG. 7. In this bitmap, “1” marks the data in the storage area at an address of the first target volume as valid data, and “0” marks it as invalid data. Here, data is valid if the data stored in the storage area at the address has changed, and invalid if it has not changed.
After obtaining the original differential bitmap of the first target volume, the server re-calculates it to generate a 4-bit bitmap in which each bit represents 4 MB of storage space. The re-generated bitmap is shown in FIG. 8.
According to the re-calculated bitmap, the server sequentially obtains from the virtualization platform, in ascending order of addresses, the volume backup data at the locations corresponding to the address segments whose value is 1 in FIG. 8, divides this data into two fixed-length volume files, and saves them. In this case, 8 MB of data is backed up. The saved fixed-length volume files are shown in FIG. 9, where F_1_Snap_2 identifies the first fixed-length volume file in the second backup. The volume mapping file is shown in FIG. 10; it is formed by combining the volume mapping file generated in the first backup with the volume mapping file generated in the second backup. In it, F_1_Snap_2 indicates that the first fixed-length volume file in the second backup covers storage addresses 0 to 3, and F_2_Snap_1 indicates that the second fixed-length volume file in the first backup covers storage addresses 4 to 7.
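Combining the two mapping files, as described above, can be read as "newer entries win": blocks saved in the second backup point to the new files, and every other block keeps pointing to a file from the first backup. The sketch below assumes that reading; the function name is hypothetical, and the second changed block (block 2) is a made-up example because the text does not say which other segment changed.

```python
from typing import Dict

MappingFile = Dict[int, str]  # block index -> fixed-length volume file name


def merge_mappings(previous: MappingFile, current: MappingFile) -> MappingFile:
    """Combine the latest backup's mapping with the mapping from this backup.

    Blocks backed up this time point at the new files; all other blocks keep
    pointing at files saved by earlier backups.
    """
    merged = dict(previous)
    merged.update(current)  # entries from the newer backup override older ones
    return merged


if __name__ == "__main__":
    first = {0: "F_1_Snap_1", 1: "F_2_Snap_1", 2: "F_3_Snap_1", 3: "F_4_Snap_1"}
    # Hypothetical: the second backup saved block 0 and, for illustration, block 2.
    second = {0: "F_1_Snap_2", 2: "F_3_Snap_2"}
    print(merge_mappings(first, second))
    # {0: 'F_1_Snap_2', 1: 'F_2_Snap_1', 2: 'F_3_Snap_2', 3: 'F_4_Snap_1'}
```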
When data in the first target volume on the virtualization platform is lost and the volume data of the first target volume from the second backup on the server needs to be restored to the virtualization platform, the server uses backup software to find the volume mapping file of the first target volume for the required backup time point. It obtains the name of each fixed-length volume file from the volume mapping file, opens each corresponding fixed-length volume file, reads all of its data, and transmits the data to the virtualization platform, which saves the received data to the location specified in the backup software.
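The restoration walk above (look up each file name in the mapping file, read the whole file, write it back at its block position) can be sketched as follows. This is a scaled-down, in-memory illustration with hypothetical names; it is not the backup software's actual interface.

```python
from typing import Dict

MappingFile = Dict[int, str]   # block index -> fixed-length volume file name
VolumeFiles = Dict[str, bytes]  # file name -> file contents (in-memory stand-in)


def restore(mapping: MappingFile, files: VolumeFiles, block_bytes: int,
            num_blocks: int) -> bytes:
    """Rebuild a volume image from a volume mapping file and its volume files."""
    image = bytearray(block_bytes * num_blocks)
    for block, name in mapping.items():
        data = files[name]  # "open" the named volume file and read all of it
        if len(data) != block_bytes:
            raise ValueError(f"{name} is not a full fixed-length block")
        start = block * block_bytes
        image[start:start + block_bytes] = data  # write back at the block offset
    return bytes(image)


if __name__ == "__main__":
    # Scaled-down example: 1 byte stands in for 1 MB, so a block is 4 bytes.
    files = {
        "F_1_Snap_1": b"aaaa", "F_2_Snap_1": b"bbbb",
        "F_3_Snap_1": b"cccc", "F_4_Snap_1": b"dddd",
        "F_1_Snap_2": b"AAAA",  # simplified: only block 0 re-saved later
    }
    mapping = {0: "F_1_Snap_2", 1: "F_2_Snap_1", 2: "F_3_Snap_1", 3: "F_4_Snap_1"}
    print(restore(mapping, files, block_bytes=4, num_blocks=4))  # b'AAAAbbbbccccdddd'
```

Because each file is written back whole, any unchanged or zero data inside a block is transmitted as well.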
However, in this backup and restoration method based on fixed-length volume data files, each bit of the bitmap that the server re-calculates from the virtualization platform's original differential bitmap has a granularity of 4 MB. As a result, when only part of a fixed-length data block in the first target volume is valid data, the entire fixed-length data block must be backed up. For example, comparing the volume data in FIG. 6 with that in FIG. 1, only the data at addresses 0 and 2 in the fixed-length data block covering addresses 0 to 3 has changed, yet the entire block covering addresses 0 to 3 must be backed up. Because unchanged (invalid) data in the block is also backed up to the server, a large amount of data is transmitted between the server and the virtualization platform, which increases backup time.
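The cost of the 4 MB granularity can be quantified by comparing the data that actually changed with the data transferred when whole blocks are copied. The sketch below is an illustration under the example geometry; the function name is hypothetical, and the input covers only the changes in the 0 to 3 block that the text mentions.

```python
from typing import List, Tuple

ADDRESSES_PER_BLOCK = 4  # 4 MB block / 1 MB granularity
GRANULARITY_MB = 1


def transfer_sizes(original: List[int],
                   per_block: int = ADDRESSES_PER_BLOCK) -> Tuple[int, int]:
    """Return (MB actually changed, MB transferred when whole blocks are copied)."""
    changed_mb = sum(original) * GRANULARITY_MB
    # A block is transferred in full if any of its addresses is marked 1.
    dirty_blocks = [any(original[i:i + per_block])
                    for i in range(0, len(original), per_block)]
    transferred_mb = sum(dirty_blocks) * per_block * GRANULARITY_MB
    return changed_mb, transferred_mb


if __name__ == "__main__":
    # Only addresses 0 and 2 changed in the 0-3 block; other blocks ignored here.
    print(transfer_sizes([1, 0, 1, 0] + [0] * 12))  # (2, 4)
```

For that block alone, 2 MB of changed data causes 4 MB to be transferred.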
In addition, when the first target volume on the virtualization platform is a thinly configured volume, some of its space may not yet be allocated on the virtualization platform. When the data in the first target volume is backed up to the server, the volume data is aligned to a fixed length, so invalid data from this unallocated space is written to the server. When the data in the first target volume is restored from the server to the virtualization platform, the zero data in the fixed-length volume files (each unit of zero data occupies 1 MB of storage space) is also written to the virtualization platform. Consequently, excessive space on the virtualization platform is occupied. A thinly configured (thin-provisioned) volume is one whose usable storage capacity is set in advance, while physical storage space is allocated only according to actual usage.
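The thin-volume problem can be illustrated by counting the 1 MB units that are unallocated on the volume yet still end up inside a backed-up fixed-length volume file, and are therefore written back as zeros on restore. This is a sketch under assumptions: the allocation map and the block bitmap in the usage example are hypothetical, and the function name is made up for illustration.

```python
from typing import List

ADDRESSES_PER_BLOCK = 4  # 1 MB addresses per 4 MB block


def zero_units_written(allocated: List[int], block_bitmap: List[int],
                       per_block: int = ADDRESSES_PER_BLOCK) -> int:
    """Count 1 MB units that are unallocated on a thin volume but still land in
    a fixed-length volume file (and so are written back as zeros on restore)."""
    count = 0
    for block, bit in enumerate(block_bitmap):
        if bit:  # the whole block is saved as one fixed-length volume file
            start = block * per_block
            count += sum(1 for a in allocated[start:start + per_block] if not a)
    return count


if __name__ == "__main__":
    # Hypothetical allocation map (1 = allocated) for the 16 addresses.
    allocated = [1, 1, 0, 0,  1, 0, 0, 0,  0, 0, 0, 0,  1, 1, 1, 1]
    # Blocks with any allocated address are backed up; block 2 is skipped.
    print(zero_units_written(allocated, [1, 1, 0, 1]))  # 5
```

In this made-up layout, 5 MB of zero data would be written back to the virtualization platform even though that space was never allocated.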