In standard computer systems, the file system (e.g., NTFS on Windows or ext3 on Linux) is responsible for keeping track of the tree or hierarchy of files. It also stores files in fixed-size blocks on the disk and keeps track of where these blocks are located. Backup applications that read data through the file system are inherently slow. Block-based backups (BBB) bypass files and file systems by reading directly from the disk or volume. They incur no performance penalty even for large numbers of files, because the backup application reads blocks in their order on the disk, not in the order in which they appear in files. Block-based backups also support point-in-time snapshots, in which a backup is started by first taking a snapshot of the live running volume; block-level data is then read from the snapshot, not from the actual disk. In general, block-based backups are many times faster for backup and restore operations than traditional file-system-based backup systems. The performance increase is due, at least in part, to the fact that incremental backups are created using Changed Block Tracking (CBT), the backups are image based, there is no walking of the file system, and no indexing is required. During recovery, the file system is virtually mounted, making the recovery very fast and efficient.
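By way of illustration only, the block-ordered read described above may be sketched as follows. The sketch assumes a 4096-byte block size, an in-memory byte stream standing in for a snapshot of the volume, and a hypothetical function name `backup_used_blocks`; none of these details come from any particular backup product.

```python
import io

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def backup_used_blocks(snapshot, used_blocks, block_size=BLOCK_SIZE):
    """Copy the used blocks of a volume snapshot in ascending disk order.

    `snapshot` is any seekable binary stream standing in for a point-in-time
    snapshot of the volume; `used_blocks` is an iterable of block numbers.
    Returns a dict mapping block number to the bytes of that block.
    """
    image = {}
    # Visit blocks in on-disk order, regardless of which files they belong to.
    for block_no in sorted(set(used_blocks)):
        snapshot.seek(block_no * block_size)
        image[block_no] = snapshot.read(block_size)
    return image


# Usage: a three-block "volume" in which only blocks 0 and 2 are in use.
snapshot = io.BytesIO(b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"C" * BLOCK_SIZE)
image = backup_used_blocks(snapshot, [2, 0])
```

The read never consults file names or directory structure, which is why the number of files on the volume does not affect the cost of the pass.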
The advent of virtualization technology has led to the increased use of virtual machines as data storage targets. Virtual machine (VM) disaster recovery systems using hypervisor platforms, such as vSphere from VMware or Hyper-V from Microsoft, among others, have been developed to provide recovery from multiple disaster scenarios, including total site loss. One popular backup system, the EMC Networker Block Based Solution, creates a backup image in VHDx containers. VHDx is a Hyper-V virtual hard disk (VHD) format found in Windows servers, and has a present storage capacity of 64 TB, compared to the standard VHD storage limit of 2 TB. A container is an image file that stores backups.
With respect to block-based backups and virtualization, full backups contain the used blocks of the volume in the VHDx container. Incremental backups contain the changed blocks embedded in the VHDx container. To obtain the changed blocks for incremental backups, systems use a CBT driver that monitors all the disks to see whether any block is updated. If a block is updated, the driver notes its block number and block offset. When the user triggers an incremental backup, the Networker backup system consults the driver to obtain the changed blocks and backs up only these changed blocks. The system does not back up the file indexes to the device; instead, when a file recovery is triggered, it mounts the backup and allows the user to select files. Without mounting the backup, the user cannot tell which files were backed up and which were not. This can be a significant issue for system performance, as data searches require remounting the file system.
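The changed-block bookkeeping described above may be sketched, purely as an illustration, as follows. The class `ChangedBlockTracker`, its methods, and the 4096-byte block size are hypothetical stand-ins for a CBT driver and do not reflect the interface of any actual driver; the volume is modeled as an in-memory byte array.

```python
class ChangedBlockTracker:
    """Hypothetical stand-in for a CBT driver: remembers which blocks were written."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.changed = set()

    def record_write(self, offset, length):
        """Mark every block touched by a write of `length` bytes at byte `offset`."""
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        self.changed.update(range(first, last + 1))

    def drain(self):
        """Return the changed block numbers in disk order and reset the tracker."""
        blocks = sorted(self.changed)
        self.changed.clear()
        return blocks


def incremental_backup(volume, tracker):
    """Copy only the blocks the tracker reports as changed since the last backup."""
    saved = {}
    for block_no in tracker.drain():
        start = block_no * tracker.block_size
        saved[block_no] = bytes(volume[start:start + tracker.block_size])
    return saved


# Usage: a 10-byte write that straddles the boundary between blocks 0 and 1.
volume = bytearray(4 * 4096)
tracker = ChangedBlockTracker()
volume[4090:4100] = b"x" * 10
tracker.record_write(4090, 10)
increment = incremental_backup(volume, tracker)
```

Note that the increment holds block numbers and raw bytes only; nothing in it identifies the files to which the changed blocks belong, which is the limitation discussed above.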
Suppose, for example, that a user wants to search for a file or folder across all the storage nodes that hold backups of all the clients pertaining to a particular department. Current block-based backup systems do not allow this, as they do not back up file indexes. The only way to perform this operation is to mount all the savesets and search each one. This is essentially a brute-force method that requires a great deal of time and overhead expense. One prior solution tries to identify all the changed files from the changed blocks for incremental backups, but this requires the source machine that is to be backed up to be mounted on the proxy machine. This solution also applies to virtual machines hosted by VMware, but requires the use of a designated proxy that acts as a Networker client.
What is needed, therefore, is a block-based backup system that backs up file indexes and thereby eliminates the need to traverse and mount all the data savesets to allow the user to perform a search operation.
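The difference between the brute-force search and a search over backed-up file indexes may be illustrated with the following sketch. It is a toy model under stated assumptions, not a description of any particular system: the `Saveset` class, its mount counter, and the helper functions are hypothetical, and the index is assumed to be populated at backup time, when the file listing is already available.

```python
import posixpath


class Saveset:
    """Hypothetical saveset whose file listing is visible only after mounting."""

    mounts = 0  # class-wide count of mount operations, kept for comparison

    def __init__(self, name, paths):
        self.name = name
        self._paths = list(paths)

    def mount(self):
        Saveset.mounts += 1
        return self._paths


def search_by_mounting(savesets, filename):
    """Brute force: mount every saveset and scan its file listing."""
    return [s.name for s in savesets
            if any(posixpath.basename(p) == filename for p in s.mount())]


def add_to_index(index, saveset_name, paths):
    """Record, at backup time, which saveset holds each file name."""
    for p in paths:
        index.setdefault(posixpath.basename(p), set()).add(saveset_name)


# Usage: two savesets, one of which contains the file being sought.
listings = {"client-a": ["/home/x/report.doc"], "client-b": ["/etc/hosts"]}
savesets = [Saveset(name, paths) for name, paths in listings.items()]

index = {}
for name, paths in listings.items():
    add_to_index(index, name, paths)

found_by_mounting = search_by_mounting(savesets, "report.doc")
mounts_used = Saveset.mounts
found_by_index = index.get("report.doc", set())
```

In this model the brute-force search performs one mount per saveset, whereas the index lookup performs none; the cost of the former grows with the number of savesets being searched.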
The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions. EMC, Networker, Data Domain, Data Domain Restorer, and Data Domain Boost are trademarks of EMC Corporation.