This invention relates generally to the backup of large data storage systems, and more particularly to optimizing incremental backup of data in enterprise data storage.
Businesses and other enterprises generate and use data for business and other enterprise operations. Frequently they generate and store large quantities of data that are essential to their operations, and any loss of data can be costly and have serious consequences. Thus, prudent businesses and enterprises have data backup and recovery systems that protect against loss of data by periodically storing copies of the data.
There are different types of backup and recovery systems available. It is well known that repeatedly performing a full backup of data is inefficient and costly, and consumes a large amount of storage space. Instead, many backup and recovery systems perform an incremental backup, which backs up and stores only the modifications or changes made to the data since the last backup. Incremental backup systems can substantially reduce the amount of space necessary to store the backup data. However, they have other undesirable aspects.
For instance, to identify the data that has changed since the last backup, an incremental backup system may have to examine all of the data files by trawling the entire file system to determine which files have been changed and to generate a list of new and modified files. For large data systems, which may have millions of files, this can consume significant resources and time, and can degrade system performance.
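The cost of such a trawl can be illustrated with a minimal sketch. It assumes, purely for illustration, that a file counts as changed when its modification time is later than the time of the last backup; the function name `find_changed_files` and its parameters are hypothetical and are not taken from any particular backup product. Note that every file must be visited and examined, even when only a few files have changed.

```python
import os


def find_changed_files(root, last_backup_time):
    """Walk every file under `root` and return the paths whose modification
    time is later than `last_backup_time` (seconds since the epoch).

    Hypothetical helper: the mtime comparison is an assumed change test.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                # The file vanished or is unreadable; skip it in this sketch.
                continue
            if mtime > last_backup_time:
                changed.append(path)
    return sorted(changed)
```

The work done by this sketch grows with the total number of files in the file system rather than with the number of files that actually changed, which is the inefficiency described above.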
Since data storage systems conventionally maintain native summary change logs that record all changes to data files, some incremental backup systems use the summary change logs to identify files which have been changed, instead of trawling the entire file system and analyzing all the files. However, conventional change logs have problems of their own and can also degrade performance. This is because conventional change logs are both temporal and transactional in nature. They record every change made to files in a file system sequentially, in the time order in which the changes are made. A particular file may be created, changed a number of times, and then deleted during the time between successive backups. Thus, while a conventional summary change log records all of these various changes to the file, the change entries will be recorded at the random times that the changes to the file occurred, and changes to a particular file will be interspersed temporally in the change log among the thousands or millions of change entries for other files. This randomness of the entries in the summary change log creates performance problems and inefficiencies. During backup, all of the entries in the change log may have to be processed and sorted chronologically (temporally) to determine that a particular file was ultimately deleted and, therefore, can be ignored in the backup. Accordingly, using the conventional summary change log for an incremental backup can also be time and resource consuming and inefficient, and can degrade performance.
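The processing burden described above can be illustrated with a minimal sketch. It assumes a hypothetical log format in which each entry carries a timestamp, a file path and one of the operations "create", "modify" or "delete"; the names `ChangeEntry` and `net_changes` are illustrative only and do not correspond to any actual change log format. Every entry must be read and ordered in time before the final state of any single file is known.

```python
from collections import namedtuple

# Hypothetical change log entry; real change logs differ in format.
ChangeEntry = namedtuple("ChangeEntry", ["timestamp", "path", "op"])


def net_changes(entries):
    """Return the sorted paths that still need backing up after replaying
    the whole log in time order.

    A file whose last recorded operation is "delete" is dropped, following
    the text above, which says such a file can be ignored in the backup.
    """
    # Entries for one file are scattered through the log, so the whole log
    # is ordered by time before the last operation per file can be trusted.
    ordered = sorted(entries, key=lambda e: e.timestamp)
    last_op = {}
    for entry in ordered:
        last_op[entry.path] = entry.op  # later entries overwrite earlier ones
    return sorted(path for path, op in last_op.items() if op != "delete")


log = [
    ChangeEntry(5, "/data/a.txt", "modify"),
    ChangeEntry(1, "/data/tmp.txt", "create"),
    ChangeEntry(2, "/data/a.txt", "create"),
    ChangeEntry(9, "/data/tmp.txt", "delete"),
    ChangeEntry(4, "/data/tmp.txt", "modify"),
    ChangeEntry(7, "/data/b.txt", "modify"),
]
files_to_back_up = net_changes(log)
```

In this sketch all three entries for `/data/tmp.txt` are read and ordered only to conclude that the file need not be backed up, which is the wasted effort the paragraph above describes.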
It is desirable to provide backup approaches for the data storage systems of enterprises and businesses that address the foregoing performance, inefficiency and other known problems of backing up such systems, and it is to these ends that the invention is directed.