One of the many benefits of computers and computing systems is their ability to process data and to make the data useful and readily available. People want immediate access to their email, for example, and email providers implement computing systems with sufficient processing power to handle email-related processing. Data is important in other contexts as well. Businesses rely on readily available data to manage products and inventories. By way of example, businesses use data to set prices, sell tickets, or manage schedules. When data cannot be accessed, there is a corresponding cost.
The inability to access data in the short term is often annoying and inconvenient. The complete loss of data, however, can have serious consequences. As a result, it is advisable to back up data. Most businesses and enterprises today have an active backup application protecting their data. Individual users are also beginning to protect their data.
There are different types of backup systems in use today. It has long been recognized that repeatedly performing a full backup of data can consume significant space—especially when the backups are retained over time. In an incremental backup system, for instance, the amount of data backed up is reduced because an incremental backup only backs up modifications or changes that have been made to the data since the last backup.
While this approach can minimize the amount of data that is backed up at a given time, incremental backups also have undesirable features. For example, identifying which data (e.g., which files) have changed since the last backup may require that all of the files be examined to analyze their modification time stamps. For larger systems, which may have millions of files, this can become a time-consuming process and can degrade computing performance.
More generally, conventional backup applications that support incremental backups trawl the entire file system to generate a list of modified files. This can consume significant resources as previously stated.
Instead of trawling the entire file system, some backup applications may take advantage of the file system's native change log. However, using the native change log can also result in degraded performance. This is partly related to the fact that conventional change logs are transactional in nature. Every change that occurs to a file is recorded in a conventional change log.
For example, a change log may record that a particular file is created, changed a large number of times, and then deleted. Because a transactional change log records transactions for all files, the transactions associated with a particular file will be interspersed with changes to other files. During backup, all of these changes need to be processed even though the file is ultimately deleted. More specifically, the backup operation cannot view the entire history of a file. Thus, the backup operation may encounter a new file (that was subsequently deleted) and attempt to back up that file. This results in an unnecessary disk access, in part because the backup operation is only processing the create record and is unaware of the delete record. Consequently, the activity associated with performing a backup based on a transactional change log can also degrade the performance of the file system.
Conventional change logs can also be very large. As previously stated, file changes are kept in a temporal order. As a result, there is a need to purge the change log at regular intervals or wrap the change log to ensure that its size does not become too large. If a situation arises where the backup application does not process the change log before it is purged or wrapped around due to size constraints, a full backup will need to be performed to ensure backup integrity. This can adversely affect the backup frequency. A system with a high change rate may require backups at frequent intervals due to concerns with the change log's size.
Another issue with conventional change logs in the context of backup operations relates to the inability to back up all modified regions of a file by reading a single record. For example, consider a situation in which a file system is generating change events frequently and is creating and removing temporary files frequently. When such a change log is processed, the backup process proceeds to back up the temporary file identified in the file create event. If a path lookup is performed, it is discovered that the file is missing because the file was later removed. Unfortunately, the change log contains the file remove record at a later offset in the change log.
Because the create record and the remove record could not be obtained at the same time, the backup application had to issue a file lookup event, which can result in several physical input/output (I/O) operations to serve the file change request. This degrades the performance of the backup operation and of the physical file system. There is a need in the art for systems and methods for performing backup operations in a manner that does not degrade the performance of the backup operation or of the physical file system.