1. Field of the Invention
The invention relates to a method for enabling incremental backup on a computer system with an on-line file system having an archive bit attribute associated with each file.
2. Description of the Related Art
Computer systems often perform incremental data backups on computer files to enable recovery of lost data. To maintain the integrity of the backed-up data, the backup process must accurately back up all files modified after the most recent backup process. In order to identify modified files, computer file systems using a DOS or a Windows NT operating system have an archive bit in each file in a directory structure. After an application modifies a file, the operating system sets the value of the archive bit to indicate that the file needs to be archived. When a backup program starts, it searches for all files in the directory structure with the archive bit set. The backup program copies each such file to a secondary storage device. Then it verifies the backed up files and it clears the archive bit on each file after a complete backup of that file.
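The archive-bit cycle described above can be illustrated with a minimal sketch. It is an in-memory model only, not the implementation of any particular operating system or backup program. The names `FileEntry`, `write`, and `incremental_backup`, and the dictionary standing in for the secondary storage device, are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    """Hypothetical stand-in for a file carrying an archive bit attribute."""
    name: str
    contents: str
    archive_bit: bool = True  # assumption: a new file starts out flagged

    def write(self, contents: str) -> None:
        # Models the operating system setting the bit when a file is modified.
        self.contents = contents
        self.archive_bit = True


def incremental_backup(files, secondary):
    """Copy each flagged file to `secondary`, verify it, then clear its bit."""
    backed_up = []
    for f in files:
        if not f.archive_bit:
            continue  # unmodified since the last backup, so skip it
        secondary[f.name] = f.contents
        if secondary[f.name] == f.contents:  # stand-in for verification
            f.archive_bit = False
            backed_up.append(f.name)
    return backed_up


files = [FileEntry("a.txt", "v1"), FileEntry("b.txt", "v1")]
secondary = {}  # hypothetical secondary storage device

first = incremental_backup(files, secondary)   # both files are flagged
files[0].write("v2")                           # only a.txt changes afterwards
second = incremental_backup(files, secondary)  # only a.txt is selected
```

In this sketch the second pass copies only the file that changed after the first pass, which is the behavior an incremental backup relies on.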
A problem with using only the archive bit to flag modified files is that, if a previously modified file is opened and subsequently modified during the backup operation, the archive bit associated with the file will not reflect the correct value. Specifically, when the backup program clears the archive bit after a complete file backup, the cleared bit will fail to reflect the changes made during the backup process. If no additional modifications to this file occur before the next backup operation, the archive bit value will remain incorrect and the file will not be selected for backup in the next backup execution. This lapse in backing up the file modification could result in the loss of important data should the system crash before the archive bit value is corrected and the file is backed up.
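The sequence of events behind this problem can be sketched as follows. This is a simplified, single-threaded model in which the three steps are interleaved by hand. The names `FileEntry` and `select_for_backup` and the dictionary used as secondary storage are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    """Hypothetical stand-in for a file carrying an archive bit attribute."""
    name: str
    contents: str
    archive_bit: bool = True

    def write(self, contents: str) -> None:
        # Models the operating system setting the bit on modification.
        self.contents = contents
        self.archive_bit = True


def select_for_backup(files):
    """Return the names of the files a backup program would pick up."""
    return [f.name for f in files if f.archive_bit]


report = FileEntry("report.txt", "v1")
secondary = {}  # hypothetical secondary storage device

secondary[report.name] = report.contents  # 1. backup copies the file
report.write("v2")                        # 2. user modifies it mid-backup
report.archive_bit = False                # 3. backup clears the bit

# The next incremental backup finds nothing to do, although "v2" was
# never copied to secondary storage.
next_selection = select_for_backup([report])
```

After step 3 the on-line file holds "v2", secondary storage holds "v1", and the cleared archive bit hides the difference from the next backup.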
To avoid this problem, some backup programs do not back up open files but strictly operate on files in a "read-only" state. If a file is on-line and in a "read-write" state, the backup programs skip that file. This approach raises the problem of on-line modifications made during the backup process. If the system crashes before the next scheduled backup, the data owner loses all modifications made before, during, and after the previous backup. Some network administrators solve this problem by forcing all users to log off the system before backup execution begins. Others solve this problem by shutting down the system, pre-scanning all files in the directory structure, recording all the files that need to be archived, and thereafter clearing all archive bits in the directory structure. The pre-scanning and recording time in the second method increases as the number of files in the directory structure increases. While these attempts ensure that all files are closed and in the "read-only" state, they reduce user productivity. For businesses that require high availability, i.e., twenty-four hours/seven days a week on-line activity, these solutions are unacceptable.
A solution to the productivity issue involves periodically performing a copy-on-write procedure on each on-line storage device, whereby the system takes a "snapshot" or copy of the data from the on-line storage devices at a particular instant in time. On-line storage devices are configured from one or more disks into logical units of storage space referred to herein as "containers". The copy-on-write procedure is described in a co-pending U.S. patent application Ser. No. 08/963,754 entitled System and Method for Real-Time Data Backup Using Snapshot Copying with Selective Compaction of Backup Data, which application is hereby incorporated by reference as though fully set forth herein.
During the copy-on-write procedure, a read-only snapshot container is also created and the snapshot is stored in the snapshot container. In addition to the contents of the file, the snapshot container contains attributes of the file such as the value of the archive bit. The file contents in the snapshot container are preferably an identical copy of the file contents in the read-write on-line container at the instant the snapshot was taken. This enables users to work on the files in the read-write on-line container while the backup program backs up files from the snapshot container. During the backup operation, the backup program backs up the files from the snapshot container and clears the archive bits in the files in the snapshot container to indicate that the files have been backed up. After a complete backup operation, the system deletes the snapshot container. However, the archive bits in the associated files in the read-write on-line container remain set and do not reflect the clear archive bit operation performed by the backup process. This may eventually lead to a situation where all files in the read-write on-line container have their archive bits set; thus the system performs a full system backup during every backup operation. The copy-on-write procedure resolves the productivity issue but it does not resolve the problem of an archive bit that inaccurately reflects the state of a file and it does not resolve the problem with on-line file modification occurring during the backup process.
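The way the archive bits in the on-line container stay set can be sketched as follows. The sketch makes a loud simplification: `copy.deepcopy` stands in for the snapshot, so it is a full copy rather than a true copy-on-write procedure. The names `FileEntry` and `backup_via_snapshot` and the dictionaries used as containers are assumptions introduced for illustration.

```python
import copy
from dataclasses import dataclass


@dataclass
class FileEntry:
    """Hypothetical stand-in for a file carrying an archive bit attribute."""
    name: str
    contents: str
    archive_bit: bool = True


def backup_via_snapshot(online, secondary):
    """Back up from a point-in-time copy and return the names selected."""
    # Assumption: a deep copy models the snapshot container. A real
    # copy-on-write snapshot would share unchanged blocks instead.
    snapshot = copy.deepcopy(online)
    selected = []
    for f in snapshot.values():
        if f.archive_bit:
            secondary[f.name] = f.contents
            f.archive_bit = False  # cleared in the snapshot container only
            selected.append(f.name)
    del snapshot  # the snapshot container is discarded after the backup
    return selected


online = {
    "a.txt": FileEntry("a.txt", "v1"),
    "b.txt": FileEntry("b.txt", "v1"),
}
secondary = {}  # hypothetical secondary storage device

first = backup_via_snapshot(online, secondary)
second = backup_via_snapshot(online, secondary)  # nothing changed in between
```

Because the bits are cleared only in the discarded copy, the second pass selects the same files as the first even though no file changed, which is how every backup drifts toward a full system backup.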
One attempt to solve the on-line modification problem involves performing a full system backup every time the files are backed up. A full system backup ensures that every file on the system is copied to secondary storage. However, such operations substantially increase the time and storage resources needed to perform backups. Moreover, users may be unable to access the files during a full system backup. Accordingly, the cost of performing such backups is greater in terms of user productivity and/or system resources.
To allow data recovery, the Windows NT File System (NTFS) was designed as a recoverable file system. In case of an emergency, NTFS automatically reconstructs disk containers the first time a disk is accessed. NTFS then returns the data to a consistent state. NTFS also uses redundant storage for its vital sectors, so that if one location on the disk is bad, the file system can still access the container's critical file system data. The NTFS recovery capabilities ensure that the file system on a container remains accessible, but there is no guarantee of complete recovery of user files.
For businesses that cannot afford to lose user data, Windows NT also gives users the ability to "plug in" fault tolerant disk storage. Fault tolerant drivers in the operating system "mirror" or duplicate data from on-line disks onto the fault tolerant storage devices to ensure that a redundant copy can always be recovered. While this further ensures that users can recover their data, it is more expensive than backing up files using the copy-on-write approach. NTFS mirror recovery requires the system administrator to always keep a duplicate copy of the data, whereas the copy-on-write approach allows the system administrator the flexibility to determine when to create the snapshot. After the backup execution completes, the system administrator discards the snapshot container. Thus, the system administrator in a copy-on-write situation only has to maintain the duplicate copy on the snapshot container for a limited time.
As noted, the problem with the current backup processes is that on-line files either cannot be backed up at all or cannot be backed up reliably. Therefore, it is an object of the present invention to provide a process that allows on-line files to be backed up reliably while maintaining the integrity of the backed up data and reflecting the true state of the archive bit.
Another object of the invention is to allow users to continue to work on files during backup operations.
Yet another object of the invention is to obviate the need for a full backup every time a backup operation is performed.
Still yet another object of the present invention is to provide an on-line file backup system that does not require specialized drivers and specialized disk storage.