Information in a computer system is stored in data files. To ensure that this information is not lost, these data files are typically backed up to an independent data store, from which the information can be retrieved if the original data file is lost or corrupted. However, traditional backup systems do not back up a data file that is open and locked by another application at the time the backup application is ready to process it. Such files include certain e-mail and database files. These files are kept open for exclusive access by the “owning” application(s), because the owning application makes frequent changes to them, and the locked status prevents other applications from modifying the file content.
This locking arrangement, although beneficial in many respects, nevertheless presents several problems for a backup application. First, because the files are locked for exclusive access, standard backup applications cannot access the data within the file to make a backup copy. Second, because the files change frequently and are generally too large to copy in a single read operation, portions of the file might change during the time it takes to make a copy using multiple read operations. This can result in an inconsistent, and therefore useless, copy of the file. Third, many of these applications make use of transactions, or in other words, multiple changes to the file that are executed together as a single unit. Making a copy of a file at an arbitrary point in time that falls in the middle of such a transaction would likewise produce an inconsistent, and therefore useless, copy of the file.
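The second problem can be illustrated with a minimal Python sketch. It is only a simulation under stated assumptions: the "file" is an in-memory bytearray, and the names chunked_copy and owner_writes are hypothetical, introduced here for illustration. The owning application rewrites the whole file after the backup has read the first chunk, so the copy mixes old and new content.

```python
CHUNK = 4  # assumed read size; real backup reads are far larger


def chunked_copy(source: bytearray, chunk_size: int, on_chunk=None) -> bytes:
    """Copy source in chunk_size pieces, calling on_chunk(index) after each read."""
    out = bytearray()
    for index, offset in enumerate(range(0, len(source), chunk_size)):
        out += source[offset:offset + chunk_size]
        if on_chunk is not None:
            on_chunk(index)  # gives the simulated owner a chance to write
    return bytes(out)


# Two 4-byte records that the owning application always updates together.
data = bytearray(b"AAAAAAAA")


def owner_writes(index: int) -> None:
    # After the first chunk has been read, the owner rewrites both records.
    if index == 0:
        data[0:8] = b"BBBBBBBB"


copy = chunked_copy(data, CHUNK, owner_writes)
print(copy)  # b'AAAABBBB': neither the old state nor the new one
```

The resulting copy never existed as a valid state of the file, which is the sense in which such a copy is inconsistent.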
There are currently three approaches for backing up open files. The first approach is to have application-dependent backup programs. These programs generally request that the application temporarily stop writing to the file and release the exclusive access lock on it, so that the file can be copied by the backup program. However, these programs only work with a specific application or set of applications for which they were designed.
The second approach is to have a file system process that intercepts all data written to an open file and creates a journal file containing a chronological log of all data changes. Alternatively, the application itself, such as a database server, may create a journal file separate from the data file as part of its normal operation. A backup application periodically (or upon change) reads the journal file and applies the changes to a separate backup copy of the open file. This requires the application or a file system process to perform an additional write operation for each data change, which might degrade the performance of the application. Also, an initial baseline copy of the file is required, which must be created while the file is closed.
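The journal-based approach can be sketched roughly as follows. This is an illustrative model, not the format of any particular file system or database: the JournalEntry fields (sequence, offset, payload) and the apply_journal function are assumptions made for the example. Changes are replayed in chronological order onto a baseline copy taken while the file was closed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalEntry:
    sequence: int   # chronological position of the change (assumed field)
    offset: int     # byte offset in the data file where the write landed
    payload: bytes  # the bytes that were written


def apply_journal(baseline: bytes, journal: list[JournalEntry]) -> bytes:
    """Replay journal entries, oldest first, onto a baseline copy."""
    image = bytearray(baseline)
    for entry in sorted(journal, key=lambda e: e.sequence):
        end = entry.offset + len(entry.payload)
        if end > len(image):
            # The write extended the file; grow the backup image to match.
            image.extend(b"\x00" * (end - len(image)))
        image[entry.offset:end] = entry.payload
    return bytes(image)


baseline = b"hello world"  # copy taken while the file was closed
journal = [
    JournalEntry(2, 6, b"there"),
    JournalEntry(1, 0, b"HELLO"),
    JournalEntry(3, 11, b"!"),
]
backup = apply_journal(baseline, journal)
print(backup)  # b'HELLO there!'
```

Every entry in the journal corresponds to one extra write at the time of the original change, which is the source of the performance cost noted above.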
The third approach is to perform periodic volume snapshots. A volume snapshot captures the entire contents of a volume, such as all files and directories stored on the volume, and includes any changes since the last snapshot. The snapshot, which is static, can then be copied to create a backup of the entire volume. However, the volume snapshot is not synchronized with the write patterns of any particular application, so it can be taken in the middle of a transaction and capture a file in an inconsistent state.
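The weakness of an unsynchronized snapshot can be shown with a small simulation. Everything here is hypothetical: the volume is modeled as a nested dictionary, and snapshot and transfer are illustrative names. The transaction moves an amount between two records in two steps, and the snapshot is taken between them.

```python
import copy

# Simulated volume holding one data file with two related records.
volume = {"accounts.db": {"alice": 100, "bob": 0}}


def snapshot(vol: dict) -> dict:
    """Capture a static, point-in-time copy of the whole volume."""
    return copy.deepcopy(vol)


def transfer(vol: dict, amount: int, between_steps=None) -> None:
    """A two-step transaction: debit one record, then credit the other."""
    accounts = vol["accounts.db"]
    accounts["alice"] -= amount
    if between_steps is not None:
        between_steps()  # the snapshot fires here, mid-transaction
    accounts["bob"] += amount


snapshots = []
transfer(volume, 30, between_steps=lambda: snapshots.append(snapshot(volume)))

mid = snapshots[0]["accounts.db"]
total_in_snapshot = mid["alice"] + mid["bob"]
total_live = sum(volume["accounts.db"].values())
print(total_in_snapshot, total_live)  # 70 100
```

The snapshot is internally static, yet it records a state in which the debit has happened and the credit has not, so restoring from it would lose the transferred amount.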
What is needed, therefore, is a system that overcomes problems such as those described above, at least in part.