1. Field of the Invention
The present invention relates to computer systems and, more specifically, to a system and method for providing multi-volume online data backup.
2. Background Art
Typically, an operating system of a computer system includes a file system that provides users with an interface for working with data on the computer system's disk and that allows several users and processes to share files. Generally, the term "file system" encompasses the totality of all files on the disk and the sets of data structures used to manage files, such as, for example, file directories, file descriptors, free and used disk space allocation tables, and the like. File systems may also enhance system performance with additional functions such as, for example, caching, access markers and fault tolerance.
Generally, a file system architecture that provides a recoverable file system is preferable to conventional file systems that lack this feature. In conventional systems, “careful” write and “lazy” write are the two main approaches to implementing input-output support and caching in file systems. Typically, a careful write is implemented in file systems developed for VAX/VMS and other similar closed operating systems. A lazy write is generally implemented in the HPFS (High Performance File System) of the OS/2 operating system and in most UNIX file systems.
In the event of an operating system failure or power supply interruption, for example, input-output operations performed at that time are immediately interrupted. Depending on what operations were performed and how far the execution of these operations had advanced, such interruption may affect the integrity of the file system.
When a file system of any type receives a request to update disk content, the file system must perform several sub-operations before the update can be completed. In file systems using the careful write strategy, these sub-operations always write their data to the disk.
A file system utilizing the careful write policy generally sacrifices performance for reliability. On the other hand, a file system with lazy write typically increases performance by using write-back caching. Write-back caching is a caching method in which modifications to data in the cache are not copied to the cache source until absolutely necessary. Because repeated modifications to the same data can be combined into a single disk write, and applications need not wait for each modification to reach the disk, the lazy write policy generally provides higher system performance than the careful write policy.
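As an illustration only, the Python sketch below models the two policies on a toy block device. The classes `Disk`, `CarefulWriter`, and `LazyWriter` are hypothetical and do not correspond to any real file system API; the model ignores write ordering, metadata, and concurrency. It shows that ten modifications to the same block cost ten disk writes under careful write but a single write under lazy write, and that under lazy write the modification is absent from the disk until the cache is flushed.

```python
class Disk:
    """Toy block device that records block contents and counts physical writes."""

    def __init__(self):
        self.blocks = {}
        self.writes = 0

    def write(self, block, data):
        self.blocks[block] = data
        self.writes += 1


class CarefulWriter:
    """Careful write (simplified): every modification goes straight to the disk."""

    def __init__(self, disk):
        self.disk = disk

    def write(self, block, data):
        self.disk.write(block, data)


class LazyWriter:
    """Lazy write with a write-back cache: modifications stay in memory until flush()."""

    def __init__(self, disk):
        self.disk = disk
        self.dirty = {}  # block -> newest data not yet copied to the disk

    def write(self, block, data):
        self.dirty[block] = data  # replaces any earlier cached version; no disk I/O

    def read(self, block):
        # Cached data, if present, is newer than what is on the disk.
        return self.dirty.get(block, self.disk.blocks.get(block))

    def flush(self):
        for block, data in self.dirty.items():
            self.disk.write(block, data)
        self.dirty.clear()


careful_disk = Disk()
careful = CarefulWriter(careful_disk)
lazy_disk = Disk()
lazy = LazyWriter(lazy_disk)
for version in range(10):
    careful.write(7, version)
    lazy.write(7, version)
```

After the loop, `careful_disk.writes` is 10 while `lazy_disk.writes` is still 0; calling `lazy.flush()` performs one write. A failure before `flush()` would lose the cached modification, which is the price lazy write pays for its performance.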
Recoverable file systems, such as, for example, Microsoft NTFS (Windows NT File System), may provide greater reliability than file systems with careful write while also providing the performance of file systems with lazy write.
The high reliability of the recoverable file system has its disadvantages. For each transaction that modifies the volume structure, the file system must enter one record into the journal file for each transaction sub-operation. Grouping journal file records into packets may increase the efficiency of the file system: several records may be added to the journal in a single input-output operation. Moreover, the recoverable file system may use optimization algorithms such as those used by file systems utilizing lazy write. The file system may also increase the intervals between writing the cache contents to the disk, because the file system can be recovered if a failure occurs before the modifications are copied from the cache to the disk. These techniques generally compensate for, and may even exceed, the performance losses incurred by logging the transactions.
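Packet-based journaling of this kind can be sketched, under simplifying assumptions, as a redo-only journal that buffers records and writes them to disk in packets. The `Journal` class, its `packet_size` parameter, the record tuples, and `recover()` are hypothetical illustrations, not the journal format of NTFS or of any other file system. Recovery replays only those transactions whose commit record reached the disk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Record = Tuple[object, ...]  # ("set", txid, key, value) or ("commit", txid)


@dataclass
class Journal:
    """Simplified redo-only journal that writes records in packets."""

    packet_size: int = 4
    on_disk: List[List[Record]] = field(default_factory=list)  # flushed packets
    pending: List[Record] = field(default_factory=list)  # buffered records
    io_count: int = 0  # one input-output operation per flushed packet

    def log(self, record: Record) -> None:
        self.pending.append(record)
        if len(self.pending) >= self.packet_size:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.on_disk.append(list(self.pending))
            self.pending.clear()
            self.io_count += 1


def recover(journal: Journal) -> Dict[str, object]:
    """Redo the sub-operations of transactions whose commit record is on disk."""
    records = [r for packet in journal.on_disk for r in packet]
    committed = {r[1] for r in records if r[0] == "commit"}
    state: Dict[str, object] = {}
    for r in records:
        if r[0] == "set" and r[1] in committed:
            _, _, key, value = r
            state[key] = value
    return state


j = Journal(packet_size=4)
# Transaction 1: three sub-operations plus a commit record fill one packet.
for rec in [("set", 1, "dir", "A"), ("set", 1, "bitmap", 5),
            ("set", 1, "inode", 9), ("commit", 1)]:
    j.log(rec)
# Transaction 2 is interrupted before its commit record is logged.
j.log(("set", 2, "dir", "B"))
j.log(("set", 2, "bitmap", 6))
j.flush()  # e.g. the cache is flushed just before a failure
```

Here six journal records reach the disk in two input-output operations instead of six, and `recover(j)` restores only the committed transaction 1.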
However, neither careful write nor lazy write can guarantee protection of user data. If a system failure occurs while an application is writing to a file, the file may be lost or destroyed. Moreover, with a lazy write policy, such a failure may damage the file system itself, because the interrupted lazy writes may have destroyed existing files or even made all information on the volume unavailable.
In contrast, recoverable file systems, such as, for example, Windows NTFS, possess greater reliability than traditional file systems.
The development of file systems demonstrates that fault tolerance and recoverability after failures are important design considerations. To provide maximum reliability, it is necessary to periodically copy all files as an instantaneous copy, or image, of the file system, e.g., a snapshot. Functionally, a snapshot is very similar to the journal of a recoverable file system, since both can restore the system to a consistent state. A snapshot guarantees full data recovery, but it is expensive to create and store.
Snapshot creation generally involves sector-by-sector copying of the whole file system, i.e., both service information and data. If the file system is currently active, files may be modified during copying; for example, some files may be open for writing or locked. In the simplest case, the file system can be suspended for some time, and the snapshot is recorded during that time. Of course, such an approach cannot be applied to servers on which the file system must remain continuously active.
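The hazard of copying an active file system can be illustrated with a minimal Python model in which each sector is a string and an application updates a file's data (sector 0) and its metadata (sector 3) together. The functions `copy_volume`, `is_consistent`, `make_volume`, and `app_update` are hypothetical. If the update lands after sector 0 has been copied but before sector 3 is, the image pairs old data with new metadata, a state that never existed on the volume; copying while the file system is suspended avoids this.

```python
def copy_volume(volume, on_sector_copied=None):
    """Copy a volume sector by sector.

    on_sector_copied(i), if given, runs after sector i is copied and stands in
    for application writes that reach the live volume during the copy.
    """
    image = []
    for i in range(len(volume)):
        image.append(volume[i])
        if on_sector_copied is not None:
            on_sector_copied(i)
    return image


def is_consistent(image):
    """File data (sector 0) and its metadata (sector 3) must carry the same version."""
    return image[0].split(":")[1] == image[3].split(":")[1]


def make_volume():
    return ["data:v1", "free", "free", "meta:v1"]


def app_update(volume):
    """Return a callback that updates data and metadata together after sector 1 is copied."""
    def callback(i):
        if i == 1:
            volume[0] = "data:v2"
            volume[3] = "meta:v2"
    return callback


# Live copy: the update lands between sectors, so the image is torn.
live = make_volume()
torn = copy_volume(live, app_update(live))

# Suspended file system: no writes during the copy, so the image is consistent.
quiet = make_volume()
snapshot = copy_volume(quiet)
```

After this runs, `torn` is `["data:v1", "free", "free", "meta:v2"]`, which fails `is_consistent`, even though the live volume itself is consistent both before and after the update.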
In most cases, archiving a single disk or partition is insufficient. Data for one program (for example, Microsoft SQL Server) can be located on several hard disk drives (HDD) or on several partitions of one or more hard disk drives. In this case, SQL Server must be stopped so that its data does not change on any of these partitions during the backup. If SQL Server's writes are not stopped or suspended, the data on one backed-up partition will not correspond to the data on the other partitions: data on a second partition can change while a first partition is being backed up, and the backed-up data from the two partitions will not be synchronized.
One solution to this problem is to stop or suspend the SQL Server service, which controls write operations to the partitions, while the volumes are backed up. In this case, however, SQL Server cannot save information to the volumes until the backup process ends (i.e., SQL Server is offline during the backup process), and the backup process often takes a long time, sometimes tens of minutes.
For example, one common problem in backing up large amounts of data relates to database backups, particularly where the databases are large and distributed across several physical drives, partitions, or volumes. The conventional approach is to freeze the entire database (or at least block attempts to write to the database, while possibly still allowing reads) and to back up the now-frozen database. This is done because, if one of the volumes is backed up while the user application that uses the database is writing to a different volume, portions of the backup will be out of sync with the original copy of the database. Since such a situation is not acceptable, the entire database is often frozen and then backed up. The problem is that the database is unavailable during that time, and for sufficiently large databases, that time can be considerable.
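A similar Python sketch extends the problem to several volumes and shows the conventional freeze-based workaround. The `Database` class, which keeps a data page on volume "A" and a matching log record on volume "B", and the `backup` function, which copies volume A and then volume B, are hypothetical models, not SQL Server behavior. Without a freeze, a commit between the two copies produces out-of-sync images; with a freeze, the images match, but every write issued during the backup is deferred until `thaw()`, which corresponds to the period of unavailability.

```python
from typing import Callable, Dict, List, Optional


class Database:
    """Hypothetical database: data page on volume A, matching log record on volume B."""

    def __init__(self) -> None:
        self.volumes: Dict[str, List[int]] = {"A": [0], "B": [0]}
        self.frozen = False
        self.deferred: List[int] = []  # writes held back while frozen

    def commit(self, value: int) -> None:
        if self.frozen:
            # The database is unavailable for writes; the write must wait.
            self.deferred.append(value)
            return
        self.volumes["A"][0] = value  # data page on volume A
        self.volumes["B"][0] = value  # matching log record on volume B

    def freeze(self) -> None:
        self.frozen = True

    def thaw(self) -> None:
        self.frozen = False
        pending, self.deferred = self.deferred, []
        for value in pending:
            self.commit(value)


def backup(db: Database,
           between: Optional[Callable[[], None]] = None) -> Dict[str, List[int]]:
    """Copy volume A, then volume B; `between` simulates writes issued in between."""
    images = {"A": list(db.volumes["A"])}
    if between is not None:
        between()
    images["B"] = list(db.volumes["B"])
    return images


# Online backup without a freeze: the two images disagree.
db = Database()
db.commit(1)
unsynced = backup(db, lambda: db.commit(2))

# Conventional approach: freeze, back up, thaw.
frozen_db = Database()
frozen_db.commit(1)
frozen_db.freeze()
synced = backup(frozen_db, lambda: frozen_db.commit(2))
blocked = list(frozen_db.deferred)  # writes that had to wait for the backup
frozen_db.thaw()
```

Here `unsynced` is `{"A": [1], "B": [2]}`, whereas `synced` is `{"A": [1], "B": [1]}` at the cost of holding back the write of value 2 until the backup finishes.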
Accordingly, due to the disadvantages associated with conventional data backup systems, there is a need for a multi-volume data backup process that is reliable and efficient not just for one partition of a data storage device, but for two or more partitions on one or more data storage devices. Moreover, there is a need for an online data backup process that allows a computer system to remain online while data on multiple volumes is being backed up and that also addresses the disadvantages associated with conventional backup systems.