1. Technical Field
The present invention relates generally to distributed networks of computer systems, specifically computer clusters, and more particularly to systems and techniques to quiesce the entire data storage system of such a computer cluster for operations where this is useful or necessary, such as performing a cluster-wide data backup.
2. Description of the Related Art
Among the recent trends in data processing is the use of distributed collections of server computer systems to collect and process data for client computer systems. An entire such collection of computer systems is often termed a “cluster” and the clients and servers are spoken of as conducting “transactions” wherein the clients provide data to or request data from the servers and the servers store or process data for the clients. Many clusters today contain large numbers of client and server computer systems, wherein very large numbers of transactions take place.
As with virtually all computer systems, the data stored in cluster computer systems needs to be backed up from time to time. A key goal in performing a backup is to copy all of the data present in a manner so that the system being backed up can be restored exactly as it was at a particular time. However, this tends to be particularly difficult due to many factors.
For instance, a cluster contains many computer systems, yet the completeness of a backup is undermined if the data from even one computer system in the cluster is omitted. For this reason, a computer system crash, or even a temporary unavailability for other reasons, that prevents some data from being backed up must be guarded against.
Beyond backup completeness, timing often plays a role. In many clusters, the computer systems conduct multiple asynchronous transactions concurrently, yet the clients and servers cannot be in mid-transaction when data is being backed up. At the clients, all transactions need to be either completed or forced to reinitiate later, after the backup. At the servers, all of the transactions also need to be completed, or flushed so that later reinitiated transactions from the clients are “seen” as new transactions. Furthermore, it is often highly desirable for many clusters to be kept available or “online” as much of the time as possible. Thus, bringing a cluster to a quiesced state, and then keeping it in that state only as long as necessary, are further factors that complicate performing a cluster-wide backup.
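By way of illustration only, the server-side requirement described above (in-flight transactions run to completion while newly arriving transactions are refused so that clients reinitiate them later) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from any reference discussed herein; the class name QuiescableServer and all of its methods are hypothetical.

```python
import threading


class QuiescableServer:
    """Hypothetical server that drains in-flight transactions before a backup."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0        # transactions currently being processed
        self._quiescing = False    # True while new transactions are refused

    def begin_transaction(self):
        """Admit a transaction; return False if the client must retry later."""
        with self._cond:
            if self._quiescing:
                return False
            self._in_flight += 1
            return True

    def end_transaction(self):
        """Mark one admitted transaction as completed."""
        with self._cond:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def quiesce(self, timeout=None):
        """Refuse new work, then wait until no transaction is in flight.

        Returns True once the server is quiescent, or False on timeout.
        New transactions remain refused until resume() is called.
        """
        with self._cond:
            self._quiescing = True
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)

    def resume(self):
        """Leave the quiesced state and admit transactions again."""
        with self._cond:
            self._quiescing = False
```

In such a sketch, a backup would be taken between a successful quiesce() and the subsequent resume(), which keeps the quiesced interval only as long as the backup itself requires.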
Not surprisingly, many efforts have been made to devise systems and processes for performing cluster-wide backups, but all such efforts to date have limitations, and the field remains wanting for improved systems and processes to perform cluster-wide backups.
U.S. patent Publication 2003-0188115-A1 teaches a method and computer apparatus capable of making a backup copy of data stored on a hard disk drive (HDD). A user places a personal computer (PC) with a typical operating system (OS) into a hibernation state by inputting a particular key sequence. Working state data is stored on the HDD just before the backup copy of data is created. During the backup process, another OS in a hidden partition of the HDD is booted so as to execute a program for making an exact copy of the HDD. When the exact copy processing is completed, the PC's operation is resumed and the main OS is booted to recover from the hibernation state back to the original state.
As such, the teachings of this '115 publication are limited to individual PC computer systems. How to quiesce an entire cluster of multiple computer systems, particularly ones engaged in client-server transactions, is not taught or reasonably suggested by this reference.
U.S. patent Publication 2003-0028736-A1 teaches a system and method for allowing applications to interact with a common backup program in a uniform way. A communication mechanism for one or more applications to exchange information with the backup program regarding components of the applications is provided. The information exchanged may include an identification of the components of each application. A component may be considered a group of files or resources that should be backed up or restored together. In this way, when a backup operation is initiated, each application may provide instructions to the common backup program describing the specific components to be backed up. In addition, each application may add other application-specific information useful during a restore of the backed up data.
As such, the teachings of this '736 publication are limited to individual applications advising a backup system what data components should be backed up. How even this can be performed across an entire cluster of multiple computer systems engaged in client-server transactions is not taught or reasonably suggested. This reference does teach that its form of limited backup can be performed on a system in a quiescent state, but how the system being backed up is put into such a state is left to the individual system.
U.S. Pat. No. 5,692,155 by Iskiyan et al. teaches a data storage system that atomically suspends multiple duplex pairs across either a single storage subsystem or multiple storage subsystems. The duplex pairs are suspended such that the data on the secondary direct access storage devices (DASDs) of the duplex pairs is maintained in a sequence consistent order. A host processor in the data storage system running an application generates records and record updates to be written to the primary DASDs of the duplex pairs. The storage controller directs copies of the records and record updates to the secondary DASDs of the duplex pairs. Sequence consistency is maintained on the secondary DASDs by quiescing the duplex pairs and then suspending the duplex pairs with change recording. Quiescing the duplex pairs allows any current write I/O in progress to complete to the primary DASD. The storage controller then locks out any subsequent write I/O from the host processor by raising a long busy signal to such subsequent write requests. Suspending the duplex pairs with change recording directs the storage controller to mark the physical address of the primary DASD which the application in the host processor updates between the time the duplex pair is suspended and then is reestablished.
As such, the teachings of the Iskiyan et al. patent are limited to a dual copy scheme, wherein some paired systems are “rotated” into a quiesced state for backing up while other systems carry on. While quite powerful, this approach requires additional hardware and does not teach and cannot be seen to be extendable to conventional cluster computer systems.
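For illustration only, the quiesce-then-suspend sequence attributed above to Iskiyan et al. can be modeled in simplified form as follows. This is a single-threaded sketch under stated assumptions, not the patented implementation: the class DuplexPair, the function suspend_all, the state names, and the use of dictionaries to stand in for DASDs are all hypothetical.

```python
class DuplexPair:
    """Hypothetical model of one primary/secondary pair with change recording."""

    def __init__(self):
        self.primary = {}       # address -> record on the primary device
        self.secondary = {}     # address -> record on the secondary device
        self.state = "duplex"   # "duplex", "quiesced", or "suspended"
        self.changed = set()    # primary addresses updated while suspended

    def write(self, address, record):
        """Apply a host write according to the current state of the pair."""
        if self.state == "quiesced":
            return "long busy"              # subsequent writes are locked out
        self.primary[address] = record
        if self.state == "duplex":
            self.secondary[address] = record    # normal dual copy
        else:
            self.changed.add(address)           # suspended: record the change
        return "ok"

    def quiesce(self):
        self.state = "quiesced"

    def suspend(self):
        self.state = "suspended"

    def reestablish(self):
        """Copy only the recorded changes to the secondary, then resume."""
        for address in self.changed:
            self.secondary[address] = self.primary[address]
        self.changed.clear()
        self.state = "duplex"


def suspend_all(pairs):
    """Quiesce every pair first, then suspend every pair."""
    for pair in pairs:
        pair.quiesce()
    for pair in pairs:
        pair.suspend()
```

Because every pair in this sketch is quiesced before any pair is suspended, no write can reach one secondary after another secondary has already stopped receiving updates, which is the sequence-consistency property the reference is described as maintaining.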
U.S. Pat. No. 5,339,397 by Eikill et al. teaches an information processing network that includes multiple processing devices, a main storage memory, one or more disk drives or other auxiliary storage devices, and an interface for coupling the processing devices to the main storage memory and the auxiliary devices. A primary directory in main storage contains mapping information for translating virtual addresses to real addresses in the main storage. Look-aside buffers in the processing devices duplicate some of the mapping information. A primary directory hardware lock, subject to exclusive control by any one of the processing devices able to update the primary directory, inhibits access to the primary directory based on hardware address translations initiated when one of the processors holds the primary directory lock. Address translations in progress when the lock is acquired proceed to completion before the primary directory is updated under the lock. Accordingly, such updates proceed atomically relative to hardware primary directory searches. Unnecessary quiesces and purges of the look-aside buffers are eliminated, improving network performance.
As such, the Eikill et al. patent teaches apparatus and processes for avoiding quiesces by keeping a limited set of memory online for transactions, but this is limited to specific contexts. The information processing network of Eikill et al. uses a main storage memory and auxiliary storage. The main storage memory is quiesced for backup purposes, while transactions carry on using the auxiliary storage. While quite powerful, this approach therefore requires additional memory. Also, Eikill et al. does not teach and it cannot be seen how this approach could be extended to a cluster of computer systems.
BHATTACHARYA et al. in “Coordinating Backup/Recovery and Data Consistency Between Database and File Systems”, ACM SIGMOD '2002, discuss how managing a combined store consisting of database data and file data in a robust and consistent manner is a challenge for database systems and content management systems. In such a hybrid system, images, videos, engineering drawings, etc. are stored as files on a file server while meta-data referencing/indexing such files is created and stored in a relational database to take advantage of efficient search capabilities. This paper describes solutions for two potentially problematic aspects of such a data management system: backup/recovery and data consistency. Algorithms are presented for performing backup and recovery of the DBMS data in a coordinated fashion with the files on the file servers. This paper also proposes an efficient solution to the problem of maintaining consistency between the content of a file and the associated meta-data stored in the DBMS from a reader's point of view without holding long duration locks on meta-data tables. In the model, an object is directly accessed and edited in-place through normal file system APIs using a reference obtained via an SQL query on the database. To relate file modifications to meta-data updates, the user issues an update through the DBMS, and commits both file and meta-data updates together.
As such, this paper teaches an algorithmic approach for avoiding quiescing for backup purposes. Accordingly, much like the Iskiyan et al. and Eikill et al. patents, this paper teaches techniques that are useful in particular contexts but that cannot be seen to be extendable to conventional cluster computer systems.
Thus, the current systems and methods used to back up clustered computer systems remain highly inefficient. By and large, these existing approaches include quiescing one client at a time or one container (fileset) at a time, but do not keep the application state consistent throughout a data cluster of a computer system. The pre-existing backup systems interrupt client activity and any changes to the state of an application are generally lost during the backup procedure.
Otherwise, the emerging trend is to undertake substantial change to the computer systems themselves, adding storage units and control capability to manage all of the available storage so that some can be quiesced for backup while other storage is employed. While these approaches have considerable merit in particular applications, their added cost and complexity are generally prohibitive.
It is, therefore, an object of the present invention to provide an efficient approach to quiesce the entire file system of a cluster of computer systems. Preferably, such an approach should also quiesce with a single command, in an atomic manner. Other objects and advantages will become apparent from the following disclosure.