Banks, insurance companies, brokerage firms, financial service providers, and a variety of other businesses rely on client-server computer networks to store, manipulate, and display information that is constantly subject to change. A significant amount of the information stored as digital data in computer networks is mission-critical. For instance, the success or failure of an important transaction may turn on the availability of information which is both accurate and current. In certain cases the credibility of the service provider, or its very existence, depends on the reliability of the information displayed by the network.
Accordingly, many financial firms world-wide recognize the commercial value of their data and are seeking reliable, cost-effective ways to protect the information stored on their client-server computer networks. In the United States, federal banking regulations also require that banks take steps to protect critical data.
Mission-critical digital data may be threatened by natural disasters, by acts of terrorism, or by more mundane events such as computer hardware failures. Although these threats differ in many respects, they are all limited in their geographic extent. Thus, many approaches to protecting data involve creating a copy of the data and placing that copy at a safe geographic distance from the original source of the data. As explained below, geographic separation is an important part of data protection, but does not alone suffice for many network users.
The distance which is deemed safe depends on the expected threats to the data. Storing a copy of the data in the same room as the original data typically provides some protection against hardware failures. Storing the copy in another room of the same building, or in a building across the street, may protect it against a fire that destroys the storage medium holding the original data. Acts of terrorism, earthquakes, and floods require greater separation. In some cases, separations of 30 miles or more are required.
In the mainframe computer environment a process known as "remote journaling" or "electronic vaulting" is used to protect data. A mainframe at the original data site is connected by a communications link to a remote mainframe which is located at a safe distance from the original site. Data written to the original mainframe's disks is also sent at essentially the same time to the communications link, and hence to the remote mainframe, where it is stored until needed. Mainframe electronic vaulting thus suggests that, in addition to geographically separating a copy of critical data, data protection for client-server networks should also include some way of updating the geographically separate copy of the data.
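The vaulting behavior described above, in which each write to the original disks is also sent over the link to the remote site, can be illustrated with a minimal sketch. The class name `VaultedDisk`, its methods, and the in-memory list standing in for the communications link are all hypothetical; nothing here is taken from actual mainframe vaulting software.

```python
class VaultedDisk:
    """Hypothetical model of a disk whose writes are also forwarded to a remote site."""

    def __init__(self):
        self.local = {}   # block number -> data held at the original site
        self.remote = {}  # block number -> data held at the remote site
        self.link = []    # stand-in for the communications link (a simple queue)

    def write(self, block, data):
        # Store the data locally and place the same update on the link.
        self.local[block] = data
        self.link.append((block, data))

    def deliver(self):
        # The remote machine drains the link and stores each update in order.
        while self.link:
            block, data = self.link.pop(0)
            self.remote[block] = data


disk = VaultedDisk()
disk.write(0, b"opening balance")
disk.write(1, b"trade 1001")
disk.deliver()
```

Once `deliver()` has drained the link, the remote copy matches the local one. Until it does, the updates still queued on the link are the only data at risk.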
Although electronic vaulting provides an abstract model for the protection of client-server computer network data, the extreme differences between mainframe environments and client-server network environments prevent any substantial use of electronic vaulting hardware or software in such networks. At the hardware level, mainframe connectors, bus protocols, and signals are all typically incompatible with those of client-server computer networks. Hardware which connects a mainframe to a communications link will typically not even plug into a networked workstation or personal computer, much less function properly to permit communication.
At the software level, electronic vaulting code may be embedded within the mainframe's operating system, making it difficult or impossible to port the vaulting code to a client-server network environment. Even when the electronic vaulting software is not embedded within the mainframe's operating system, the interface between the proprietary mainframe operating system and the electronic vaulting software generally involves disk accesses, context switching, and other critical low-level operations. Such low-level software is generally very difficult to port to a network, which uses a very different operating system.
In addition, mainframes, unlike networks, do not typically face the prospect of coordinating the activities of numerous users, each of whom is controlling a separate machine having its own local operating system and central processing unit. Thus, mainframe software typically assumes "sole ownership" of files and other system resources. Such assumptions, which may permeate the electronic vaulting code, do not hold in a network.
A different approach to copying data, which is used both with mainframes and with networks, is off-site tape storage. Critical data is copied onto magnetic tapes at the end of each business day. These backup tapes are then taken by truck or plane to a storage site some distance from the original data. Thus, if a disaster of sufficiently limited geographic scope destroys data at the original site, the tapes kept at the storage site may be used to recover important information.
Although off-site tape storage is relatively simple and inexpensive, it has severe limitations. Most importantly, the data on the tapes is only as current as the most recent backup. Thus, assume a business's backup finished at 1:00 AM, the business opened at 8:00 AM, and a disaster occurred at 3:00 PM. Then the business activity in the seven hours from 8:00 AM to 3:00 PM is lost, because it was not stored on the tape. It may be difficult or impossible to accurately reconstruct every transaction that occurred during the lost period. Persuading everyone involved that the reconstruction is accurate may also present problems. In short, merely creating a geographically separate copy of data does not provide adequate protection. The remote copy must also be substantially current.
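The arithmetic in the example above can be checked with a short sketch. The calendar date is an arbitrary assumption chosen only so that the times can be subtracted; the three clock times are those given in the example.

```python
from datetime import datetime

# Arbitrary illustrative date; only the clock times come from the example.
backup_finished = datetime(2000, 1, 3, 1, 0)   # 1:00 AM
business_opened = datetime(2000, 1, 3, 8, 0)   # 8:00 AM
disaster = datetime(2000, 1, 3, 15, 0)         # 3:00 PM

# Business activity can only be lost after both the backup and the opening.
lost_activity = disaster - max(backup_finished, business_opened)

# Any change made after the backup finished is absent from the tape.
unprotected = disaster - backup_finished

lost_hours = lost_activity.total_seconds() / 3600
unprotected_hours = unprotected.total_seconds() / 3600
```

Here `lost_hours` is the seven hours of business activity named in the example, while `unprotected_hours` is the full interval since the backup finished.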
A further disadvantage of off-site tape storage is the time required to create the tape backup. To ensure the integrity of data being stored on the tape, only the backup software typically has access to the network during the backup procedure. If a business closes at the end of each day and leaves its computer network essentially unused at night, the opportunity costs of restricting access during the backup procedure are negligible. However, an increasing number of computer networks are used by businesses that operate world-wide, and hence these networks are needed 24 hours a day, 7 days a week. Shutting down such networks for several hours each day to make a tape backup may have a significant adverse effect on the business.
In addition, hours or days may be needed to restore data from the backup tapes onto hard drives or other immediately useable media. The computer network's performance may be reduced while data is being restored. Indeed, in some instances it is necessary to deny all other users access to the network while data is being restored, in order to ensure the integrity of the data after the restoration.
Another approach to copying data stored on computer networks is known as "data shadowing." A data shadowing program cycles through all the files in a computer network, or through a selected set of critical files, and checks the timestamp of each file. If data has been written to the file since the last time the shadowing program checked the file's status, then a copy of the file is sent over a communications link to another program which is running on a remote computer. The remote program receives the data and stores it at the remote site on tapes or other media. As with off-site tape storage, hours or days may be required to restore shadowed data to a useable form at the original site.
Shadowed data is typically more current than data restored from an off-site tape backup, because at least some information is stored during business hours. However, the shadowed data may nonetheless be outdated and incorrect. For instance, it is not unusual to make a data shadowing program responsible for shadowing changes in any of several thousand files. Nor is it unusual for file activity to occur in bursts, with heavy activity in one or two files for a short time, followed by a burst of activity in another few files, and so on. Thus, a data shadowing program may spend much of its time checking the status of numerous inactive files while a few other files undergo rapid changes. Mission-critical data may be lost because the shadowing program is driven by the list of files and their timestamps rather than directly by file activity.
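The timestamp-driven cycle described above can be sketched as follows. This is an in-memory model only: the function name `shadow_pass`, the dictionary of files, the integer timestamps, and the file names are assumptions made for illustration, and no real data shadowing product is being reproduced.

```python
def shadow_pass(files, last_seen, send):
    """Visit every file once and send those whose timestamp has changed."""
    sent = []
    for name in sorted(files):
        timestamp, contents = files[name]
        if last_seen.get(name) != timestamp:
            send(name, contents)          # copy goes over the link
            last_seen[name] = timestamp
            sent.append(name)
    return sent


remote = {}      # stand-in for storage at the remote site
last_seen = {}   # timestamp recorded for each file on the previous pass

# Several thousand files, all initially empty with timestamp 0.
files = {f"file{i:04d}": (0, b"") for i in range(3000)}
shadow_pass(files, last_seen, remote.__setitem__)   # initial full copy

# A burst of activity in a single file between two passes.
files["file0007"] = (1, b"burst 1")
files["file0007"] = (2, b"burst 1, burst 2")

stale_copy = remote["file0007"]   # what the remote site holds before the next pass
sent = shadow_pass(files, last_seen, remote.__setitem__)
```

The second pass examines all 3,000 files in order to find the one that changed. Until that pass reaches the file, the remote site holds only `stale_copy`, so a disaster in that interval would lose both bursts.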
Many conventional attempts to protect data also share another problem, namely, that open files are not copied. Open files are skipped because the contents of files which have been "opened" for access by users may change during tape backup, data shadowing, or other procedures that create a copy of the file contents. These changes may lead to internal inconsistencies and lost data because a copy program (e.g., a tape backup or data shadowing program) sees one part of the file before the change and another part of the file after the change.
For instance, suppose that an open file has a length of 10,000 bytes and this length is recorded in the first block of the file. Critical data will be lost if events occur as follows: (1) the copy program notes that the file is 10,000 bytes long; (2) an additional 5,000 bytes of critical new data is added to the end of the open file by the user; and (3) at some later time, the original copy of the file, including the 5,000 new bytes, is destroyed.
The copy program will have copied only the first 10,000 bytes of the file. The additional 5,000 bytes will be lost even if the program had plenty of time to copy that data as well, because the copy program does not "know" that the additional data is there until it works its way back to the file in question. Depending on the copy program, the number of files involved, and other factors, minutes or even hours may pass before the program returns to the file in question and notes that additional data needs to be copied.
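The three-step sequence above can be reproduced in a few lines. The byte values and variable names are illustrative assumptions, and a `bytearray` stands in for the open file.

```python
# The open file, modeled as an in-memory buffer of 10,000 bytes.
original = bytearray(b"A" * 10_000)

# (1) The copy program notes the length of the file.
noted_length = len(original)

# (2) The user appends 5,000 bytes of new data to the open file.
original += b"B" * 5_000

# The copy program copies only the length it noted in step (1).
copy = bytes(original[:noted_length])

# (3) If the original is now destroyed, this many bytes cannot be recovered.
lost_bytes = len(original) - len(copy)
```

Here `lost_bytes` is 5,000: the copy is internally complete as of step (1) but contains none of the data appended in step (2).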
Accordingly, client-server computer network operating systems typically restrict access to open files, and conventional data copying methods generally do not create copies of open files even when permitted to do so by the network operating system. However, the failure to copy open files also has severe drawbacks. Files may be left open longer than necessary, so that their mission-critical contents are actually stable enough to copy but are nevertheless not copied simply because the file is open. Thus, data may be lost even though it could have been copied to a safe location, merely because a file was left open longer than necessary.
In addition, failure to copy even a single open file in a relational database may lead to the loss of data in many files because such databases depend on files that are interrelated and sequential. For instance, suppose a database must search in sequential order through files A, B, C, and D to obtain the required information. Suppose file C was open during the backup and therefore was not copied. The data restored after a disaster may therefore include copies of files A, B, and D which are more current than the most recent available copy of file C. Such inconsistencies may corrupt the database and render useless the information in all four files.
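The inconsistency described above can be made concrete with a small sketch. The functions `run_backup` and `restore`, the generation numbers, and the file contents are hypothetical; the sketch only assumes that a backup skips open files and that a restore uses the most recent available copy of each file.

```python
def run_backup(live_files, open_files):
    """Copy every file that is not open; open files are skipped."""
    return {name: data for name, data in live_files.items()
            if name not in open_files}


def restore(backups):
    """For each file, keep the copy from the most recent backup containing it."""
    restored = {}
    for generation in sorted(backups):
        for name, data in backups[generation].items():
            restored[name] = (generation, data)
    return restored


backups = {}

# First backup: all four files are closed and are copied.
live = {"A": "a1", "B": "b1", "C": "c1", "D": "d1"}
backups[1] = run_backup(live, open_files=set())

# Second backup: every file has changed, but file C is open and is skipped.
live = {"A": "a2", "B": "b2", "C": "c2", "D": "d2"}
backups[2] = run_backup(live, open_files={"C"})

restored = restore(backups)
generations = {name: gen for name, (gen, _) in restored.items()}
consistent = len(set(generations.values())) == 1
```

After the restore, files A, B, and D come from the second backup while file C comes from the first, so `consistent` is false and the four files no longer describe the same state of the database.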
Perhaps the most common solution to the open file problem is to perform backup procedures at the end of the work day. All users except the backup software are logged off the system, and all files are closed. Thus, a current, complete, and consistent copy of the critical data is obtained. However, this approach to dealing with open files has many of the drawbacks of off-site tape storage. Many networks, such as those used by hotel and airline reservation systems, credit authorization services, and global trading position databases, are in use non-stop. Moreover, critical data added after the backup software finishes is not protected until the next backup or shadowing file copy, which may be minutes or hours later.
Thus, it would be an advancement in the art to provide a system and method for effectively protecting mission-critical data in a client-server computer network.
It would also be an advancement to provide such a system and method which maintain a substantially current copy of critical network data.
It would be a further advancement to provide such a system and method which maintain a substantially current copy of data as that data is committed for storage in open files on disk.
It would also be an advancement to provide such a system and method which do not require limiting or denying access by other users while a copy of the critical data is created.
In addition, it would be an advancement in the art to provide such a system and method which permit storage of the data copy at distances of 30 miles or more from the original data.
It would also be an advancement to provide such a system and method which make the copied data useable immediately after a disaster.
Such a system and method are disclosed and claimed herein.