1. The Field of the Invention
The present invention relates to the protection of computer data, and more particularly to a system and method for mirroring and archiving data from one mass storage system to another mass storage system.
2. The Prior State of the Art
There is little question that computers have radically changed the way that businesses collect, manage, and utilize information. Computers have become an integral part of most business operations, and in some instances have become such an integral part of a business that when the computers cease to function, business operations cannot be conducted. Banks, insurance companies, brokerage firms, financial service providers, and a variety of other businesses rely on computer networks to store, manipulate, and display information that is constantly subject to change. The success or failure of an important transaction may turn on the availability of information which is both accurate and current. In certain cases, the credibility of the service provider, or its very existence, depends on the reliability of the information maintained on a computer network. Accordingly, businesses worldwide recognize the commercial value of their data and are seeking reliable, cost-effective ways to protect the information stored on their computer networks. In the United States, federal banking regulations also require that banks take steps to protect critical data.
Critical data may be threatened by natural disasters, by acts of terrorism, or by more mundane events such as computer hardware and/or software failures. Although these threats differ in many respects, they all tend to be limited in their geographic extent. Thus, many approaches to protecting data involve creating a copy of the data and placing that copy at a safe geographic distance from the original source of the data. Geographic separation may be an important part of data protection, but does not alone suffice to fully protect all data.
Often the process of creating a copy of the data is referred to as backing up the data or creating a backup copy of the data. When creating a backup copy of data stored on a computer or a computer network, several important factors must be considered. First, a backup copy of data must be logically consistent. A logically consistent backup copy contains no logical inconsistencies, such as data files that are corrupt or terminated improperly. Second, a backup copy of data must be current enough to avoid data staleness. The time between backups, which largely determines the staleness of the backup copy, must be sufficiently short so the data on the backup is still useful should it be needed. For certain applications, such as networks that store financial transactions, backups a week old may be useless and much more frequent backups are needed. How frequently backup copies can be made is a function of many factors, such as whether the backup can be made during normal business operations, the time it takes to make a backup copy, and so forth.
In order to create a backup copy of the data, several approaches have been taken. Each of the approaches has certain advantages and disadvantages. Perhaps the simplest approach to creating a backup copy of critical data is to copy the critical data from a mass storage system, such as the magnetic storage system utilized by a computer network, to a second archival mass storage device. The second archival mass storage device is often a storage device designed to store large amounts of data at the expense of immediate access to the data. One type of archival storage commonly used is magnetic tape. In these backup systems, data is copied from the mass storage system to one or more magnetic tapes. The magnetic tapes are then stored either locally or at a remote site in case problems arise with the main mass storage system. If problems arise with the main mass storage system, then data may be copied from the magnetic tape back to either the same or a different mass storage system.
Although using magnetic tape or other archival storage as a means to guard against data loss has the advantage of being relatively simple and inexpensive, it also has severe limitations. One such limitation is related to how such backups are created. When data is copied from a mass storage system to a backup tape, the copy process generally copies the data one file at a time. In other words, a file is copied from the mass storage system onto the tape. After the copy is complete, another file is copied from the mass storage system to the tape. The process is repeated until all files have been copied.
In order to ensure the integrity of data being stored on the tape, care must be taken to keep the file from changing while the backup is being made. A simple example will illustrate this point. Suppose a file stores the account balances of all banking customers. If the account balances were allowed to change during the time the file is being backed up, it may be possible to leave a file in a logically inconsistent state. For example, if one account balance was backed up, and immediately after the account was backed up the account balance was debited $100.00, and if that same $100.00 was credited to a second account, then a situation may arise where the same $100.00 is credited to two different accounts.
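The account-balance example can be replayed in a few lines. The following is a minimal Python sketch with hypothetical account names and balances (not from the source): a file-at-a-time copy records account A before a $100.00 transfer and account B after it, so the backup holds more money than ever existed at any single instant.

```python
def fuzzy_backup_demo():
    # Hypothetical two-account ledger; balances are illustrative only.
    accounts = {"A": 500.00, "B": 200.00}
    snapshot = {}

    # The backup copies account A first (captures A = 500.00).
    snapshot["A"] = accounts["A"]

    # A $100.00 transfer from A to B happens while the backup is running.
    accounts["A"] -= 100.00
    accounts["B"] += 100.00

    # The backup copies account B afterward (captures B = 300.00).
    snapshot["B"] = accounts["B"]

    return accounts, snapshot


live, snapshot = fuzzy_backup_demo()
print(sum(live.values()))      # 700.0: the live data conserves the total
print(sum(snapshot.values()))  # 800.0: the backup counts the $100.00 twice
```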
In order to prevent such a situation from occurring, the data in a file must not change while the backup copy is made. A simple way to prevent data from changing is to prevent all access to the file during the backup procedure. In such a scheme, access to the files is cut off while the file is backed up. This approach is used by many networks where access to the mass storage system can be terminated after the close of business. For example, if a business closes at the end of each day and leaves its computer network essentially unused at night, user access to the network can be terminated at night and that time used to perform a backup operation. This, however, limits creation of a backup copy to once per day at off hours and therefore may be insufficient for some operations.
An increasing number of computer networks are used by businesses that operate worldwide, and hence these networks may be needed twenty-four hours a day, seven days a week. Shutting down such a network for several hours each day to make a tape backup may have a significant adverse effect on the business. For such businesses, creating a backup tape in the traditional manner is simply impractical and unworkable.
In an attempt to accommodate such operations or to increase the frequency of backups, an approach to copying data stored on computer networks known as “data shadowing” is sometimes used. A data shadowing program cycles through all the files in a computer network, or through a selected set of critical files, and checks the time stamp of each file. If data has been written to the file since the last time the shadowing program checked the file's status, then a copy of the file is sent to a backup system. The backup system receives the data and stores it on tapes or other media. The shadow data is typically more current than data restored from a tape backup, because at least some information is stored during business hours. However, shadow data may nonetheless be outdated and incorrect. For example, it is not unusual to make a data shadowing program responsible for a large number of files. Changes to those files often occur in bursts, with heavy activity in one or two files for a short time, followed by a burst of activity in several other files. Thus, a data shadowing program may spend much of its time checking the status of numerous inactive files while several other files undergo rapid changes. If the system crashes, or becomes otherwise unavailable before the data shadowing program gets around to checking the critical files, data may be lost.
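A minimal Python sketch of one shadowing pass, assuming file time stamps are supplied as a plain dictionary rather than read from a real file system (the class name `Shadower` and its methods are hypothetical). It shows why intermediate writes are invisible to such a program: only the time stamp seen at each visit matters, so any changes made between visits are captured only as the file's state at the next visit, or lost if the system fails first.

```python
class Shadower:
    """Hypothetical data-shadowing program: visits files in a fixed order and
    ships any file whose time stamp advanced since its last shipment."""

    def __init__(self):
        self.last_seen = {}  # file name -> time stamp at last shipment

    def scan(self, stamps, is_open=lambda name: False):
        """One pass over `stamps` (file name -> modification time stamp).
        Returns the names that would be copied to the backup system."""
        shipped = []
        for name, stamp in stamps.items():
            if is_open(name):
                continue  # such programs usually skip open files
            if stamp > self.last_seen.get(name, float("-inf")):
                shipped.append(name)
                self.last_seen[name] = stamp
        return shipped


s = Shadower()
print(s.scan({"ledger.db": 10.0, "notes.txt": 5.0}))  # ['ledger.db', 'notes.txt']
print(s.scan({"ledger.db": 12.0, "notes.txt": 5.0}))  # ['ledger.db']
```

Because a skipped open file never updates `last_seen`, it is retried on a later pass, but a file that is always open, like the large database discussed here, is never shipped at all.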
Another problem with data shadowing programs is that they typically do not work for data kept in very large files. Consider a system with a single very large database and several much smaller data files. Assuming that a business's primary information is stored in the large database, it is reasonable to expect that a large percentage of the business day will be spent reading and writing data to the very large database. Even assuming that a backup copy could be made of the very large database, the time needed to make a backup copy of such a large database may make the use of data shadowing impractical. The data shadowing program may attempt to make copy after copy of the large database. Making such numerous copies not only takes a tremendous amount of time, but also requires a tremendous amount of backup storage space.
Another problem of data shadowing type systems is that open files are generally not copied. As previously described, a file must be frozen while a backup copy is made in order to prevent changes to the file during the backup process. Thus, data shadowing systems usually do not attempt to make copies of open files. If changes are constantly being made to a large database, the large database will constantly be open and data shadowing systems may not copy the database simply because the file is open. For at least these reasons, data shadowing systems are typically not recommended for very large data files.
Another approach that has been attempted in order to overcome some of these limitations is a process whereby a time sequence of data is captured and saved. For example, many systems incorporate disk mirroring or duplexing. In disk mirroring or duplexing, changes made to a primary mass storage system are sent to other backup or secondary mass storage systems. In other words, when a data block is written to the primary mass storage system, the same data block is written to a separate secondary mass storage system. By copying each write operation to a second mass storage system, two mass storage systems may be kept synchronized so that they are virtually identical at the same instant in time. Such a scheme protects against certain types of failures, but remains vulnerable to other types of failures.
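Disk mirroring as described can be sketched in Python with two in-memory block maps standing in for the primary and secondary mass storage systems (the class `MirroredVolume` is hypothetical and ignores controllers, latency, and write ordering).

```python
class MirroredVolume:
    """Hypothetical block device that duplicates every write to a primary
    and a secondary store, keeping the two copies identical."""

    def __init__(self):
        self.primary = {}    # block number -> bytes
        self.secondary = {}  # block number -> bytes

    def write(self, block, data):
        # The same data block is written to both mass storage systems.
        self.primary[block] = data
        self.secondary[block] = data

    def read(self, block, primary_failed=False):
        # If the primary fails, the mirror still holds the data.
        source = self.secondary if primary_failed else self.primary
        return source[block]


v = MirroredVolume()
v.write(7, b"balance=400")
print(v.read(7, primary_failed=True))  # b'balance=400'
```

Note that `write()` copies whatever it is given: a logically inconsistent write issued by faulty software lands on both copies just as faithfully as a correct one.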
The primary type of failure that disk mirroring overcomes is a hardware failure. For example, if data is written to two disks simultaneously, then if one disk fails, the data is still available on the other disk. If the two disks are connected to two separate disk controller cards, then if a single disk controller card or a single disk fails, the data is still accessible through the other disk controller card and disk assembly. Such a concept can be extended to include entire systems where a secondary network server mirrors a primary server so that if a failure occurs in the primary network server, the secondary network server can take over and continue operation. The Novell© SFT line of products uses variants of this technology.
While such systems provide high reliability against hardware failures and also provide almost instantaneous access to backup copies of critical data, they do not guard against software failures. As software becomes more and more complex, the likelihood of software failures increases. In today's complex computing environments, where multiple computer systems running multiple operating systems are connected together in a network environment, the likelihood of software errors causing occasional system crashes increases. When such a software error occurs, both the primary mass storage system and the mirrored mass storage system may be left in a logically inconsistent state. For example, suppose that a software error occurred during a database update. In such a situation, both the primary mass storage system and the mirrored mass storage system would have received the same write command. If the software error occurred while issuing the write command, both mass storage systems may be left in an identical, logically inconsistent state. If the mirrored mass storage system was the only form of backup in the network, critical data could be permanently lost.
If a backup is to be made at a remote location, the problems with the above technology are exacerbated. For example, if disk mirroring is to be made to a remote site, the amount of data transferred to the remote site can be considerable. Thus, a high speed communication link must exist between the primary site and the secondary or backup site. High speed communication links are typically expensive. Furthermore, if a time sequence of data is to be sent to a secondary system at a remote location over a communication link, then the reliability of the communication link becomes a significant issue. If for any reason the communication link should be temporarily severed, synchronization between the primary mass storage system and the secondary or backup mass storage system would be lost. Steps must then be taken to reconcile the two mass storage devices once the communication link is reestablished. Thus, mirroring a primary mass storage system at a remote site is typically difficult and very expensive.
The problems of mirroring a single system to a remote site become even more complicated when a single remote site is to service several primary systems. Since a remote disk mirror typically requires a dedicated communication link, the secondary system must be sufficiently fast to handle communications from a plurality of dedicated communication lines. The amount of data that must be received and stored by the secondary system may quickly overwhelm the capabilities of the secondary system.
It would, therefore, represent an advancement in the art to have a mirroring and archiving system that could ensure logical consistency of the data protected. It would also represent an advancement in the art to have a mirroring and archiving system that could function either locally or remotely using a low bandwidth communication link.
The foregoing problems in the prior state of the art have been successfully overcome by the present invention, which is directed to a system and method for mirroring and archiving a primary mass storage system to a secondary mass storage system. The current system and method provides several significant advantages over the prior art. First, the mirroring and archiving system and method of the present invention reduces the amount of data needed to mirror and archive by consolidating redundant changes and then transferring only those consolidated changes. Second, the system and method of the present invention emphasize security of the mirroring and archiving by ensuring that the primary storage system is in a logically consistent state when an update is made.
The present invention begins with the assumption that a primary mass storage system connected to a primary system and a secondary mass storage system connected to a secondary system contain identical data. This may be accomplished, for example, by making a complete copy of the primary mass storage system to the secondary mass storage system using either traditional backup techniques or traditional disk mirroring techniques. Once the primary mass storage system and the secondary mass storage system contain the same data, the present invention tracks the changes made to the primary mass storage system. This tracking is done by identifying new data written to storage locations in the primary mass storage system after the time that the secondary mass storage system was in sync with the primary mass storage system. By identifying those changes that have been made to the primary mass storage system, the invention identifies those changes that need to be stored at the secondary mass storage system in order to bring the secondary mass storage system current with the primary mass storage system.
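The change-tracking step can be sketched in Python as a write log kept on the primary system, under the assumption that the log starts empty at the moment the two mass storage systems were last in sync (the class `ChangeTracker` and its methods are hypothetical, not the source's implementation).

```python
class ChangeTracker:
    """Hypothetical tracker on the primary system. Assumes the primary and
    secondary held identical data at the last sync point, and records every
    write made to the primary after that point."""

    def __init__(self):
        self.log = []  # (block number, data) in the order written

    def on_write(self, block, data):
        self.log.append((block, data))

    def changed_blocks(self):
        # Storage locations that must be refreshed at the secondary.
        return {block for block, _ in self.log}


t = ChangeTracker()
t.on_write(3, b"a")
t.on_write(3, b"b")
t.on_write(9, b"c")
print(len(t.log), sorted(t.changed_blocks()))  # 3 [3, 9]
```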
Periodically, the changes that need to be made to the secondary mass storage system are assembled into an update. However, the update may contain redundant information. That is, multiple changes to a single data block present a historical view of a given storage location, but only the last change is necessary to bring the secondary mass storage system current with the primary mass storage system. Thus, the present invention minimizes the amount of data needed to resynchronize the mass storage devices by consolidating the redundant changes into a single, most recent change. Then, the update is sent to the secondary system to bring the secondary mass storage system current with the primary mass storage system. If desired, communication between the primary system and secondary system may be encrypted.
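Consolidating redundant changes amounts to keeping only the last write to each storage location. A minimal Python sketch, assuming updates are plain dictionaries of block number to data (the function names are hypothetical; encryption of the transfer is omitted):

```python
def consolidate(log):
    """Collapse a time-ordered list of (block, data) writes so that each
    block keeps only its most recent data."""
    latest = {}
    for block, data in log:
        latest[block] = data  # a later write overwrites an earlier one
    return latest


def apply_update(secondary, update):
    """Bring the secondary current by writing each consolidated block."""
    secondary.update(update)


update = consolidate([(3, b"a"), (9, b"c"), (3, b"b")])
print(update)  # {3: b'b', 9: b'c'}

secondary = {3: b"old", 4: b"keep"}
apply_update(secondary, update)
print(secondary)  # {3: b'b', 4: b'keep', 9: b'c'}
```

Three writes shrink to two blocks here; the historical value `b"a"` is dropped because the secondary needs only the final state.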
The present invention includes a mechanism to identify when the primary mass storage system is in a logically consistent state in order to determine when an update should be created. By identifying a logically consistent state and then creating an update of the changes made up to that point in time, the updates transferred to the secondary system are guaranteed to capture a logically consistent state. By creating updates of succeeding logically consistent states, the secondary system can archive one logically consistent state after another. In this way, if the archived data should ever be needed, it will be in a logically consistent state. The data stored at the secondary system moves from one logically consistent state to another logically consistent state thus eliminating one of the problems of the prior art.
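One way to picture cutting updates only at logically consistent points is the Python sketch below. How the primary recognizes such a point (for example, no transaction in progress) is assumed here and signaled by an explicit call; the class `UpdateBuilder` is hypothetical.

```python
class UpdateBuilder:
    """Hypothetical sketch: writes accumulate, and an update is sealed only
    when the primary signals a logically consistent point."""

    def __init__(self):
        self.pending = {}   # block -> latest data since the last update
        self.updates = []   # sealed, logically consistent updates

    def on_write(self, block, data):
        self.pending[block] = data  # redundant writes are consolidated here

    def on_consistent_point(self):
        if self.pending:
            self.updates.append(dict(self.pending))
            self.pending.clear()


b = UpdateBuilder()
b.on_write(1, b"debit")   # transfer in progress: no update is sealed yet
b.on_write(2, b"credit")
b.on_consistent_point()   # transfer complete: seal debit and credit together
print(b.updates)  # [{1: b'debit', 2: b'credit'}]
```

Because the debit and credit travel in the same update, the secondary never holds one without the other, which avoids the double-counted balance of the earlier tape example.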
Because the present invention takes a state-oriented approach to the mirroring and archiving of a mass storage system, the amount of data that needs to be transferred can be optimized. Specifically, during any given period it is not unusual for a relatively small number of data blocks to be repeatedly and frequently modified, perhaps because the data blocks represent an index structure for a database. Each change in the underlying database would require corresponding changes to the index structure. Some observations of this activity indicate that of 15,000 changes made during one five-minute period, only 900 involved unique data blocks. Prior art systems would transfer each of the 15,000 changes. However, the state-oriented approach of the present invention allows the 15,000 changes to be consolidated, because only 900 are necessary to represent the final states of the unique data blocks that were modified. Therefore, the present invention is particularly well suited to mirroring and archiving data to a secondary system located at a remote site. The present invention can use low bandwidth communication links to transfer mirroring and archiving data to a remote site. For example, conventional dial-up telephone lines with a 56.6k baud modem will be entirely adequate in many situations.
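Using only the figures quoted above, and assuming each change carries one data block of the same size, the savings from consolidation work out as follows:

```latex
\[
1 - \frac{900}{15{,}000} = 1 - 0.06 = 0.94,
\qquad
\frac{15{,}000}{900} \approx 16.7
\]
```

That is, roughly 94% fewer block transfers in that five-minute period, or about a 16.7-fold reduction.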
The present invention also includes a cache holding area in the primary mass storage system. The cache holding area retains update files so that requests for mirrored or archived data often may be met without necessarily having to access the secondary system. Where the secondary system communicates with the primary system over a relatively slow link, the cache holding area can dramatically improve the performance of accessing mirrored or archived data.
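The cache holding area can be sketched in Python as a local map of update number to update file, consulted before the secondary system. The fetch callback stands in for the slow link and, like the class `UpdateCache` itself, is a hypothetical illustration; eviction of old entries is omitted.

```python
class UpdateCache:
    """Hypothetical cache holding area on the primary system. Retains update
    files locally and contacts the secondary system only on a miss."""

    def __init__(self, fetch_from_secondary):
        self.held = {}  # update number -> {block: data}
        self.fetch_from_secondary = fetch_from_secondary
        self.remote_fetches = 0  # count of requests sent over the slow link

    def retain(self, number, update):
        self.held[number] = update

    def get_update(self, number):
        if number in self.held:
            return self.held[number]  # served locally, no link traffic
        self.remote_fetches += 1
        update = self.fetch_from_secondary(number)
        self.held[number] = update    # keep it for later requests
        return update


remote = {1: {5: b"old"}}
cache = UpdateCache(remote.__getitem__)
cache.retain(2, {5: b"new"})
print(cache.get_update(2))   # {5: b'new'}
print(cache.remote_fetches)  # 0
print(cache.get_update(1))   # {5: b'old'}
print(cache.remote_fetches)  # 1
```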
The secondary system of the present invention receives each update from the primary system. The update serves to bring the secondary mass storage system current with the primary mass storage system. In addition to this mirroring function, the updates also provide archiving. By retaining updates rather than integrating them with the synchronized data, the secondary system can deliver any of the logically consistent states that the updates represent. For example, if a problem occurs prior to a fourth update, the secondary system can combine the synchronized data with the first three updates. This combination represents the logically consistent state of the primary mass storage system as it existed at the time of the third update. Thus, the secondary system can provide any of a potentially large number of logically consistent states of the primary mass storage system. As the archival value of a given update diminishes over time, it can eventually be integrated with the synchronized data or collapsed with other updates, thereby limiting the number of updates stored at the secondary mass storage system and the required size of the secondary mass storage system.
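Rebuilding an archived state and integrating old updates can be sketched in Python, assuming the synchronized data and each update are dictionaries of block number to data (the function names are hypothetical). Collapsing adjacent updates with each other works the same way: apply them in order into one dictionary.

```python
def state_at(base, updates, k):
    """Rebuild the logically consistent state as of update k (1-based) by
    applying the first k retained updates to the synchronized data."""
    state = dict(base)
    for update in updates[:k]:
        state.update(update)
    return state


def collapse_oldest(base, updates, n):
    """Integrate the n oldest updates into the synchronized data, returning
    the new base and the updates that remain individually retained."""
    return state_at(base, updates, n), updates[n:]


base = {0: b"a", 1: b"b"}
ups = [{0: b"a1"}, {1: b"b2"}, {0: b"a3"}, {1: b"b4"}]
print(state_at(base, ups, 3))  # {0: b'a3', 1: b'b2'}

new_base, rest = collapse_oldest(base, ups, 2)
print(state_at(new_base, rest, 1) == state_at(base, ups, 3))  # True
```

After collapsing, states older than the new base can no longer be delivered, which is the trade-off between archival depth and secondary storage size.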
Accordingly, it is an object of the present invention to provide a system and method for mass storage mirroring and archiving that minimizes the amount of data that needs to be transferred to a secondary system.
Another central object of the present invention is to provide a system and method for mass storage mirroring and archiving that can capture logically consistent states so that the secondary system is not found in a logically inconsistent state.
A further object of the present invention is to provide a cache of updates so that some requests for mirrored or archived data can be fulfilled without the delay that may be associated with accessing the secondary system.
Yet another object of the present invention is to allow the secondary system to capture successive logically consistent updates in order to provide a series of logically consistent primary mass storage system states.
Additional objects and advantages of the present invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other objects and features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.