1. Field of Invention
Embodiments of this invention relate to digital data processing systems. Specifically, they relate to a method or apparatus for transferring data through mirroring, backup, and synchronization between local and geographically remote data storage facilities.
2. Background
In general, users of data processing computer systems are concerned about the protection of their data. Local and remote backup systems serve as a means for this protection should the data be lost, damaged, or destroyed. Currently, many companies maintain these backup copies of their valuable data onsite, either on removable data storage devices or on secondary local data storage systems to which the data has been mirrored.
One known technique requires that a computer system be taken out of service while tape backups are made. These tapes must then be taken off the premises for safe, geographically remote storage of the organization's critical data. Should data need to be restored to the local system, the tapes must be retrieved from their remote location and restored to the host system. This process is very time consuming and costly, not only because of the number of tapes and personnel involved, but also because of the long delay during which the local system is inaccessible to its users.
The need for geographically remote data backup is ever increasing, in order to prevent non-recoverable catastrophic loss of valuable data on the local system and of any backups that may be stored at the same location.
Current real-time geographically remote backup and mirroring services require a high-speed dedicated network connection, such as a fiber-optic connection, between the sending and receiving systems. This is because current mirroring systems must communicate between the local and remote systems at data transfer rates that keep the two systems within one read or one write request of each other. That is, these systems require the raw data to be transferred in exactly the same sequence and at the same speed as it is read from or written to disk on both systems. A read or write operation is not considered complete as a whole until both the local and the remote systems have each completed their respective operations. If the data transfer rate between the local and remote systems drops below the required speed, the two systems fall out of synchronization with each other, rendering both the local and remote systems unusable.
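The completion rule described above can be illustrated with a minimal sketch. The model below is an assumption made for illustration only and is not taken from any particular mirroring product: the local and remote writes are assumed to proceed in parallel, the remote write is assumed to cost one network round trip plus the remote disk time, and the function names and millisecond figures are hypothetical.

```python
def synchronous_write_time(local_ms, remote_ms, round_trip_ms):
    """Time until a mirrored write counts as complete.

    Assumed model: the write is complete only when BOTH copies are
    written, so the slower of the two paths sets the completion time.
    """
    remote_path_ms = round_trip_ms + remote_ms
    return max(local_ms, remote_path_ms)


def max_writes_per_second(local_ms, remote_ms, round_trip_ms):
    """Upper bound on strictly sequential mirrored writes per second."""
    return 1000.0 / synchronous_write_time(local_ms, remote_ms, round_trip_ms)


# Hypothetical figures: a 5 ms disk write on each side.
nearby = synchronous_write_time(5, 5, 1)     # short dedicated link
distant = synchronous_write_time(5, 5, 80)   # long-distance link
```

Under these assumed figures, each write over the long-distance link takes many times longer than over the short link, which is the effect that confines such systems to fast dedicated connections.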
Mirroring data between one local host and several geographically remote systems would require an equal number of high-speed dedicated fiber-optic connections, because existing technologies attempt to transfer data at the speed of the local bus of the participating data processing systems. In the absence of large memory buffers, the network connection must be as fast as or faster than the local bus. Therefore, many fiber-optic connections are limited to metropolitan area networks between buildings. Statewide, interstate, or intercontinental data mirroring is consequently unfeasible for most potential users with current technology.
Another technique requires additional special-purpose hardware, such as additional storage devices that are dedicated to queuing raw data as it is written to disk. These additional storage devices act as a memory buffer that maintains the sequence of the raw data disk writes. The raw data is then transmitted in the exact sequence in which it is stored in the memory buffer. This is an attempt to adapt a high-speed internal data bus transfer rate to a slower network data transfer rate. While this technique alleviates the need for a dedicated fiber-optic network connection, it still requires a very fast network connection to prevent the system from running out of disk space and encountering a buffer overflow condition, which once again would render the systems unusable. This method also does not allow for any prioritization of raw data transfers between the local and remote systems, as the transfers must occur in the exact same sequence as the writes did on the originating system.
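The overflow condition described above can be sketched with a small simulation. This is an illustrative model only, under the assumptions that writes arrive and drain at constant rates and that the queue has a fixed capacity; the function name, parameters, and block counts are hypothetical.

```python
def first_overflow_second(write_rate, link_rate, capacity, seconds):
    """Return the first second at which the queue exceeds capacity.

    write_rate: blocks written to the local disk per second (queued in order)
    link_rate:  blocks the network can drain from the queue per second
    capacity:   blocks the dedicated queue storage can hold
    Returns None if no overflow occurs within the simulated period.
    """
    queued = 0
    for second in range(1, seconds + 1):
        queued += write_rate               # new raw writes join the queue
        queued -= min(queued, link_rate)   # oldest blocks are sent first
        if queued > capacity:
            return second
    return None


# Hypothetical figures: the link drains less than the disk produces.
overflow_at = first_overflow_second(
    write_rate=100, link_rate=60, capacity=200, seconds=60
)
```

In this model the queue grows by the difference between the two rates every second, so any sustained shortfall in link speed eventually exhausts the queue storage regardless of its size.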
In addition, software-based solutions to data mirroring, such as that described in U.S. Pat. No. 5,799,141, are overly complex and very inflexible in nature. The files that are to be mirrored must be explicitly named in a configuration database. These systems can only synchronize specific files between the local and remote systems. If a user of such a system requires the synchronization of additional files, the mirroring process must be shut down and the path to the new files that are to be mirrored must be added to the configuration database at both the local system and the remote system. There is also an additional burden on the file system of the local machine in that each time a file is written to, the name of the file must be looked up and matched against the configuration database. This has an obvious side effect: as the size of the configuration database grows, it will inherently slow down the file system, once again potentially rendering the system technically unusable.
This method also has the limitation of not automatically mirroring newly created files or directories unless the system user changes the configuration database. The system can only mirror changes to already existing files and cannot mirror complete directory structures or hard drive volumes. Furthermore, it still does not alleviate the need to queue the actual raw disk data itself, thus creating large queue files, which again require fast network connections.
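The per-write lookup and the new-file limitation described above can be illustrated with a minimal sketch. It is not the implementation of the cited patent: the class name, the file paths, and the choice of a linear scan over the configured paths are all assumptions made for illustration.

```python
class MirrorConfig:
    """Hypothetical configuration database of explicitly named files."""

    def __init__(self, paths):
        self.paths = list(paths)   # files the user has listed for mirroring
        self.comparisons = 0       # total name comparisons performed so far

    def should_mirror(self, path):
        """Called on every file write; scans the configured paths in order."""
        for configured in self.paths:
            self.comparisons += 1
            if configured == path:
                return True
        return False               # unlisted files are never mirrored


config = MirrorConfig(["/data/a.db", "/data/b.db"])
listed = config.should_mirror("/data/b.db")       # an explicitly named file
unlisted = config.should_mirror("/data/new.db")   # a newly created file
```

With a scan of this kind, the work done on each write grows with the number of configured paths, and a newly created file is skipped until the user adds it to the configuration.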