This invention relates to networked computer systems and to data replication and mirroring of data in distributed data storage devices that are connected by a network.
Many contemporary computer systems require continuous access to stored information. In these systems, the conventional data center procedure of taking data storage systems offline to update and back up information is not possible. However, system reliability demands the backup of crucial data and fast access to the data copies in order to recover quickly from human errors, power failures and software bugs. In order to recover from natural disasters, it is common to share data among geographically dispersed data centers.
The prior art has generated several solutions to meet the aforementioned data backup and sharing needs. One prior art solution is data replication in which a second copy or “mirror” of information located at a primary site is maintained at a secondary site. This mirror is often called a “remote mirror” if the secondary site is located away from the primary site. When changes are made to the primary data, updates are also made to the secondary data so that the primary data and the secondary data remain “synchronized.”
Data replication can be performed at various levels. For example, the entire database may be mirrored. However, tight synchronization between the primary and mirrored data for an entire database often introduces a significant system performance penalty because of the large number of data update transmissions between the primary and secondary sites that are necessary to ensure transaction and record consistency across the entire database.
To improve system performance when data replication is used, some data replication systems replicate only portions of the data. For example, replication may take place at file-level. Conventional file-level replication systems are often incorporated in the software drivers on the host and generally employ conventional networking protocols, such as TCP/IP, to connect to the remote data site over a local or wide area connection.
Alternatively, in other prior art systems, data replication takes place at the volume level, where a volume is a logical, or physical, disk segment. Instead of replicating database transactions or file systems, this technique replicates logical or, in some cases, physical disk volumes. Volume replication is flexible in the sense that it is generally independent of the file system and volume manager software. Volume replication can also be used in conjunction with database and file replication to help ensure that not just the data specific to the database or a particular file system, but all relevant data is replicated to the remote site.
Two techniques are commonly used for data replication: synchronous and asynchronous replication. Synchronous techniques forward data writes generated by the host to the remote site and await confirmation that the remote data has been updated before signaling I/O completion to the requesting host. Synchronous replication has the advantage that, if the primary site is rendered inoperative, the secondary (remote) copy may be used to continue operations after the user community and the applications are switched to the alternate site.
One problem with synchronous replication is that all data must be safely committed to the remote site before the local host write is acknowledged. Consequently, synchronous mirroring is generally limited to relatively short distances (tens of kilometers) because of the detrimental effect of round-trip propagation delay on I/O response times. In addition, if the remote storage is unavailable or the link between the local and remote sites is disabled for a prolonged time period, the host cannot complete its processing and business disruption occurs, even though the primary site has a perfectly operational system.
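The following rough calculation makes the distance limitation concrete. It assumes that signals travel through optical fiber at about 200,000 km/s (roughly two-thirds of the speed of light in a vacuum). It counts only propagation delay and ignores switching, protocol and remote-disk latency, so real delays are higher. The class and method names are illustrative only.

```java
/** Rough estimate of the propagation delay that synchronous mirroring adds to each write. */
final class SyncLatencyEstimate {
    // Assumption: signal speed in optical fiber of about 200,000 km/s, i.e. 200 km per millisecond.
    static final double FIBER_KM_PER_MS = 200.0;

    private SyncLatencyEstimate() {}

    /** Round-trip propagation delay, in milliseconds, to a remote site distanceKm away. */
    static double roundTripMs(double distanceKm) {
        return 2.0 * distanceKm / FIBER_KM_PER_MS;
    }

    /**
     * Upper bound on dependent writes per second when each write must wait for the previous
     * remote acknowledgement and propagation is the only cost considered.
     */
    static double maxSerialWritesPerSecond(double distanceKm) {
        return 1000.0 / roundTripMs(distanceKm);
    }
}
```

Under these assumptions, a 50 km mirror adds about 0.5 ms to every write. A 2,000 km mirror adds about 20 ms, which caps a strictly serial writer at roughly 50 writes per second before any disk or network processing time is counted.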
It is possible to avoid this problem by maintaining synchronous writes only while the remote site is available. If the remote site becomes unavailable, the primary site keeps track of all data writes, and the remote site is updated when the remote service is reliably restored. This approach trades longer recovery time for higher data availability by eliminating the single point of failure produced by the remote storage site.
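The sketch below shows one way to implement synchronous mirroring that falls back to change tracking when the remote site is unavailable. While the remote site responds, writes are mirrored synchronously. While it does not, a scoreboard-style bit vector records the changed blocks. The `RemoteSite` interface, the in-memory `MemoryVolume` and the `FallbackSyncMirror` class are hypothetical, and block numbers are assumed to fit in an `int`. Deciding when the remote service is "reliably restored" is left to whoever calls `resynchronize()`.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical remote endpoint; returns false when the secondary site cannot be reached. */
interface RemoteSite {
    boolean write(long block, byte[] data);
}

/** Simple in-memory volume used as the primary copy in this sketch. */
final class MemoryVolume {
    private final Map<Long, byte[]> blocks = new HashMap<>();

    void write(long block, byte[] data) { blocks.put(block, data.clone()); }

    byte[] read(long block) {
        byte[] d = blocks.get(block);
        return d == null ? null : d.clone();
    }
}

/** Synchronous mirroring that degrades to change tracking while the remote site is down. */
final class FallbackSyncMirror {
    private final MemoryVolume primary;
    private final RemoteSite remote;
    private final BitSet pending = new BitSet();   // blocks changed while out of sync

    FallbackSyncMirror(MemoryVolume primary, RemoteSite remote) {
        this.primary = primary;
        this.remote = remote;
    }

    /** Completes the host write; returns true if the remote copy was updated synchronously. */
    boolean write(int block, byte[] data) {
        primary.write(block, data);
        // Once out of sync, stay in tracking mode until resynchronize() catches the remote up.
        if (pending.isEmpty() && remote.write(block, data)) {
            return true;                            // both copies updated before acknowledging
        }
        pending.set(block);                         // remember the block for resynchronization
        return false;
    }

    /** Copies every tracked block to the remote site; returns the number copied. */
    int resynchronize() {
        int copied = 0;
        for (int b = pending.nextSetBit(0); b >= 0; b = pending.nextSetBit(b + 1)) {
            if (!remote.write(b, primary.read(b))) {
                break;                              // remote dropped again; keep remaining bits set
            }
            pending.clear(b);
            copied++;
        }
        return copied;
    }

    int pendingBlocks() { return pending.cardinality(); }
}
```

The host never blocks on an unreachable secondary. The cost is that, until `resynchronize()` finishes, the remote copy is stale and cannot be used for recovery.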
Alternatively, asynchronous replication methods can be used. Asynchronous techniques affirm primary I/O completion to the originating host before updating the remote data. However, if the link connecting the primary and secondary sites has significant latency, local writes must be queued at the primary site for later transmission when the site link is available. Consequently, in these situations, there is a higher possibility of losing buffered and in-flight data if the primary system fails. A non-volatile memory is needed to prevent data loss in this situation.
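A minimal sketch of asynchronous mirroring follows, assuming a hypothetical link represented as a `BiPredicate` that returns false when an update cannot be delivered. The host write is acknowledged as soon as the update is queued. `drain()` later forwards queued updates in arrival order and stops at the first refusal, so the secondary never applies writes out of order. The queue here lives in ordinary memory. That is exactly the exposure the paragraph describes, and a production design would keep it in non-volatile storage.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BiPredicate;

/** Asynchronous mirroring: acknowledge locally, forward to the secondary site later. */
final class AsyncMirror {
    /** One queued update destined for the secondary site. */
    static final class PendingWrite {
        final long block;
        final byte[] data;

        PendingWrite(long block, byte[] data) {
            this.block = block;
            this.data = data.clone();
        }
    }

    private final Deque<PendingWrite> queue = new ArrayDeque<>();   // volatile: lost if the primary fails
    private final BiPredicate<Long, byte[]> sendToRemote;           // hypothetical link; false = not delivered

    AsyncMirror(BiPredicate<Long, byte[]> sendToRemote) {
        this.sendToRemote = sendToRemote;
    }

    /** Returns immediately; the primary volume update itself is omitted from this sketch. */
    void write(long block, byte[] data) {
        queue.addLast(new PendingWrite(block, data));
    }

    /** Sends queued writes in arrival order until the queue empties or the link refuses one. */
    int drain() {
        int sent = 0;
        while (!queue.isEmpty()) {
            PendingWrite w = queue.peekFirst();
            if (!sendToRemote.test(w.block, w.data)) {
                break;                    // link unavailable: keep this write and everything after it
            }
            queue.removeFirst();
            sent++;
        }
        return sent;
    }

    /** Number of acknowledged writes that would be lost if the primary failed now. */
    int exposedWrites() { return queue.size(); }
}
```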
Occasionally, remote mirroring operations are interrupted either intentionally or by unplanned outages. If either the primary data or the secondary data continues to be updated during this period, then the data images are no longer synchronized. Resynchronization is the process of re-establishing mirrored images once the remote copy service is restored. Full resynchronization is accomplished by making a complete copy of the data and is time-consuming. One way to reduce resynchronization time is to log data changes during the interruption. Journals and scoreboards (bit-vectors) are two recognized ways to accomplish this logging. Journal designs capture every new write in a running log, whereas scoreboards keep track of changed locations.
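The following sketch contrasts the two logging approaches. Both classes are illustrative, and the fixed region size for the scoreboard is an assumption. The journal keeps every write in order, so its size grows with write traffic, but it can be replayed exactly. The scoreboard uses one bit per region regardless of how often a region is rewritten. Because it records only *which* regions changed, resynchronization must copy their current contents from the primary.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Journal: keeps every write, in arrival order, while mirroring is interrupted. */
final class WriteJournal {
    /** One logged write: where it landed and the bytes written. */
    static final class Entry {
        final long offset;
        final byte[] data;

        Entry(long offset, byte[] data) {
            this.offset = offset;
            this.data = data.clone();
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void record(long offset, byte[] data) {
        entries.add(new Entry(offset, data));   // space grows with every write, including rewrites
    }

    /** Resynchronization replays the entries in their original order. */
    List<Entry> replayOrder() {
        return new ArrayList<>(entries);
    }
}

/** Scoreboard: one bit per fixed-size region, set when any byte in the region changes. */
final class Scoreboard {
    private final int regionSize;
    private final BitSet dirty = new BitSet();

    Scoreboard(int regionSize) {
        if (regionSize <= 0) throw new IllegalArgumentException("regionSize must be positive");
        this.regionSize = regionSize;
    }

    /** Marks every region touched by a write of `length` bytes starting at `offset`. */
    void record(long offset, int length) {
        if (length <= 0) return;
        int first = (int) (offset / regionSize);
        int last = (int) ((offset + length - 1) / regionSize);
        dirty.set(first, last + 1);             // rewriting an already dirty region changes nothing
    }

    /** Regions whose current primary contents must be copied to the secondary. */
    int[] regionsToCopy() {
        return dirty.stream().toArray();
    }
}
```

For example, with 64-byte regions, writes at offsets 0 and 60 (10 bytes each) followed by a rewrite at offset 0 leave the scoreboard with just regions 0 and 1 to copy. A journal of the same traffic would hold three entries.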
No matter which data replication system is used, a significant amount of management time can be consumed in initially setting up the data replication system and in managing it after it is running. In particular, the data manipulation processes, which actually perform the data updates and synchronization, are typically low-level routines that are part of an operating system kernel running on a particular machine. These routines typically must run on that machine and are written in a platform-dependent language. However, in a large, distributed computer system connected by a network, management personnel and resources may be physically located anywhere in the system. Setting up replication from what may be a remote location is therefore time-consuming. A manager must contact both the primary and secondary storage systems to ascertain whether space is available, reserve the space in each system, and configure the low-level routines to perform the data replication. The manager must then access a configuration database, which can also be located anywhere on the network, to record the replication configuration so that other system managers will not attempt to use resources that have already been reserved.
Therefore, there is a need to provide a simple, fast way to set up and manage a data replication system among resources that may be located anywhere in a distributed system and to provide coordination information to a manager who may also be located anywhere in the system.
In accordance with the principles of the invention, a three-tiered data replication management system is used on a distributed computer system connected by a network. The lowest tier comprises management facade software running on each machine that converts the platform-dependent interface written with the low-level kernel routines to platform-independent method calls. The middle tier is a set of federated Java beans that communicate with each other, with the management facades and with the upper tier of the system. The upper tier of the inventive system comprises presentation programs that can be directly manipulated by management personnel to view and control the system.
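The sketch below illustrates the three-tier layering in a single process. A facade wraps a platform-dependent driver in platform-independent Java calls, a bean builds on the facade, and a CLI drives the bean. All names (`KernelReplicationDriver`, `ReplicationFacade`, `ReplicationBean`, `ReplicationCli`) and the `enable` command syntax are hypothetical. The kernel driver is simulated by an interface rather than real native calls. In a distributed deployment the calls between tiers would cross the network (for example, via Java RMI), which this sketch omits.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for a platform-dependent kernel interface; returns 0 on success. */
interface KernelReplicationDriver {
    int enable(String primaryPath, String secondaryHost, String secondaryPath);
}

/** Lowest tier: management facade exposing platform-independent method calls. */
final class ReplicationFacade {
    private final KernelReplicationDriver driver;

    ReplicationFacade(KernelReplicationDriver driver) { this.driver = driver; }

    /** Translates the driver's numeric status into a Java exception. */
    void enableSet(String primaryVolume, String secondaryHost, String secondaryVolume) {
        int rc = driver.enable(primaryVolume, secondaryHost, secondaryVolume);
        if (rc != 0) {
            throw new IllegalStateException("kernel driver returned error code " + rc);
        }
    }
}

/** Middle tier: a bean that calls the facade and serves the presentation tier. */
final class ReplicationBean {
    private final ReplicationFacade facade;
    private final List<String> configuredSets = new ArrayList<>();

    ReplicationBean(ReplicationFacade facade) { this.facade = facade; }

    void createSet(String primaryVolume, String secondaryHost, String secondaryVolume) {
        facade.enableSet(primaryVolume, secondaryHost, secondaryVolume);
        configuredSets.add(primaryVolume + " -> " + secondaryHost + ":" + secondaryVolume);
    }

    List<String> listSets() { return new ArrayList<>(configuredSets); }
}

/** Upper tier: a minimal CLI that turns a command line into a bean method call. */
final class ReplicationCli {
    private final ReplicationBean bean;

    ReplicationCli(ReplicationBean bean) { this.bean = bean; }

    /** Accepts "enable <primary> <host> <secondary>" and returns a message for the user. */
    String run(String commandLine) {
        String[] args = commandLine.trim().split("\\s+");
        if (args.length == 4 && args[0].equals("enable")) {
            bean.createSet(args[1], args[2], args[3]);
            return "enabled " + args[1];
        }
        return "usage: enable <primary-volume> <secondary-host> <secondary-volume>";
    }
}
```

The facade is the only layer that knows about the driver's status codes. The bean and CLI therefore stay unchanged if the kernel routines are replaced on a different platform.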
In one embodiment, the federated Java beans can run on any machine in the system and communicate via the network. A data replication management facade runs on each host and at least one data replication bean also runs on the host. The data replication bean communicates directly with a management GUI or CLI and is controlled by user commands generated by the GUI or CLI. Therefore, a user needs to log onto only one machine in order to configure the entire data replication system.
In another embodiment, another bean stores the configuration of the data replication system. This latter bean can be interrogated by the data replication bean to determine the current system configuration.
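A minimal sketch of a configuration bean follows, together with a replication bean that consults it before configuring a replication set. Both class names are hypothetical, and volumes are assumed to be identified by host-qualified names such as `hostA:vol1`. The replication bean reserves the primary and secondary volumes atomically per volume and rolls back a partial reservation. This way, two managers working from different machines cannot claim the same volume.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical configuration bean: shared record of which volumes are already in use. */
final class ConfigurationBean {
    private final Map<String, String> volumeOwners = new ConcurrentHashMap<>();

    /** Atomically records the reservation; returns false if another set already owns the volume. */
    boolean reserve(String volume, String replicationSet) {
        return volumeOwners.putIfAbsent(volume, replicationSet) == null;
    }

    Optional<String> ownerOf(String volume) {
        return Optional.ofNullable(volumeOwners.get(volume));
    }

    /** Removes the entry only if it is still owned by the given set. */
    void release(String volume, String replicationSet) {
        volumeOwners.remove(volume, replicationSet);
    }
}

/** Replication bean that interrogates the configuration bean before configuring a set. */
final class ConfigAwareReplicationBean {
    private final ConfigurationBean config;

    ConfigAwareReplicationBean(ConfigurationBean config) { this.config = config; }

    /** Reserves both volumes or neither; returns true when the set can be configured. */
    boolean createSet(String setName, String primaryVolume, String secondaryVolume) {
        if (!config.reserve(primaryVolume, setName)) {
            return false;
        }
        if (!config.reserve(secondaryVolume, setName)) {
            config.release(primaryVolume, setName);   // roll back the partial reservation
            return false;
        }
        // A real bean would now call the management facades to enable replication.
        return true;
    }
}
```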
In still another embodiment, a data services bean locates and prepares volumes that can be used by the data replication system.
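The following is a minimal sketch of the "locate" half of a data services bean. It filters the volumes reported by the hosts' facades down to free volumes on a chosen host that are large enough, with the smallest first. `VolumeInfo` and `DataServicesBean` are hypothetical names. The rule that a secondary volume must be at least as large as its primary is an assumption of this sketch. Preparing a chosen volume would be a further facade call and is not shown.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical description of a volume reported by a host's management facade. */
final class VolumeInfo {
    final String host;
    final String name;
    final long sizeBytes;
    final boolean inUse;

    VolumeInfo(String host, String name, long sizeBytes, boolean inUse) {
        this.host = host;
        this.name = name;
        this.sizeBytes = sizeBytes;
        this.inUse = inUse;
    }
}

/** Hypothetical data services bean: finds volumes a replication set could use. */
final class DataServicesBean {
    private DataServicesBean() {}

    /** Free volumes on the given host holding at least minBytes, smallest first. */
    static List<VolumeInfo> candidates(List<VolumeInfo> discovered, String host, long minBytes) {
        return discovered.stream()
                .filter(v -> v.host.equals(host) && !v.inUse && v.sizeBytes >= minBytes)
                .sorted(Comparator.comparingLong((VolumeInfo v) -> v.sizeBytes))
                .collect(Collectors.toList());
    }
}
```

Sorting smallest first is one simple policy. It leaves larger free volumes available for later, larger primaries.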
In yet another embodiment, the presentation programs include a set of management graphical user interfaces (GUIs).
In another embodiment, the presentation programs include command line interfaces (CLIs).