1. Field of the Invention
This invention is related to the field of networked computer systems and, more particularly, to techniques for inter-process messaging.
2. Description of the Related Art
Companies today rely to an unprecedented extent on online, frequently accessed, constantly changing data to run their businesses. Unplanned events that inhibit the availability of this data can seriously damage business operations. In addition, any permanent data loss, from disaster or any other source, will likely have serious negative consequences for the continued viability of a business. Therefore, when disaster strikes, companies must be prepared to eliminate or minimize data loss and recover quickly with usable data.
Even with the most well-executed backup strategy, restoring from tape usually results in several hours of lost data. For many environments, this kind of data loss is unacceptable and real-time replication is an essential technique. Real-time replication not only minimizes or eliminates data loss, but also enables rapid recovery when compared to conventional bulk data transfer from sequential media. The replica data is available and ready for use on disk at a disaster-safe location.
A successful replication solution would allow operations to continue without a significant break in continuity. To keep the recovery data removed from the impact of a disastrous event, it should be stored in a different geographical location from the primary data. Depending on business requirements, this could be across campus, across town, or across continents. In addition to having data at a disaster-safe location, the ideal replication solution would ensure that the replica volumes are current (fully up-to-date), complete (including all applications, logs, configuration data, etc.), and recoverable (consistent data that is free from errors).
In a distributed environment, replication and other management tasks may require substantial messaging traffic across the network. Messages between and among hosts may be used to ensure that data is up-to-date, that logs are properly maintained, and that configuration data is exchanged, as well as for other suitable purposes. Inter-process messaging between hosts requires the opening and use of network connections (often using TCP/IP protocols) between nodes on the network. Substantial overhead may be incurred in opening, using, and maintaining these network connections. For example, long delays may be encountered when opening TCP/IP connections over a wide-area network (WAN) with high latency, such as a network having nodes separated by hundreds or thousands of miles. Furthermore, most systems place restrictions on the number of concurrent network connections that may be opened by a particular host or by a process executing on a host. When high security is desired, the opening of a secure connection (e.g., a Secure Sockets Layer [SSL] handshake) may cause further delays. Additional overhead may be incurred in maintaining network connection status, such as by using ICMP ping operations to provide a “heartbeat.”
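By way of illustration only, the connection overhead described above can be sketched in Python. The sketch is not taken from any particular system: the names CountingEchoServer, send_with_new_connections, and send_over_one_connection are hypothetical, and a loopback echo server stands in for a remote host. It shows that opening a connection per message costs one connection setup for every message, whereas reusing a connection costs one setup in total. Over a high-latency WAN, or where a secure handshake is required, each additional setup would add further delay.

```python
import socket
import threading


class CountingEchoServer:
    """Hypothetical loopback echo server that counts accepted TCP connections."""

    def __init__(self):
        # Port 0 asks the operating system for any free port.
        self.listener = socket.create_server(("127.0.0.1", 0))
        self.address = self.listener.getsockname()
        self.accepted = 0
        threading.Thread(target=self._accept_loop, daemon=True).start()

    def _accept_loop(self):
        while True:
            try:
                conn, _ = self.listener.accept()
            except OSError:
                return  # listener was closed
            self.accepted += 1  # one connection setup was paid for
            threading.Thread(target=self._echo, args=(conn,), daemon=True).start()

    def _echo(self, conn):
        # Echo bytes back until the client closes its side.
        with conn:
            while data := conn.recv(4096):
                conn.sendall(data)

    def close(self):
        self.listener.close()


def recv_line(sock):
    """Read from the socket until a newline-terminated reply has arrived."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def send_with_new_connections(address, messages):
    """Open a fresh TCP connection for every message (one setup per message)."""
    replies = []
    for msg in messages:
        with socket.create_connection(address) as sock:
            sock.sendall(msg + b"\n")
            replies.append(recv_line(sock).rstrip(b"\n"))
    return replies


def send_over_one_connection(address, messages):
    """Send every message over a single, reused TCP connection (one setup)."""
    replies = []
    with socket.create_connection(address) as sock:
        for msg in messages:
            sock.sendall(msg + b"\n")
            replies.append(recv_line(sock).rstrip(b"\n"))
    return replies


if __name__ == "__main__":
    server = CountingEchoServer()
    messages = [b"update-log", b"sync-config", b"heartbeat"]
    send_with_new_connections(server.address, messages)
    per_message = server.accepted
    send_over_one_connection(server.address, messages)
    reused = server.accepted - per_message
    print("connections opened, one per message:", per_message)
    print("connections opened, single reused connection:", reused)
    server.close()
```

The sketch counts connection setups rather than measuring elapsed time, because loopback timings would not reflect the WAN latency or secure-handshake cost discussed above.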
In a distributed management framework where multiple software objects may exist across multiple hosts, robust and efficient communication between hosts is highly desirable.