In the prior art, many message delivery systems exist that offer assured message delivery between endpoints, such as between different applications. Assured message delivery is also known in the art as persistent, guaranteed, or durable messaging. An exemplary implementation of an assured messaging system is detailed in U.S. Pat. No. 7,716,525 (Buchko), the contents of which are incorporated herein by reference.
Many of the applications that make use of assured message delivery systems are mission-critical in nature, and in some cases are critical to business continuity. Prior art assured message delivery systems often include redundancy as a feature so that they can recover quickly from component failure; however, in applications that are critical to business continuity, component level redundancy may not provide sufficient protection. In these cases the required level of system availability can only be achieved by replicating the messages and transferring them to another system, possibly in another location. Redundancy provided by message replication is distinguished from component redundancy by the location of the redundant equipment and by the failover mechanisms. In a replication deployment the secondary infrastructure would typically be in a separate building. Component level redundancy schemes typically feature automatic failover to minimize the duration of the outage; in these situations the systems detect the failure and switch over to the redundant system without any intervention from the network operator. In the case of replication, the decision to switch to the secondary site is most often made by network operators.
Prior art assured messaging systems typically use disk and disk based file systems as a non-volatile store of message data and related state. The disk store is typically located in a separate system and connected by a storage area network (SAN). Disk storage equipment from EMC Corporation supports a feature called Symmetrix Remote Data Facility (SRDF), and other manufacturers support similar features, whereby data stored to disk is synchronously or asynchronously mirrored to a disk located in a remote site using wide area networking technologies (typically TCP/IP combined with iSCSI). Assured messaging systems that make use of disk storage are able to use features such as SRDF to implement replication. In such an implementation, message data and state replicated by the disk system can be recovered by a secondary system, which is able to resume operation in the case of a service interruption in the primary infrastructure. Systems implemented in this way suffer from several undesirable characteristics. They are slow to become active after a switch, and the necessity to write to disk affects normal run time performance. These systems are slow to recover replicated messages and state because the secondary system is unable to maintain state in real time. Disk based file systems are not generally multi access, meaning that only one system can have access to the data stored on the disk at a time. The practical limitation of an assured messaging system that relies on mirroring a disk based file system to a remote site as a persistent store is that the system in the secondary site cannot have access to the file system stored on the disk until it is determined that the replicated messages must be recovered. At that point the secondary system must mount the mirrored copy of the file system and rebuild all state from the data stored on the disk; this operation could take from several minutes to hours to complete.
The properties of disk based file systems also affect the real time performance of the assured messaging system, as described by Buchko. In particular, the latency associated with accessing disks is amplified if the disk writes must be synchronously mirrored to a remote site. Because the previously discussed disk mirroring features operate on the disk store as a whole, if even a single user of the messaging system requires synchronous mirroring of data to a remote disaster recovery site, then all users of the assured messaging system suffer the additional performance penalty.
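By way of illustration only, the shared penalty described above can be modeled with a few lines of Python. This is a minimal sketch under stated assumptions: the latency figures and the function names `volume_mirrored_latency` and `per_user_latency` are hypothetical, chosen for the example, and are not drawn from Buchko or from any disk vendor's product.

```python
# Illustrative model only: all latency figures are hypothetical assumptions.
LOCAL_DISK_WRITE_MS = 1.0     # assumed latency of a local disk write
REMOTE_ROUND_TRIP_MS = 20.0   # assumed round trip to the remote site


def volume_mirrored_latency(users_needing_sync):
    """Per-message write latency when mirroring is applied to the whole disk store.

    Mirroring covers every write to the store, so a single user that needs
    synchronous mirroring imposes the remote round trip on all users.
    """
    if users_needing_sync > 0:
        return LOCAL_DISK_WRITE_MS + REMOTE_ROUND_TRIP_MS
    return LOCAL_DISK_WRITE_MS


def per_user_latency(needs_sync):
    """Per-message latency if the penalty were isolated to the requesting user."""
    if needs_sync:
        return LOCAL_DISK_WRITE_MS + REMOTE_ROUND_TRIP_MS
    return LOCAL_DISK_WRITE_MS


if __name__ == "__main__":
    # Many users share the store; only one requires synchronous mirroring.
    print("every user, store-level mirroring:", volume_mirrored_latency(1))
    print("other users, if isolated:", per_user_latency(False))
```

Under these assumed figures, every user of a store-mirrored system pays the remote round trip on each write, whereas with isolation only the user that requested synchronous mirroring would pay it.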
The primary use of replication in message delivery systems is to aid in the implementation of redundancy. Current replication implementations that rely on features of disk systems to mirror persistent data to a secondary system suffer from a number of limitations. It would be desirable to have a replication implementation with the following characteristics: synchronous and asynchronous assured message delivery without the need to involve disk based storage; isolation of users, such that users that do not require replication are not affected by those that do; and real time update of message delivery state in the secondary system to facilitate fast resumption of activity after a switch has been effected.