OpenSAF is an Open Source Project established to develop High Availability middleware consistent with Service Availability Forum (SA Forum) specifications. OpenSAF specifications seek to guarantee that if any application crashes (fails) on one node, another node running the same application will take over, and the node that crashed will be restarted. These operations can be performed agnostically to the kind of application a node is running. As such, OpenSAF is unaware of the operational state of the applications that are running on each node. Operational state is maintained among nodes actively using N-way replication, but if a particular node crashes, OpenSAF has no inherent mechanism for recovering the crashed node to its original state. One class of applications that is severely impacted by this limitation is Complex Event Processing (CEP) applications.
Commercially available CEP systems include Esper HA, Oracle CEP, Sybase ESP, and WebSphere Business Events. Such systems can be deployed across a plurality of event processing modules that can, in turn, be deployed either across a plurality of separate physical processing nodes (“CEP nodes” or “nodes”) that are communicatively networked together, or on a single processing node (e.g., as virtual machine processes operating under control of a hypervisor). These CEP systems support deployment of serialization in each event processing module. Serialization is the process of translating data structures or object state within an event processing module (or CEP node) into a format (structure) of values that can be stored (for example, in a file or memory buffer) or transmitted across a network connection link, and “resurrected” later in the same or another event processing module (or CEP node). When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original event processing module (or CEP node), including its operational state values.
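The serialization round trip described above can be illustrated with a minimal sketch. It does not reflect the API of any of the CEP products named above; the class EventProcessingModule, its fields, and the functions serialize and resurrect are hypothetical names chosen for illustration, and JSON is assumed as the serialization format only because it is simple to show.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class EventProcessingModule:
    """Hypothetical stand-in for the operational state of an event processing module."""
    name: str
    window: list = field(default_factory=list)  # events currently held by the module
    event_count: int = 0                        # running count of processed events

    def on_event(self, event: dict) -> None:
        """Update operational state for one incoming event."""
        self.window.append(event)
        self.event_count += 1


def serialize(module: EventProcessingModule) -> bytes:
    """Translate the module's state into a byte sequence that can be stored or sent."""
    return json.dumps(asdict(module)).encode("utf-8")


def resurrect(blob: bytes) -> EventProcessingModule:
    """Reread the byte sequence and build a clone with the same state values."""
    return EventProcessingModule(**json.loads(blob.decode("utf-8")))


original = EventProcessingModule("module-a")
original.on_event({"symbol": "XYZ", "price": 10.5})
original.on_event({"symbol": "XYZ", "price": 10.7})

blob = serialize(original)   # could be written to a file or sent to another node
clone = resurrect(blob)      # a distinct object carrying identical state values
```

In this sketch the clone compares equal to the original while being a separate object, which is the sense in which a resurrected module is a semantically identical clone.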
When serialization is supported or active, checkpoints can be taken periodically to capture the operational state values of the CEP system. Through checkpoints and serialization, when an event processing module (or CEP node) recovers from a failure, the most recent checkpoint is used to restore the operational state values of the event processing module (or CEP node) to the most up-to-date state captured prior to the failure.
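Periodic checkpointing and recovery from the most recent checkpoint can be sketched as follows. This is an illustration under stated assumptions, not the mechanism of any particular CEP system: the state layout, the interval CHECKPOINT_INTERVAL, the file naming scheme, and the functions take_checkpoint, recover_latest, and process are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

CHECKPOINT_INTERVAL = 3  # hypothetical policy: checkpoint after every 3 events


def take_checkpoint(state: dict, directory: Path) -> Path:
    """Serialize the current state to a numbered checkpoint file."""
    # Zero-padded counter so that lexical order matches chronological order.
    path = directory / f"checkpoint-{state['event_count']:08d}.json"
    path.write_text(json.dumps(state))
    return path


def recover_latest(directory: Path) -> dict:
    """Rebuild state from the most recent checkpoint, or start fresh if none exists."""
    checkpoints = sorted(directory.glob("checkpoint-*.json"))
    if not checkpoints:
        return {"event_count": 0, "total": 0.0}
    return json.loads(checkpoints[-1].read_text())


def process(state: dict, value: float, directory: Path) -> None:
    """Apply one event to the state and checkpoint periodically."""
    state["event_count"] += 1
    state["total"] += value
    if state["event_count"] % CHECKPOINT_INTERVAL == 0:
        take_checkpoint(state, directory)


checkpoint_dir = Path(tempfile.mkdtemp())
state = recover_latest(checkpoint_dir)          # no checkpoint yet: fresh state
for value in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]:
    process(state, value, checkpoint_dir)       # checkpoints written at events 3 and 6

state = None                                    # simulated failure: in-memory state lost
recovered = recover_latest(checkpoint_dir)      # state as of the checkpoint at event 6
```

In this sketch the seventh event arrives after the last checkpoint, so it is absent from the recovered state; recovery restores the state as of the most recent checkpoint rather than the instant of failure.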
The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.