Today, many modern computing systems are based on event-driven architectures (EDA). In an event-driven architecture, several computer applications each execute on distinct computer systems and are typically interconnected by a network, such as a local area network or even the Internet. Each application is in charge of executing a certain processing task, which represents a processing step in an overall process. Examples include the calculation of complex mathematical models (e.g. for weather forecasts or scientific computations) by a plurality of distributed computers, or the control of an assembly line e.g. for the manufacturing of a vehicle, wherein each assembly step is controlled by a particular application participating in the overall assembly process.
In order for the individual applications to communicate with each other (e.g. to notify the next application that it can now start processing the next process step, or to exchange intermediate results between the applications), event-driven architectures employ a common notification infrastructure (a so-called “event bus”), which serves as a central communication backbone. Event producers (i.e. applications producing events as a result of their processing) advertise (i.e. select) a specific channel on the event bus, which is later used to publish events. An event consumer (i.e. an application that needs a certain event in order to conduct its own processing) subscribes to those channels on which it wants to receive events. A main characteristic of such architectures is that the producer(s) and consumer(s) do not have to know each other; the communication is decoupled in time, in space and with respect to synchronization. Due to the loose coupling between the interacting applications, event-driven architectures are said to be easier to change and extend simply by replacing and adding applications, respectively, thereby changing the overall process being executed. Enterprise systems based on the concept of event-driven architecture are typically more adaptive to external needs than systems that follow e.g. the peer-to-peer model.
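By way of illustration only, the publish-subscribe interaction described above may be sketched as follows. The class `EventBus`, its methods and the channel names are hypothetical and are not taken from any particular product. For brevity the sketch delivers events synchronously and in memory, so it shows only that producer and consumer need not know each other, not the decoupling in time.

```python
from typing import Any, Callable, Dict, List

Handler = Callable[[Any], None]


class EventBus:
    """Hypothetical in-memory event bus with named channels."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = {}

    def subscribe(self, channel: str, handler: Handler) -> None:
        # A consumer registers interest in a channel, not in a producer.
        self._subscribers.setdefault(channel, []).append(handler)

    def publish(self, channel: str, event: Any) -> None:
        # A producer publishes to a channel without knowing who listens.
        for handler in list(self._subscribers.get(channel, [])):
            handler(event)


bus = EventBus()
received: List[Any] = []

bus.subscribe("orders", received.append)
bus.publish("orders", {"order_id": 1})
bus.publish("invoices", {"invoice_id": 7})  # no subscriber: dropped in this sketch
```

A real messaging system would typically queue the second event rather than drop it; that behaviour is deliberately left out here.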
On the other hand, the loose coupling of applications and the high system dynamics in event-driven architectures raise several difficult questions, e.g. how to ensure that mission-critical processes are correctly handled in real-time. In other words, it is desired to reliably detect process execution disruptions, such as an out-of-sequence execution of a certain process step, or a timeout of a process step, a step transition or the entire process. It goes without saying that such a detection technique should detect process disruptions in realtime, i.e. as soon as possible while the process is actually executing. This realtime detection is an essential prerequisite for preventing a currently executing process from malfunctioning, which may have a fatal impact on the underlying system. For example, if a disruption during execution of an assembly process of a vehicle is not detected in a timely manner, both the vehicles to be produced and the assembly line machinery may be severely harmed or even destroyed.
In the context of monitoring computer-aided processes, so-called Business Activity Monitoring (BAM) products are known. Whereas first generation BAM products were limited to processes running within one single BPM (Business Process Management) system, second generation products explicitly include processes that span several event-driven applications. Exemplary products include Oracle Business Activity Monitoring (Oracle BAM), Tibco, or webMethods of applicant. Further, Messaging Infrastructure Systems are known, which allow applications to be distributed over heterogeneous platforms and which attempt to reduce the complexity of developing applications that span multiple operating systems and network protocols. In asynchronous systems, message queues provide temporary storage when the destination program is busy or not connected. Examples include IBM Websphere MQ, Oracle Advanced Queuing, or webMethods Broker of applicant.
The above-mentioned messaging systems provide for guaranteed message delivery, i.e. in case a subscriber for a channel is not available at the moment an event is published, the event is kept within the messaging system until it can be delivered to the subscriber. However, the reliable delivery of event messages does not say anything about whether a process indeed meets the expected cycle time. For example, the successful receipt of an event does not mean that the event is instantly processed by the consumer. Furthermore, messaging systems typically do not provide for the notion of a (business) process. Specifically, they do not provide means to correlate events published on different channels that belong to one and the same overall processing task.
Also, current BAM products attempt to provide real-time detection of business process errors in distributed environments based on the event-driven architecture. However, one severe drawback of these products is that they cannot distinguish between different error reasons. Rather, such systems just detect that a certain rule (e.g. a key performance indicator, KPI) is not met and react in the same way, independent of the original reason of the error.
Further, in order for most conventional BAM products to operate properly, these products need an exact definition of the overall process to be executed (i.e. a definition of the individual process steps, their sequence and the transitions between the steps). However, in highly dynamic and distributed systems, such as event-driven applications, a global process definition is seldom available.
It is therefore the technical problem underlying certain example embodiments to provide an improved system and method for the detection of process execution disruptions in event-driven architectures, thereby at least partly overcoming the above explained disadvantages of the prior art.
This problem is according to one aspect of the invention solved by a system for realtime detection of process execution disruptions in an event-driven architecture, wherein a plurality of event-driven applications each execute at least one process step to participate in the execution of a process. In the embodiment of claim 1, the system comprises:
    a. an event bus, usable by the plurality of event-driven applications to communicate events among each other;
    b. wherein the event bus comprises a control channel, the control channel being adapted for receiving at least one start event and at least one stop event from the plurality of event-driven applications, wherein the start and stop events indicate the execution of a corresponding process step;
wherein the system further comprises:
    c. a Complex Event Processing (CEP) engine, adapted for analyzing the start and stop events on the control channel to detect a disruption of the process.
Accordingly, this embodiment allows disruptions of a process that spans multiple, distributed event-driven applications to be detected in realtime, i.e. while the process is currently executing. The system makes use of a Complex Event Processing (CEP) engine to analyze event streams for pattern matching. To this end, within the event bus commonly used by the event-driven applications in accordance with the publish-subscribe model, an additional control channel is defined. This additional control channel serves to record start and stop events, which are issued by the event-driven applications before and after they execute their respective processing tasks (steps). The CEP engine then analyzes the events published on the control channel to detect process execution disruptions. It is important to note that, in contrast to other technologies, the proposed system does not rely on the existence of a globally defined process model, but only uses the information obtained from the control channel. Furthermore, the system makes use of non-permanent events that occur on the event bus, i.e. it does not need any persistent data, which would otherwise have to be stored and kept on some storage medium.
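As a non-limiting sketch of how an application might issue start and stop events around a processing step, consider the following. The names `control_channel`, `publish_control` and `process_step`, as well as the dictionary keys, are assumptions made for illustration; a plain list stands in for the control channel of a real event bus.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List

# Stand-in for the control channel (hypothetical; a real system would
# publish to a channel of the event bus instead of appending to a list).
control_channel: List[Dict[str, object]] = []


def publish_control(event: Dict[str, object]) -> None:
    control_channel.append(event)


@contextmanager
def process_step(process_id: str, instance_id: str, step_id: str) -> Iterator[None]:
    """Publish a start event before the step and a stop event after it."""
    base = {"process": process_id, "instance": instance_id, "step": step_id}
    publish_control({**base, "type": "start", "ts": time.time()})
    # If the step raises, no stop event is published, so a monitor
    # watching the control channel would eventually observe a timeout.
    yield
    publish_control({**base, "type": "stop", "ts": time.time()})


with process_step("assembly", "car-42", "weld"):
    pass  # the actual processing of the step would happen here
```

Wrapping the step in a context manager keeps the instrumentation separate from the processing logic; other designs (decorators, explicit calls) would serve equally well.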
In one aspect of the present invention, detecting a disruption of the process comprises detecting that process steps are not executed in a predetermined order, detecting that the execution time of a process step exceeds a predefined threshold, detecting that the execution time of a process step transition exceeds a predefined threshold and/or detecting that the execution time of the process exceeds a predefined threshold.
Preferably, an event on the control channel (i.e. any start event and stop event published on the control channel) comprises a process identifier, a process instance identifier, a process step identifier and/or a type, the type indicating whether the event is a start event or a stop event.
In a preferred embodiment of the present invention, the CEP engine executes at least one continuous query on the events on the control channel to detect a disruption of the process. To this end, the at least one continuous query may evaluate the sequence of start events on the control channel to determine whether the process steps are executed in a predetermined order. Additionally or alternatively, the at least one continuous query may comprise a predefined process step threshold and may evaluate whether a start event on the control channel is followed by a corresponding stop event before expiration of the threshold. Furthermore, the at least one continuous query may comprise a predefined process step transition threshold and may evaluate whether a stop event on the control channel relating to a first process step is followed by a start event relating to the succeeding process step before expiration of the threshold. Lastly, the at least one continuous query may also comprise a predefined process threshold and may evaluate whether a stop event relating to the last step of the process occurs on the control channel before expiration of the threshold. It should be noted that the present invention is not limited to the above-defined continuous queries, and that one or more of the above evaluations may be conducted by the same continuous query.
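The four evaluations described above may be illustrated by the following simplified sketch. It is not a continuous query in the language of any CEP engine: it evaluates a batch of events belonging to one process instance at a given time `now`, whereas a CEP engine would evaluate the event stream continuously. The type `ControlEvent`, the function `detect_disruptions`, its parameters and the finding labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ControlEvent:
    process_id: str
    instance_id: str
    step_id: str
    type: str         # "start" or "stop"
    timestamp: float  # seconds


def detect_disruptions(events: Sequence[ControlEvent],
                       expected_order: Sequence[str],
                       step_limit: float,
                       transition_limit: float,
                       process_limit: float,
                       now: float) -> List[str]:
    """Evaluate the events of ONE process instance as of time `now`."""
    findings: List[str] = []
    ordered = sorted(events, key=lambda e: e.timestamp)
    starts = {e.step_id: e.timestamp for e in ordered if e.type == "start"}
    stops = {e.step_id: e.timestamp for e in ordered if e.type == "stop"}

    # 1. Start events must follow the predetermined order.
    started = [e.step_id for e in ordered if e.type == "start"]
    if started != list(expected_order[:len(started)]):
        findings.append("out-of-sequence")

    # 2. Each start must be followed by its stop within step_limit.
    for step, begun in starts.items():
        if stops.get(step, now) - begun > step_limit:
            findings.append(f"step-timeout:{step}")

    # 3. Each stop must be followed by the next step's start in time.
    for prev, nxt in zip(expected_order, expected_order[1:]):
        if prev in stops and starts.get(nxt, now) - stops[prev] > transition_limit:
            findings.append(f"transition-timeout:{prev}->{nxt}")

    # 4. The last step must stop within process_limit of the first event.
    if ordered and expected_order:
        finished = stops.get(expected_order[-1], now)
        if finished - ordered[0].timestamp > process_limit:
            findings.append("process-timeout")
    return findings


events = [
    ControlEvent("assembly", "car-42", "weld", "start", 0.0),
    ControlEvent("assembly", "car-42", "weld", "stop", 4.0),
    ControlEvent("assembly", "car-42", "paint", "start", 5.0),
    ControlEvent("assembly", "car-42", "paint", "stop", 20.0),
]
findings = detect_disruptions(events, ["weld", "paint"], step_limit=10.0,
                              transition_limit=2.0, process_limit=30.0, now=20.0)
print(findings)  # ['step-timeout:paint']
```

In the example, the painting step takes 15 seconds against a limit of 10, so only the process step threshold is violated; the order, the transition and the overall process remain within their limits.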
Preferably, the at least one continuous query takes into account a process instance identifier defined in the start and stop events to correlate events relating to the same process instance. Accordingly, the process instance identifier enables the system to detect which events belong to the same process instance. Otherwise, it would not be possible to meaningfully analyze the events on the control channel, since events from multiple different process instances could not be differentiated.
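The role of the process instance identifier can be illustrated by a short sketch in which interleaved control events from two process instances are separated again. The function `correlate` and the dictionary keys are assumptions chosen for this example only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Dict[str, str]


def correlate(events: List[Event]) -> Dict[Tuple[str, str], List[Event]]:
    """Group control events by (process identifier, process instance identifier)."""
    by_instance: Dict[Tuple[str, str], List[Event]] = defaultdict(list)
    for event in events:
        by_instance[(event["process"], event["instance"])].append(event)
    return dict(by_instance)


# Events of two concurrently running instances, interleaved as they
# might arrive on the control channel.
interleaved: List[Event] = [
    {"process": "assembly", "instance": "car-1", "step": "weld", "type": "start"},
    {"process": "assembly", "instance": "car-2", "step": "weld", "type": "start"},
    {"process": "assembly", "instance": "car-1", "step": "weld", "type": "stop"},
    {"process": "assembly", "instance": "car-1", "step": "paint", "type": "start"},
    {"process": "assembly", "instance": "car-2", "step": "weld", "type": "stop"},
]

groups = correlate(interleaved)
```

Without the identifier, the second welding start event could be mistaken for a repeated step of the first instance; with it, each instance yields its own ordered event sequence for further evaluation.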
Certain example embodiments also provide a method for realtime detection of process execution disruptions in an event-driven architecture, wherein a plurality of event-driven applications each execute at least one process step to participate in the execution of a process, wherein the method comprises the steps of receiving, on a control channel of an event bus, at least one start event and at least one stop event from the plurality of event-driven applications, wherein the start and stop events indicate the execution of a corresponding process step and analyzing, by a Complex Event Processing (CEP) engine, the start and stop events on the control channel to detect a disruption of the process. Further advantageous modifications of embodiments of the method of the invention are defined in further dependent claims.
Lastly, a computer program is provided, the computer program comprising instructions for implementing any of the above-described methods.