In general, the present invention relates to the correlation of events emitted by software components.
Events are special messages that are emitted or published by software components, called emitters, to indicate a state change during operations of the emitting component. Events are consumed by other components, called consumers. Consumers retrieve events by subscribing or registering with the event infrastructure. Thus, there is no pre-defined relationship between emitting and consuming components. One example of a consumer is a monitoring tool that observes the performance of a system.
If a state change in a particular component is detected by a consumer, there are scenarios in which the consumer is interested in finding previously indicated state changes of other components which have a causal relationship with the current state change. Usually, a prerequisite for a causal relationship between two events EA and EB of component A and component B, where EB is consumed after EA, is the fact that component A has either directly or indirectly invoked component B. In other words, there is a call chain from A to B. Thus, a consumer needs to be able to determine from the events whether there is a call chain or not.
The diagram of FIG. 1 illustrates a synchronous call chain of emitting components with their associated events for entry and exit of a component. Depicted is a call sequence of components using an SCA (Service Component Architecture) infrastructure. A first task A calls a process B using the SCA component B. The process B then calls a sub-process C using SCA component C. The process C then calls the JCA connector (a connector according to the Java Connector Architecture specification) D using SCA component D. Whenever a component is called or entered, an entry event is emitted. Whenever the work of a component is finished, an exit event is emitted. This leads to the event sequence that is depicted on the right-hand side if the scenario is observed after the execution.
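By way of illustration only, the entry and exit behavior described above may be sketched as follows. The component labels and the function name `run` are assumptions chosen for this sketch and are not the event names used in FIG. 1.

```python
# Illustrative labels for the seven components of the call chain (assumed names).
CALL_CHAIN = ["Task A", "SCA B", "Process B", "SCA C", "Process C", "SCA D", "JCA D"]


def run(chain, log):
    """Synchronously 'invoke' chain[0], which in turn invokes the remaining chain."""
    if not chain:
        return
    log.append(("Entry", chain[0]))  # emitted when the component is entered
    run(chain[1:], log)              # nested synchronous call to the next component
    log.append(("Exit", chain[0]))   # emitted when the component has finished


event_log = []
run(CALL_CHAIN, event_log)
for kind, component in event_log:
    print(kind, component)
```

In this sketch all entry events appear in call order, followed by all exit events in reverse order, which is the nesting expected of a synchronous call chain.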
Consider this scenario from a consumer perspective. For example, a consumer receives the event “Entry JCD” and would like to be able to find on behalf of which component the component JCD has been called. For that purpose it would be necessary to correlate the event “Entry HTA” with the event “Entry JCD”.
The usual approach to correlating events is to use the timestamps of the events. The timestamps allow calculating the “distance in time”, which helps to decide whether events can be correlated at all. However, in environments with multiple concurrent threads of execution, events from different threads occur close together in time, so timestamps alone do not support a reliable correlation of events.
There are basically two approaches based on explicit correlation information in events that can be used to determine the correlation of events: a strongly linked approach and a weakly linked approach.
The principle of the strongly linked approach is that all involved components of a call chain can be identified in each event. In the example, illustrated by FIG. 1, that would mean, for instance, that the events of the JCA connector D would need to carry the identification of the SCA component D, the BFM (Business Flow Manager) process C, the SCA component C, the BFM process B, the SCA component B, and the HTM (Human Task Manager) task A. Hence, a single event supports determination of the complete call chain.
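A minimal sketch of a strongly linked event is given below, assuming that each event carries an ordered tuple of component identifications from the root of the call chain down to the emitter itself. The class `StrongEvent`, the helper `nearest_caller`, and the identifier strings are hypothetical and serve only to illustrate the principle.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StrongEvent:
    kind: str               # "Entry" or "Exit"
    chain: Tuple[str, ...]  # every component from the root down to the emitter

    @property
    def emitter(self) -> str:
        return self.chain[-1]


def nearest_caller(event: StrongEvent, prefix: str) -> Optional[str]:
    """Return the closest ancestor whose identification starts with `prefix`."""
    for component in reversed(event.chain[:-1]):
        if component.startswith(prefix):
            return component
    return None


# Assumed content of an entry event of the JCA connector D.
entry_jca_d = StrongEvent(
    "Entry",
    ("HTM task A", "SCA B", "BFM process B", "SCA C", "BFM process C", "SCA D", "JCA D"),
)

root = entry_jca_d.chain[0]                          # component that started the chain
business_caller = nearest_caller(entry_jca_d, "BFM")  # nearest calling BFM process
print(root, "|", business_caller)
```

The single event suffices to answer both questions, at the cost of a correlation header that grows with the depth of the call chain.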
The weakly linked approach is based on the idea of minimizing the necessary correlation information in the events. The weakly linked approach only requires the identification of the current component plus the identification of the calling component in the event. However, each component needs to emit events in order to enable consumers of the events to detect the call chain. In the given example, an event of the JCA connector D carries the identification of the connector component itself plus the identification of the calling component, which would be the SCA component D. Events of the SCA component D would need to include the identification of the SCA component D and the BFM process C and so on and so forth. By receiving all the events with the direct caller/callee information, the transitive caller/callee relationship can be calculated.
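The calculation of the transitive caller/callee relationship may be sketched as follows, assuming that each event is represented as a dictionary with the hypothetical keys `component` and `caller`, and that component identifications are unique within one call chain. The function name `call_chain` is likewise an assumption.

```python
def call_chain(events, component):
    """Follow caller links upward from `component`; return the chain root first."""
    caller_of = {e["component"]: e["caller"] for e in events}
    chain = [component]
    while True:
        caller = caller_of.get(chain[-1])
        # Stop at the root, at a missing link, or if a cycle would be formed.
        if caller is None or caller in chain:
            break
        chain.append(caller)
    return chain[::-1]


# Assumed labels for the seven components of the example.
names = ["Task A", "SCA B", "Process B", "SCA C", "Process C", "SCA D", "JCA D"]

# Each event names only the emitting component and its direct caller.
events = [{"component": c, "caller": p} for p, c in zip([None] + names, names)]

print(call_chain(events, "JCA D"))
```

Each event carries two identifications only, yet the complete chain up to the task A is recovered, provided that an event of every intermediate component has been received.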
Usually, in case of business level monitoring, the SCA component events are not of interest. Thus, one would only emit events for the Task A, the Process B, the Process C, and the JCA Connector D. However, this is not possible in the weakly linked case, because both the caller correlation and the callee correlation identification always need to be emitted in an event. If, for example, SCA component D does not emit an event, then there is no information contained in the events that SCA component D was invoked by Process C. Thus, it is impossible to determine that JCA D was actually invoked on behalf of Process C. As a result, in the example case, 14 events with a constant correlation header size of 2 entries need to be emitted.
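The gap that arises when SCA component D emits no event can be made visible with a small check, sketched here under the assumption that events are dictionaries with the hypothetical keys `component` and `caller`. The function `dangling_callers` is an illustrative helper that lists callers which are referenced in events but have emitted no event of their own.

```python
def dangling_callers(events):
    """Return callers that are referenced by some event but never emit an event."""
    emitters = {e["component"] for e in events}
    missing = {
        e["caller"]
        for e in events
        if e["caller"] is not None and e["caller"] not in emitters
    }
    return sorted(missing)


# Assumed labels for the seven components of the example.
names = ["Task A", "SCA B", "Process B", "SCA C", "Process C", "SCA D", "JCA D"]
all_events = [{"component": c, "caller": p} for p, c in zip([None] + names, names)]

# The same event stream with the events of SCA component D suppressed.
without_sca_d = [e for e in all_events if e["component"] != "SCA D"]

print(dangling_callers(all_events))
print(dangling_callers(without_sca_d))
```

With the events of SCA component D suppressed, the event of JCA D still names SCA D as its caller, but no remaining event states who called SCA D, so the link to Process C is lost.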
Using the strongly linked approach avoids emitting the SCA component events, so that only 8 events result. However, each strongly linked event needs to carry the correlation information of all components involved so far, which means roughly n/2 correlation identifiers on average if n is the number of involved components in the call chain. The example ends up with an average of four identifications per event. Thus, the number of necessary events is significantly reduced compared to the weakly linked case, but significant additional load is added to each event, which again causes an undesirable load on the event infrastructure.
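The figures of the example can be reproduced with the following arithmetic sketch. It assumes two events (entry and exit) per emitting component, a strongly linked header that includes the emitter itself, and illustrative component labels; all variable names are hypothetical.

```python
# Assumed labels for the seven components of the example call chain.
chain = ["Task A", "SCA B", "Process B", "SCA C", "Process C", "SCA D", "JCA D"]
business = [c for c in chain if not c.startswith("SCA")]

EVENTS_PER_COMPONENT = 2  # one entry event and one exit event

# Weakly linked: every component must emit, each header holds 2 identifications.
weak_events = EVENTS_PER_COMPONENT * len(chain)
weak_header_size = 2

# Strongly linked: only business-level components emit, and each header
# holds all components from the root down to the emitter.
strong_events = EVENTS_PER_COMPONENT * len(business)
strong_header_sizes = [chain.index(c) + 1 for c in business]
strong_avg_header = sum(strong_header_sizes) / len(strong_header_sizes)

print(weak_events, weak_header_size)
print(strong_events, strong_header_sizes, strong_avg_header)
```

Under these assumptions the weakly linked case yields 14 events of header size 2, and the strongly linked case yields 8 events with header sizes 1, 3, 5 and 7, that is, four identifications on average.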