Computing systems today are often complex, involving many integrated applications executing on one or more computers. Yet, when problems occur with such systems, analysis is often hampered by that complexity.
Most computing systems, such as individual servers in a distributed computing environment, are configured, via a logging or other instrumentation service provider, to generate reasonably useful logs of their own activity. Servers further provide tools to assist a system administrator in analyzing the server logs for problem determination. Many middleware applications that facilitate communication between other applications also provide a logging service and analysis tools. However, it is common today for a distributed application configuration to include six or more independent servers located on a multitude of physical machines. Correlating the various error or other event logs from each of the applications, especially applications on different physical machines, is complex and may not be possible.
Correlation is the process of relating information based on the contents of the information. For example, correlation is used to determine relationships (both implicit and explicit) between instrumentation information captured in instrumentation artefacts generated by an instrumentation service. Such artefacts may comprise trace records, log records, and messages generated by a computer system.
How correlated events are related to one another may be determined by the type of correlation. Associative correlation is used to group events that are related to one another, such as a set of events describing the processing of a specific request.
Associative correlation is typically performed using one of two methods: a) A unique ID is created that is used by all related events; or b) Each event is assigned a unique ID and information is provided which relates the IDs associated with related events.
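The two associative methods can be illustrated with a short sketch. It assumes a hypothetical `Event` record and hypothetical helper names (`group_by_shared_id`, `group_by_links`); it is not taken from any particular product. Method a) groups events that carry the same shared ID; method b) treats the relating information as links between per-event IDs and groups the connected events using a simple union-find.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    event_id: str                      # unique ID of this event
    message: str
    request_id: Optional[str] = None   # method a): shared ID used by all related events


def group_by_shared_id(events):
    """Method a): related events all carry the same shared ID."""
    groups = defaultdict(list)
    for e in events:
        if e.request_id is not None:
            groups[e.request_id].append(e.event_id)
    return dict(groups)


def group_by_links(event_ids, links):
    """Method b): each event has its own ID, and (id, id) pairs state which
    events are related. Connected events end up in the same group."""
    parent = {eid: eid for eid in event_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)          # merge the two groups

    groups = defaultdict(list)
    for eid in event_ids:
        groups[find(eid)].append(eid)
    return sorted(sorted(g) for g in groups.values())


events = [Event("e1", "request received", "req-7"),
          Event("e2", "database queried", "req-7"),
          Event("e3", "request received", "req-8")]
print(group_by_shared_id(events))   # {'req-7': ['e1', 'e2'], 'req-8': ['e3']}
print(group_by_links(["e1", "e2", "e3", "e4"], [("e1", "e2"), ("e2", "e4")]))
# [['e1', 'e2', 'e4'], ['e3']]
```

Method b) lets relationships be added after the fact, while method a) requires the shared ID to be known when each event is written.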
Sequential correlation is used to order events in the sequence in which they occurred, thereby indicating flow. Sequential correlation can be used to order the log and trace records created by a single product or to show the order in which events occurred across several products.
Sequential correlation may be implemented in a number of different ways. In many products, the sequence of events may be implicitly defined by the order of the events in a log. In other products, a timestamp is used to sequence the events. However, event order in a log may be misleading and a timestamp may not be sufficiently granular. Neither method addresses products whose logs are distributed across two or more computers with unsynchronized clocks.
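The clock-skew problem can be reproduced with a small, entirely hypothetical example: two hosts exchange a request and a reply, and host "B" runs 2.0 seconds behind host "A". The host names, offsets, and event times are assumptions chosen for illustration. Merging the two logs by local timestamp alone places B's receipt of the request ahead of A's sending of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    host: str
    local_ts: float  # timestamp from the host's own clock, in seconds
    text: str


# Hypothetical per-host clock offsets: host "B" is 2.0 s behind host "A".
CLOCK_OFFSET = {"A": 0.0, "B": -2.0}


def record(host, true_time, text):
    """Create a log record stamped with the host's (possibly skewed) clock."""
    return LogRecord(host, true_time + CLOCK_OFFSET[host], text)


# True order: A sends, B receives, B replies, A receives the reply.
log_a = [record("A", 10.0, "send request to B"),
         record("A", 10.5, "receive reply from B")]
log_b = [record("B", 10.1, "receive request from A"),
         record("B", 10.3, "send reply to A")]

# Inter-log merge that trusts the timestamps alone.
merged = sorted(log_a + log_b, key=lambda r: r.local_ts)
for r in merged:
    print(f"{r.local_ts:5.1f} {r.host}: {r.text}")
# Both of B's records sort ahead of A's "send request to B",
# so an effect appears before its cause.
```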
Environmental correlation is a special type of associative correlation, in that an association is drawn between an event and the environment (e.g. execution environment) that created the event.
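A minimal sketch of environmental correlation follows, assuming that each event records the host and process ID that produced it and that a separate registry describes each execution environment. The `Environment` and `Event` types, the `product` attribute, and `attach_environment` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    host: str
    pid: int
    product: str  # hypothetical attribute describing the execution environment


@dataclass(frozen=True)
class Event:
    host: str
    pid: int
    text: str


def attach_environment(events, environments):
    """Relate each event to the environment (keyed here by host and process ID)
    that created it; events with no known environment are paired with None."""
    by_key = {(env.host, env.pid): env for env in environments}
    return [(e, by_key.get((e.host, e.pid))) for e in events]


envs = [Environment("srv1", 4242, "web server"),
        Environment("srv2", 17, "database")]
events = [Event("srv1", 4242, "request received"),
          Event("srv2", 17, "query executed"),
          Event("srv3", 1, "unknown origin")]

for event, env in attach_environment(events, envs):
    print(event.text, "->", env.product if env else "no known environment")
```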
The scope of correlation defines the range of events to be correlated.
There are two general scopes of correlation: intra-log correlation (the relating of events within a single log) and inter-log correlation (the relating of events across separate logs).
Correlation is typically performed by using information contained in the event logs to determine relationships between the events.
Deterministic correlation creates relationships between events by using explicit correlation information contained in each event.
Correlating data using explicit correlation information is usually reliable, limited only by the type of correlation (associative, sequential, environmental) provided by the data correlators used. Deterministic correlation can only be performed for those software products (e.g. applications) that capture the explicit correlation information (correlators) in their event information. With few exceptions, today's products do not include correlation information in their data and must be modified (re-instrumented) to add the correlator information to their existing log and trace information. In other words, deterministic correlation cannot be used for all products in a computing solution until each of the products has been modified to provide explicit correlation information.
Deterministic correlation between products requires the products to exchange correlator information, which is then captured in the events created by the products. Therefore, not only must each product be re-instrumented to capture the correlator information in its events, but the products must also be modified to exchange correlator information with other products. Exchanging correlation information at runtime often affects performance, which requires coordinated usage models between the products. Adding correlation information to product-to-product communication may adversely impact performance when that added information is too large or of variable size.
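The dependence on re-instrumentation can be shown with a small sketch. It assumes, hypothetically, that instrumented products write a `correlator` field into each event while an un-instrumented product writes none. Grouping on the explicit field is straightforward, but events lacking the field simply cannot be placed.

```python
from collections import defaultdict


def correlate_deterministically(events):
    """Group events by an explicit 'correlator' field (hypothetical field name).

    Events from products that were never re-instrumented carry no correlator,
    so they cannot be placed in any group and are returned separately.
    """
    groups = defaultdict(list)
    uncorrelated = []
    for event in events:
        correlator = event.get("correlator")
        if correlator is None:
            uncorrelated.append(event["id"])
        else:
            groups[correlator].append(event["id"])
    return dict(groups), uncorrelated


events = [
    {"id": 1, "product": "web", "correlator": "c-01"},
    {"id": 2, "product": "middleware", "correlator": "c-01"},
    {"id": 3, "product": "legacy-db"},  # not re-instrumented: no correlator
]
groups, leftovers = correlate_deterministically(events)
print(groups)     # {'c-01': [1, 2]}
print(leftovers)  # [3]
```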
Some products recognise the need for correlators between events that occur within the same server or on separate servers in a distributed application environment. For example, one product, Tivoli® ARM (application response measurement), measures service response levels for transactions in a distributed environment. Tivoli is a registered trademark of International Business Machines Corporation. ARM employs transaction correlators to provide a capability to break down a transaction into its component parts, so that the contribution of each part to the total response time can be analyzed.
In accordance with ARM, each application responsible for a component of the overall transaction to be measured is modified to include calls to ARM via an application programming interface (API). The calls may request correlators for transactions with one or more child transactions (i.e. a transaction invoked in response to the requesting or parent transaction), send the assigned correlators to the child transaction(s) along with the data needed to invoke (i.e. cause the occurrence of) the child transaction(s), and pass correlators received from parent transactions to the ARM measurement agents.
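The parent/child correlator flow described above can be sketched in simplified form. The `MeasurementAgent` class, its `start`/`stop`/`children` methods, the correlator string format, and the elapsed times are all hypothetical stand-ins; this is not the ARM API. The sketch only shows the pattern: the parent obtains a correlator, passes it to the child along with the request data, and the child reports it back so the agent can attribute the child's time to the parent.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Transaction:
    correlator: str
    name: str
    parent: Optional[str]   # correlator of the parent transaction, if any
    elapsed: float = 0.0    # response time in seconds


class MeasurementAgent:
    """Hypothetical stand-in for a measurement agent; NOT the ARM API."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.transactions: Dict[str, Transaction] = {}

    def start(self, name, parent_correlator=None):
        correlator = f"corr-{next(self._ids)}"
        self.transactions[correlator] = Transaction(correlator, name, parent_correlator)
        return correlator

    def stop(self, correlator, elapsed):
        self.transactions[correlator].elapsed = elapsed

    def children(self, correlator):
        return [t for t in self.transactions.values() if t.parent == correlator]


def child_service(agent, payload, parent_correlator):
    # The child receives the parent's correlator along with the data it needs.
    corr = agent.start("query inventory", parent_correlator)
    agent.stop(corr, elapsed=0.12)  # hypothetical measured time
    return f"result for {payload}"


agent = MeasurementAgent()
root = agent.start("checkout")
child_service(agent, "order-42", root)
agent.stop(root, elapsed=0.30)

total = agent.transactions[root].elapsed
for t in agent.children(root):
    print(f"{t.name}: {t.elapsed:.2f}s of {total:.2f}s ({t.elapsed / total:.0%})")
```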
ARM measurement agents follow conventions when creating correlators in accordance with a defined format. Included within the correlator is environment information identifying the computer, the transaction class, the transaction instance, and some flags. The ARM correlator format is somewhat flexible and extensible; however, the correlator and the framework for handling it are specific to the needs of the ARM service. That is, it is not a generic correlator per se for use by varied service applications. Moreover, the size of the ARM correlator may adversely impact performance in some scenarios, and ARM correlators provide identification only to the level of a transaction instance.
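To make the kind of fields listed above concrete, the following sketch packs a correlator into a fixed binary layout. The layout (a 4-byte IPv4 address for the computer, a 4-byte transaction class ID, an 8-byte transaction instance ID, and a 1-byte flags field, in network byte order) is a hypothetical assumption for illustration and is NOT the actual ARM correlator format.

```python
import ipaddress
import struct
from typing import NamedTuple

# Hypothetical fixed layout, NOT the real ARM format:
# 4-byte IPv4 address, 4-byte class ID, 8-byte instance ID, 1-byte flags.
LAYOUT = struct.Struct("!4sIQB")


class Correlator(NamedTuple):
    computer: str      # dotted IPv4 address identifying the computer
    txn_class: int     # transaction class identifier
    txn_instance: int  # transaction instance identifier
    flags: int


def encode(c: Correlator) -> bytes:
    """Serialize a correlator into a fixed-size byte string."""
    return LAYOUT.pack(ipaddress.IPv4Address(c.computer).packed,
                       c.txn_class, c.txn_instance, c.flags)


def decode(raw: bytes) -> Correlator:
    """Rebuild a correlator from its fixed-size byte string."""
    addr, cls, inst, flags = LAYOUT.unpack(raw)
    return Correlator(str(ipaddress.IPv4Address(addr)), cls, inst, flags)


c = Correlator("192.0.2.10", txn_class=7, txn_instance=123456, flags=0b0001)
wire = encode(c)
print(LAYOUT.size, len(wire))  # 17 17
assert decode(wire) == c
```

A fixed size keeps the per-message overhead predictable when correlators travel between products. Note also that this layout, like the format described above, has no field below the transaction instance, so individual events within one transaction instance cannot be distinguished by the correlator alone.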
A solution to some or all of these limitations or problems is therefore desired.