This invention relates to the field of application performance analysis, and in particular to a method and system for identifying message streams corresponding to a transaction that includes communications between multiple tiers.
The ever-increasing use of applications that operate on a network has increased the need for application performance analysis systems that can assess the efficiency of transactions that utilize the network.
In a typical network-based application, a user executes the application at a client device, and in the process of executing the application, messages are communicated between the client and one or more servers. These messages are generally interspersed among messages from other applications being executed at the same time by the user, or by other users. To determine the performance of transactions of a particular application, the messages corresponding to the communications related to each transaction are distinguished from the other messages, so that performance data, such as delay times, can be collected.
A number of techniques are commonly used to distinguish messages related to transactions of an application, including, for example, distinguishing the source and destination addresses associated with the client and server(s) of each transaction. Such techniques, however, are unable to identify ‘secondary’ or ‘consequential’ communications associated with such transactions. That is, for example, a message from the client to a server may cause the server to contact another server, such as a database server. The resultant communications between the servers will not generally include a reference to the client, and techniques that rely upon distinguishing messages to or from the client will not be able to associate these communications with the transaction.
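The limitation described above can be illustrated with a minimal sketch. The `Message` record, the `filter_by_client` helper, and the sample trace are hypothetical names and data introduced here for illustration only; they do not come from any particular analysis tool.

```python
from typing import NamedTuple


class Message(NamedTuple):
    """Hypothetical captured message: source, destination, and payload."""
    src: str
    dst: str
    payload: str


def filter_by_client(messages, client):
    """Keep only the messages whose source or destination is the client."""
    return [m for m in messages if client in (m.src, m.dst)]


# One transaction: client A asks web server B, which consults database server D.
trace = [
    Message("A", "B", "request"),
    Message("B", "D", "query"),
    Message("D", "B", "rows"),
    Message("B", "A", "response"),
]

matched = filter_by_client(trace, "A")
# The server-to-server messages carry no reference to the client, so they are missed.
missed = [m for m in trace if m not in matched]
```

In this sketch the two messages exchanged between B and D belong to the transaction but are not selected, because neither carries the client's address.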
For ease of understanding and reference, the terms ‘tier’ and ‘tier-pair’ are used to identify the relationship among communicating elements. In the above example, the client is at a first tier (e.g. a user tier); the servers that the client communicates directly with are at a second tier (e.g. a web server tier); the servers that the servers at the second tier communicate directly with are at a third tier (e.g. a database server tier); and so on. A pair of elements that communicate directly is termed a ‘tier-pair’. Note that the terms ‘client’, ‘server’, ‘database’, etc. are used herein to facilitate understanding; the particular elements at any given tier may comprise any type of device with communication capability.
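As an illustration of this terminology only, the following sketch assigns a tier number to each element from a list of tier-pairs by walking outward from the client. The function name `assign_tiers` and the breadth-first approach are assumptions made for this example; the text above defines the terms but does not prescribe how tiers are computed.

```python
from collections import deque


def assign_tiers(tier_pairs, client):
    """Number the tiers outward from the client (the client is tier 1)."""
    # Build an undirected map of which elements communicate directly.
    neighbors = {}
    for a, b in tier_pairs:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    tiers = {client: 1}
    queue = deque([client])
    while queue:
        node = queue.popleft()
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in tiers:
                # An element first reached through a given tier sits one tier deeper.
                tiers[nxt] = tiers[node] + 1
                queue.append(nxt)
    return tiers


# Client A talks to web server B; B talks to database server D.
tiers = assign_tiers([("A", "B"), ("B", "D")], "A")
```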
U.S. Pat. No. 7,729,256, “CORRELATING PACKETS”, issued 1 Jun. 2010 to Patrick J. Malloy, Michael Cohen, and Alain J. Cohen, discloses a method for determining (or approximating) which messages correspond to a particular transaction from among other messages in a set of multi-tier communication traces. The particular transaction is characterized as comprising a sequence of ‘reference’ packets, which is a sequence of packets among tier-pairs that typically occurs during execution of the application, such as illustrated in FIG. 1A. For example, in the reference sequence, arrow 1 may correspond to a typical client's (Client A) request to a server (Web-Server B) for data, arrow 3 to the server's request to a database server (DB Server D), arrows 4 to the database server's communication of the data to the requesting server, and arrow 6 to the requesting server's communication of this data to the requesting client. The other arrows in the reference sequence of FIG. 1A include, for example, communication of other requests, data, acknowledgements, and so on. These reference sequences may be based on a simulation of the application, or on the operation of the application in a controlled or isolated environment.
FIG. 1B illustrates the sequence of communications 1, 2, 3 . . . 9 corresponding to a transaction that occurs during the execution of the application on an actual network. As illustrated, the sequence is masked by other communications occurring between the tier-pairs A-B and B-D. As disclosed in U.S. Pat. No. 7,729,256, sets of traces of communications between tiers in the actual network are analyzed to find a sequence in the traces that appears to be similar to the reference sequence, based on a measure of correlation between possible sequences in the traces and the reference sequence. The correlation may be based on factors such as information in the header of the packets, the size of the packets, key words or phrases in the packets, and so on.
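The idea of scoring a candidate sequence against a reference sequence can be sketched as follows. This is a toy measure assumed for illustration, using only the tier-pair and the packet size; it is not the correlation measure of U.S. Pat. No. 7,729,256, and the names `packet_similarity` and `sequence_score` and the dictionary layout are hypothetical.

```python
def packet_similarity(ref, cand):
    """Score in [0, 1]: 0 if the tier-pair differs, otherwise size closeness."""
    if (ref["src"], ref["dst"]) != (cand["src"], cand["dst"]):
        return 0.0
    larger = max(ref["size"], cand["size"])
    if larger == 0:
        return 1.0
    return min(ref["size"], cand["size"]) / larger


def sequence_score(reference, candidate):
    """Average packet similarity; sequences of different length score 0."""
    if not reference or len(reference) != len(candidate):
        return 0.0
    total = sum(packet_similarity(r, c) for r, c in zip(reference, candidate))
    return total / len(reference)


reference = [
    {"src": "A", "dst": "B", "size": 100},
    {"src": "B", "dst": "D", "size": 50},
]
exact = [
    {"src": "A", "dst": "B", "size": 100},
    {"src": "B", "dst": "D", "size": 50},
]
partial = [
    {"src": "A", "dst": "B", "size": 50},
    {"src": "B", "dst": "D", "size": 50},
]

exact_score = sequence_score(reference, exact)      # 1.0
partial_score = sequence_score(reference, partial)  # 0.75
```

A fuller measure would presumably also weigh header fields and key words or phrases, as the paragraph above notes; those factors are omitted here for brevity.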
The use of a reference sequence to find a matching sequence of packets in a production environment, however, requires the creation and/or identification of a sequence that is representative of a transaction or set of transactions that are likely to occur during the execution of the application of interest, as illustrated in FIG. 1A. In some applications, particularly ‘static’ applications, this may be a fairly straightforward task. In ‘dynamic’ applications, such as highly interactive applications, the transactions may differ based on the particular user, or the particular tasks performed within the application. In such a dynamic environment, different reference sequences may need to be defined, each reference sequence being specific to a particular user, or a particular task.
Also, because the specific content of a sequence of packets can be expected to differ among different users of an application, correlation factors based on content are of limited use when pre-defined reference sequences are used.
It would be advantageous to be able to identify sequences associated with transactions of an application in a production environment without having to identify a reference sequence a priori. It would also be advantageous to be able to automatically identify characteristic sequences within multiple traces of executions of an application at different times.
These advantages, and others, can be realized by a system and method that determine correlations within multi-tier communications based on repeated iterations of a user transaction. Content-based correlations are determined by encoding the content using a finite alphabet, then searching for similar sequences among the multiple traces. By encoding the content to a finite alphabet, common pattern matching techniques may be used, including, for example, DNA alignment algorithms. To facilitate alignment of the traces, structural and/or semantic breakpoints are defined, and the encoding in each trace is synchronized to these breakpoints. To facilitate efficient processing, a hierarchy of causality among tier-pairs is identified, and messages at lower levels are ranked and temporally filtered, based on activity intervals at higher levels of the hierarchy.
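The encoding step can be sketched as follows, under stated assumptions. The four-symbol alphabet (letter, digit, space, punctuation) and the collapsing of repeated symbols are illustrative choices, not the encoding of this disclosure, and Python's `difflib.SequenceMatcher` is used as a simple stand-in for a DNA-style alignment algorithm. Breakpoint synchronization and temporal filtering are not shown.

```python
from difflib import SequenceMatcher
from itertools import groupby


def symbol(ch):
    """Map one character to a symbol of the assumed alphabet {a, d, s, p}."""
    if ch.isalpha():
        return "a"
    if ch.isdigit():
        return "d"
    if ch.isspace():
        return "s"
    return "p"


def encode(content):
    """Encode content over the finite alphabet, collapsing runs of one symbol."""
    return "".join(key for key, _ in groupby(symbol(ch) for ch in content))


def similarity(content_x, content_y):
    """Similarity in [0, 1] between the encoded forms of two contents."""
    matcher = SequenceMatcher(None, encode(content_x), encode(content_y), autojunk=False)
    return matcher.ratio()


# Two iterations of the same transaction with different user-specific values.
first = "GET /user/123"
second = "GET /user/98765"
other = "POST /login"

same_shape = similarity(first, second)
different_shape = similarity(first, other)
```

Because the user-specific digits collapse to a single symbol, the two iterations encode identically even though their literal content differs, which is the property that makes content usable for correlation across repeated executions.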
Throughout the drawings, the same reference numerals indicate similar or corresponding features or functions. The drawings are included for illustrative purposes and are not intended to limit the scope of the invention.