1. Field of the Invention
The present invention relates to an improved data processing system and, in particular, to a method and apparatus for information processing. Still more particularly, the present invention relates to multicomputer data transferring.
2. Description of Related Art
Transactions, which can be regarded as any unit of computational work or a message that represents one, usually cause the invocation of other transactions, each of which may repeat the process, particularly in distributed data processing systems such as the World Wide Web. The resulting cascade of transactions can be represented as a directed graph of transactions; a single root node represents a root or initial transaction, and an arbitrarily large number of intermediate and leaf nodes in the graph represent subsequent transactions.
Each node in the directed graph, i.e. each transaction, can be identified by a correlation token with a unique value. Hence, a correlation token contains a unique value which logically associates a node in the directed graph with its represented transaction; from another perspective, the correlation token identifies the represented transaction. Each transaction's invocation by a parent transaction can be represented by an arc in the directed graph. An arc is identified by a pair of correlation tokens, one of which represents a transaction and one of which represents a parent transaction; however, there could be multiple parent transactions for a given transaction, and the root transaction does not have a parent transaction.
If all of the pairs of correlation tokens are aggregated together, a straightforward algorithm can be used to search through the tokens and to build a particular call graph, which is a particular directed graph that comprises a set of invocation arcs for a set of related transactions. Problems arise, however, when putting this into practice.
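The straightforward algorithm mentioned above can be sketched as follows, assuming that every (parent, child) pair of correlation tokens has already been aggregated in one place. The function name `build_call_graph`, the string token values, and the sample arcs are hypothetical and serve only as illustration.

```python
from collections import defaultdict, deque


def build_call_graph(arcs, root):
    """Return the set of (parent, child) arcs reachable from `root`.

    `arcs` is an iterable of (parent_token, child_token) pairs; the
    token values here are plain strings chosen for illustration.
    """
    # Index the aggregated pairs by parent token for quick lookup.
    children = defaultdict(list)
    for parent, child in arcs:
        children[parent].append(child)

    graph = set()
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            # Record every invocation arc, even when the child has
            # more than one parent.
            graph.add((node, child))
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return graph


# Hypothetical aggregated arcs from two unrelated call graphs.
ARCS = [("t1", "t2"), ("t1", "t3"), ("t2", "t4"), ("t3", "t4"), ("u1", "u2")]
CALL_GRAPH = build_call_graph(ARCS, "t1")
```

The search itself is simple; the difficulty described in the following paragraphs lies in gathering the pairs in the first place.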
In a modern distributed data processing system, many thousands or millions of systems could be involved in any one call graph. Although the number of systems actually in a particular call graph is typically not more than 5-25, the pool of systems that might be in the call graph may number in the thousands or millions, and the transactions from the entire pool represent the search space. Each system in a call graph may be executing thousands of transactions per second, which results in an enormous amount of data spread across millions of systems. As an example of a computationally intensive algorithm for aggregating the correlation tokens, all of the correlation token data could be sent to a central system and analyzed there. Although this is algorithmically simple, it is impractical given the number of systems and the volume of data.
In an alternative algorithm, some network-related data that indicates the location of a transaction's parent could be stored in each correlation token, and the token pair data could be stored on the system on which it was collected. This avoids consuming network bandwidth to move the data to a central system unless the specific data is required. Because each token records only the location of its parent, a call graph can be built from a leaf node back to the root one step at a time, but not from the root node towards the leaf nodes.
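The leaf-to-root traversal can be illustrated with a minimal sketch. It assumes, purely for illustration, that each system keeps a local store mapping a token to its parent token and the parent's system, and that each transaction has a single parent. The dictionary `LOCAL_STORES`, the system names, and the function `walk_to_root` are hypothetical; the dictionary lookup stands in for a query sent to a remote system.

```python
# Hypothetical per-system stores:
#   system name -> {token: (parent token, parent system) or None for the root}
LOCAL_STORES = {
    "web": {"t1": None},
    "app": {"t2": ("t1", "web")},
    "db": {"t3": ("t2", "app")},
}


def walk_to_root(stores, leaf_token, leaf_system):
    """Follow parent locations one step at a time from a leaf to the root."""
    path = []
    token, system = leaf_token, leaf_system
    while True:
        path.append((token, system))
        # Stands in for a remote query to `system`; only that system
        # holds the pair data for `token`.
        parent = stores[system].get(token)
        if parent is None:
            return path
        token, system = parent


PATH = walk_to_root(LOCAL_STORES, "t3", "db")
```

Nothing in a stored record points from a parent to its children, which is why the walk works only in the leaf-to-root direction.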
This alternative algorithm has limited utility because three constraints must be satisfied in order to build a complete call graph. The first constraint, termed “C1” for reference purposes, requires that every leaf node in a directed graph must be known. The second constraint, termed “C2”, requires that the identity of each transaction that is part of the call graph on each leaf node must be known. The third constraint, termed “C3”, requires that the network location in the token must be understood by the analysis program. Constraints “C1” and “C2” render the solution impractical; without further data, the information that these two constraints require is unknown.
A partial solution to these constraints can be achieved using a technique in the Application Response Measurement (ARM) standard, which uses a trace flag in each correlation token. A trace flag in the correlation token of the root transaction is turned on. At each intermediate and leaf node, i.e. each time that a child correlation token needs to be generated, the trace flag in the parent correlation token is inspected. If the trace flag is turned on in the parent correlation token, then the trace flag in the child correlation token is turned on. This results in the trace flag being turned on in all of the correlation data for the transactions in a particular call graph. If each system sends all correlation data having an active trace flag to a central system for analysis, the central system would be able to build the particular call graph.
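The trace flag propagation described above can be sketched as follows. This is not the ARM API; the class `CorrelationToken` and the helper functions are hypothetical names used only to show the inheritance rule and the resulting filter on the data sent to the central system.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrelationToken:
    token_id: int  # unique value identifying the transaction
    trace: bool    # trace flag carried in the token


_next_id = itertools.count(1)


def new_root_token(trace):
    """Create a root token; the flag must be chosen before the root starts."""
    return CorrelationToken(next(_next_id), trace)


def new_child_token(parent):
    """Create a child token that inherits the parent's trace flag."""
    return CorrelationToken(next(_next_id), parent.trace)


def records_to_forward(records):
    """Keep only the (parent, child) pairs whose child token is traced."""
    return [(parent, child) for parent, child in records if child.trace]


traced_root = new_root_token(trace=True)
child = new_child_token(traced_root)
grandchild = new_child_token(child)

untraced_root = new_root_token(trace=False)
untraced_child = new_child_token(untraced_root)

FORWARDED = records_to_forward(
    [(traced_root, child), (child, grandchild), (untraced_root, untraced_child)]
)
```

In this sketch the flag is fixed when the root token is created, so a call graph whose root was not flagged in advance produces no forwarded records.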
However, there are two significant limitations with this approach. First, the trace flag must be turned on in the root transaction before the root transaction starts so that the trace flag will propagate to the intermediate and leaf node transactions. This limitation precludes the ability to make a determination about whether to trace a family of transactions based on observed attributes of a transaction, such as a slow response time or the existence of an error, or any other conditions for which tracing might be desirable after the root transaction has already started. Second, if the trace flag is turned on in many root transactions to avoid the first limitation, an overabundance of data will be generated, thereby creating the same bandwidth and processing bottleneck that rendered impractical the previously described simple algorithm.
Constraint “C3” limits the utility of the above-noted algorithm to those systems that share a common understanding of network location. Although a common understanding of network location can be achieved with long system identifiers, this approach conflicts with the need to keep correlation tokens as small as possible. In addition, it may expose information about network internals, which itself may pose a security risk.
Short system identifiers solve this network location interpretation problem if the short system identifiers are used as keys into a registry that contains sufficient information to unambiguously determine a particular network location. However, this approach requires that only a small number of registries contain this information so that the search space is reasonably bounded. This approach may also require coordination between the owners of the registries to prevent the same short system identifier from being stored in multiple registries. In a large distributed data processing system like the World Wide Web, there would be a need for numerous registries, generally grouped in domains such as a company or a large division in a company. It is not practical to expect coordination between the owners of all the registries, nor is it practical to perform an exhaustive search.
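The ambiguity that arises without coordination can be illustrated with a small sketch. The registry names, the identifier values, the host names, and the function `resolve` are hypothetical; the sketch assumes only that each registry maps short system identifiers to network locations.

```python
# Hypothetical, uncoordinated registries owned by different domains:
#   registry name -> {short system identifier: network location}
REGISTRIES = {
    "company-a": {0x01: "host1.a.example", 0x02: "host2.a.example"},
    "company-b": {0x01: "db.b.example"},
}


def resolve(short_id, registries):
    """Search every registry and return all locations found for `short_id`."""
    return [
        (name, registry[short_id])
        for name, registry in registries.items()
        if short_id in registry
    ]


AMBIGUOUS = resolve(0x01, REGISTRIES)  # found in both registries
UNIQUE = resolve(0x02, REGISTRIES)     # found in exactly one registry
```

The lookup has to visit every registry, and an identifier stored in more than one registry cannot be resolved to a single location, which reflects both the exhaustive-search problem and the coordination problem described above.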
Therefore, it would be advantageous to have a technique for generating correlation tokens with associated identifiers that can be practically stored and processed for analyzing transactions in large distributed data processing systems such as the World Wide Web.