Measurement and management of computer systems performance is becoming increasingly important in businesses and industries that rely heavily on information technology (IT). The financial services industry, for example, is comprised of investment houses, banks, stock exchanges, brokers, and others who conduct countless computerized transactions on a daily basis and whose capital investments in technology may be tremendous. It is imperative, therefore, that participants in this or other information-dependent industries not only possess high-powered computer systems capable of handling high volumes of computerized transactions, but also ensure that those systems function as close as possible to peak efficiency.
The concept of latency is often used as a gauge of computer system and network performance. In a computer system or network, latency is the total time between two measurable points and is often used to mean any delay that increases real or perceived response time. This time may include the time it takes a message to be sent between processes or business offices over the network. It may also include the time spent in writing details or data to a disk or database. Other contributors to latency include processing/calculation delays; mismatches in data speed between the microprocessor and input/output (I/O) devices; inadequate data buffers; propagation (the time it takes for a packet to travel between one place and another); transmission medium (optical fiber, wireless, or some other medium); packet size; router and other processing (each gateway node takes time to examine and possibly change the header in a packet); and other computer and storage delays (e.g., within local area networks (LANs) or similar networks at each end of the journey, a packet may be subject to storage and hard disk access delays at intermediate devices such as switches and bridges).
A currently available IT performance optimization standard is application response measurement (ARM). ARM is a specification that details application response measurement and is provided as part of a software developer's kit that is available from various vendors including the Computer Measurement Group (CMG) headquartered in Turnersville, N.J. CMG and its members are concerned with measurement and management of computer systems, including performance evaluation of existing computer systems to maximize their performance (e.g., response time, throughput, etc.) and capacity management when enhancements to existing systems are planned and when new systems are being designed. The ARM specification is supported by commercial software available from Hewlett Packard Co. of Palo Alto, Calif., Tivoli Systems, Inc. of Austin, Tex. and BMC Software, Inc. of Houston, Tex. The ARM program includes an application program interface (API) that can capture system measurement data. However, at each transfer of the measurement data from one component in a computer system to the next, a unique API-generated handle (or “correlator” or “identifier”) is created and transferred to the next system component. Hence, if processing time or other transactional data is to be passed from a first server to a second server in a computer system, then a first unique handle is generated by the ARM API that is correlated or associated with the transactional data, and the first handle and its associated transactional data are then passed from the first server to the second server. Likewise, if processing time or other transactional data is to be passed from the second server to a third server in the computer system, then a second unique handle is generated by the ARM API that is correlated or associated with the transactional data, and the second handle and its associated transactional data are then passed from the second server to the third server. 
In large systems that process a complex transaction comprised of many subtransactions, it becomes readily apparent that many unique API-generated handles must be created and passed through the system. Creating and passing multiple API-generated handles throughout a computer system requires that the ARM API include a correlation application or program for tracking and correlating the processing time and other transactional data with the various handles as they pass through the computer system. Such an arrangement complicates the ARM API architecture and adds processing and storage burdens and other operational inefficiencies to the computer system whose latency characteristics the ARM API is intended to monitor. Moreover, the ARM API can only provide for the measurement of nested transactions that are client-server in nature, i.e., with a parent-child relationship.
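By way of illustration only, the per-hop handle scheme described above may be sketched as follows. This is not the ARM API: the class `CorrelationStore`, its methods `new_handle` and `chain`, the hop labels, and the millisecond figures are all hypothetical, and linking each handle to the one before it is an assumption made so that the correlation step can be shown. The sketch is intended only to show that every transfer adds a fresh handle that must be stored and later correlated.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class CorrelationStore:
    """Hypothetical bookkeeping for per-hop handles (not the ARM API)."""

    # handle -> (hop label, elapsed milliseconds measured for that hop)
    records: Dict[str, Tuple[str, int]] = field(default_factory=dict)
    # handle -> handle generated for the previous hop (None for the first hop)
    parent_of: Dict[str, Optional[str]] = field(default_factory=dict)

    def new_handle(self, parent: Optional[str], hop: str, elapsed_ms: int) -> str:
        """Create a unique handle for one transfer and record its data."""
        handle = uuid.uuid4().hex
        self.records[handle] = (hop, elapsed_ms)
        self.parent_of[handle] = parent
        return handle

    def chain(self, handle: str) -> List[Tuple[str, int]]:
        """Walk back through the handles to rebuild the transaction's path."""
        hops: List[Tuple[str, int]] = []
        current: Optional[str] = handle
        while current is not None:
            hops.append(self.records[current])
            current = self.parent_of[current]
        hops.reverse()
        return hops


store = CorrelationStore()
# First server to second server: a first unique handle is generated.
h1 = store.new_handle(None, "server1->server2", 12)
# Second server to third server: a second unique handle is generated.
h2 = store.new_handle(h1, "server2->server3", 30)

path = store.chain(h2)
total_ms = sum(ms for _, ms in path)
```

In this sketch the store grows by one entry per transfer, and recovering the end-to-end figure requires walking every handle in the chain, which is the kind of tracking and storage burden noted above.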
Alternative systems and methods for monitoring computer system latency are disclosed in U.S. Pat. Nos. 6,041,352; 6,144,961 and 6,108,700.
U.S. Pat. No. 6,041,352 teaches a response time measuring system similar to conventional ARM. Conventional ARM determines system response time at the point of origin of a transaction request, i.e., when a transaction starts and when it completes from the perspective of the client. The system disclosed in U.S. Pat. No. 6,041,352 differs from conventional ARM in that it determines system response time at any point in the outgoing and incoming transaction path loop.
U.S. Pat. No. 6,144,961 describes a transaction response time measuring system that uses sampling of Open Systems Interconnection (OSI) data packets. In particular, when a user sends a transaction across a network, such as a data request for data stored on a server, data packets containing session layer data (OSI level 5 or greater) will travel across the network between the client and the server. When the transaction is complete and there are no other transactions currently pending between the client and the server, none of the data packets traveling between the client and the server will contain session layer data. In other words, packets containing session layer data only travel between the client and the server while the transaction between the client and the server is pending. U.S. Pat. No. 6,144,961 uses this fact to calculate the transaction response time in a non-intrusive manner.
To determine transaction response times, U.S. Pat. No. 6,144,961 uses a routine which analyzes captured data packets. The system captures data packets and then determines when the transaction in question begins. This is accomplished by detecting the initial presence of a data packet containing session layer data. The session layer data is detected by conventionally using the OSI model's description of the sequence of data information within each packet. Next, the routine detects an absence of session layer data contained within successive captured data packets for a predetermined amount of time. The routine then defines the end of the transaction as the point in time at which the predetermined amount of time began. The amount of time for processing the transaction is then measured as the difference between the beginning and the end of the transaction. Similar to the invention set forth in U.S. Pat. No. 6,041,352 and conventional ARM systems, the system and method provided in U.S. Pat. No. 6,144,961 offers a means to evaluate the response time associated with a particular user transaction request. Accordingly, like those technologies, it does not permit performance evaluation of a computer system comprised of a plurality of cooperating business units and/or processes.
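By way of illustration only, the packet-analysis routine summarized above may be sketched as follows. The sketch is a simplified reading of that summary and is not taken from U.S. Pat. No. 6,144,961: the `Packet` type, the `has_session_data` flag (standing in for the actual inspection of OSI session layer data), the function name, and the choice of the last session-layer packet's timestamp as the moment the quiet period began are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Packet:
    """Hypothetical captured packet, reduced to the two fields used here."""

    timestamp: float         # capture time in seconds
    has_session_data: bool   # True if session layer (or higher) data is present


def transaction_response_time(
    packets: Iterable[Packet], quiet_period: float
) -> Optional[Tuple[float, float, float]]:
    """Return (start, end, duration), or None if no completed transaction is seen.

    Packets are assumed to be ordered by timestamp.
    """
    start: Optional[float] = None
    last_session: Optional[float] = None
    for pkt in packets:
        # Session layer data has been absent for the predetermined time:
        # the transaction ended when that quiet period began.
        if last_session is not None and pkt.timestamp - last_session >= quiet_period:
            return start, last_session, last_session - start
        if pkt.has_session_data:
            if start is None:
                start = pkt.timestamp  # first packet carrying session layer data
            last_session = pkt.timestamp
    return None


captured = [
    Packet(0.0, False),
    Packet(1.0, True),
    Packet(1.5, True),
    Packet(2.0, True),
    Packet(2.2, False),
    Packet(3.5, False),
]
result = transaction_response_time(captured, quiet_period=1.0)
```

With the sample capture shown, the transaction is taken to begin at 1.0 s and to end at 2.0 s, because no session layer data is seen for at least the 1.0 s quiet period after that point. The result describes a single client-server exchange only, consistent with the limitation noted above.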
U.S. Pat. No. 6,108,700 discloses a system for measuring the response times of the various stages of computer applications. The invention described therein proposes the creation of a transaction definition language called the ETE (End-to-End) Transaction Definition Language that specifies how to construct identifiable transactions from events and links. In an illustrated example, the ETE Transaction Definition Language provided in U.S. Pat. No. 6,108,700 requires the creation of twenty-one (21) lines of software code merely to define something as relatively simple as a Web commerce transaction. Merely contemplating all of the possible events and transactions that might be involved in a complex business transaction, particularly one whose execution involves the coordination of several business entities and computer systems, is itself a daunting task. Codifying these items complicates the task. That is, individually defining all of these events and transactions in software code in order to produce a complete set of transaction generation rules amounts to a potentially vast amount of preliminary preparation activity that must be performed before the monitoring system may be placed into operation.
An advantage exists, therefore, for a system and method of measuring the precise latency of information flowing through computer systems comprising multiple business units and/or processes, regardless of system topology, e.g., nested client-server, distributed, or combinations thereof. The technique should be uncomplicated in design and implementation, minimally invasive, and highly scalable in order to accommodate potentially large volumes and frequencies of information flow through vast computer systems and networks.