Modern computer networks can include hundreds or thousands of computers connected in networks or tiers. These networks can be, in turn, connected together by larger networks such as the Internet so that systems of many tiers are created.
The networks communicate through frames or packets of data arranged to transfer information in various protocols. The protocols can include, for example, TCP/IP or HTTP. Enterprise applications on the networks communicate through messages broken down into frames. A single message usually requires many frames to travel between the computers and tiers of the network system.
“Enterprise applications” are programs that run on the computers to accomplish various tasks. They are characterized by multiple components that are deployed across multiple network tiers and accessed by users across the entire network system. Parts of a program can be distributed among several tiers, with each part located in a different computer in a network. Examples of enterprise applications include Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), Supply Chain Management (SCM), and Online Banking, Brokerage, Insurance and Retailing.
Enterprise applications typically provide a variety of business functions that users may execute. For example, an online stock trading application may provide some of the following business functions: “log in”, “display account status”, “retrieve stock prospectus”, “sell stock”, “buy stock”, and “log out”. When a user executes a business function, a sequence of transactions is performed. Each transaction consists of a source component transmitting a request (via a network message) to a destination component, often on another tier, and perhaps waiting for a reply message. The destination component processes the request. In doing so, it consumes local (server) resources such as CPU, disk input/output, and memory, and it may generate subsequent requests (subtransactions) to other components.
The time that elapses between the user executing the business function (submitting his or her request) and the display of the results on the user's workstation is called the end user response time. The end user response time is typically the most critical measure of end user satisfaction with network and application performance. If the response times are too long, end users will be unsatisfied.
To maintain and improve performance, application and system managers must monitor the response times of the network system. Monitoring allows them to understand the current performance of applications and components, to identify current performance problems and predict future ones, and to evaluate potential solutions to those problems. Typical problems include data “bottlenecks” such as firewalls and routers and system “delays” caused by mechanical access to data by a disk drive.
The most common method to monitor performance of the system is to capture and analyze network data that is transferred across the tiers via frames. For example, to analyze the performance of the system in relation to requests from a workstation, the requests and replies are tracked across the system. To track the requests and replies, data frames are captured and arranged in chronological order to determine how the messages between computers are flowing. The message flow often allows a determination of system performance in relation to response times.
Data frames are captured by computers that are connected to the network and that monitor network traffic with “sniffer” programs. The sniffer programs receive and store copies of data frames in one or more files. During storage, the network sniffer adds data to the frame indicating the time at which the frame was received, as measured by the sniffer's own clock. The added data is known as a “timestamp”.
Network system topology often makes it impossible to track message flow for an entire network system from a single network sniffer. To track message flow, frames stored by multiple sniffers must be collected and arranged in chronological order. Even so, the frames collected from multiple sniffers can be difficult to interpret or analyze unless they are merged into a single file.
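As an illustration only, the chronological arrangement of frames from several sniffers can be sketched in Python. The capture contents, the tuple layout, and all names below are hypothetical and do not come from any particular sniffer. The sketch assumes that each sniffer's file is already in time order and that timestamps from different sniffers are directly comparable, which real capture clocks may not guarantee.

```python
import heapq

# Hypothetical captures: (timestamp_ms, sniffer, summary), each already in time order.
capture_a = [
    (1.0, "A", "client -> web: request"),
    (9.0, "A", "web -> client: reply"),
]
capture_b = [
    (3.0, "B", "web -> db: query"),
    (7.0, "B", "db -> web: rows"),
]

# heapq.merge lazily interleaves already-sorted inputs by the given key,
# producing one chronological stream without re-sorting everything.
merged = list(heapq.merge(capture_a, capture_b, key=lambda frame: frame[0]))

for timestamp_ms, sniffer, summary in merged:
    print(f"{timestamp_ms:6.1f} ms  [{sniffer}]  {summary}")
```

Read in order, the merged stream shows the request reaching the web tier, the subtransaction to the database tier, and the reply returning to the client.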
Merging files from different sniffers is difficult because the sniffers' clocks are inaccurate. In the prior art, the clock of each sniffer is unstable and is not synchronized with the clocks of the other sniffers. In capture devices, the clock is typically a low-priority program that “flutters” or “jitters”. Flutter and jitter can cause inaccuracy in clock times of up to 10-40 ms per second, depending on the clock program and hardware. Therefore, during the data collection period, slight variations in each capture device's clock can occur. Moreover, because the clocks on the sniffers are typically independent and unsynchronized, the timestamps generated by the various sniffers are not synchronized either. If the timestamps are off by even a few milliseconds, frames from different sniffers that are arranged chronologically will not be in the right order. The merged result is then not an accurate single capture file for the entire system, which makes analysis extremely difficult.
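A small numerical sketch may make the ordering problem concrete. All figures are hypothetical: a request passes sniffer A, a reply passes sniffer B 5 ms later, and B's clock is assumed to run 8 ms behind A's. Sorting by the recorded timestamps then places the reply before the request.

```python
# Hypothetical true times on the wire, in milliseconds.
REQUEST_TRUE_MS = 1000.0   # request passes sniffer A
REPLY_TRUE_MS = 1005.0     # reply passes sniffer B, 5 ms later

# Assumed clock errors: A is exact, B runs 8 ms behind.
CLOCK_ERROR_MS = {"A": 0.0, "B": -8.0}

def timestamp(sniffer, true_ms):
    """Timestamp as the given sniffer's (possibly wrong) clock would record it."""
    return true_ms + CLOCK_ERROR_MS[sniffer]

frames = [
    ("A", "request", timestamp("A", REQUEST_TRUE_MS)),
    ("B", "reply", timestamp("B", REPLY_TRUE_MS)),
]

# Merge by recorded timestamp, as a naive chronological merge would.
merged = sorted(frames, key=lambda frame: frame[2])
apparent_order = [kind for _, kind, _ in merged]
print(apparent_order)  # ['reply', 'request']
```

An error of a few milliseconds is enough to reverse cause and effect in the merged view whenever the true gap between two frames is smaller than the clock difference.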
Traditionally, the steps for merging the data from the sniffers into a single file have been performed manually. A common method to overcome the lack of synchronization is to manually calculate or estimate the difference between the timestamps of duplicate frames and apply a single time adjustment to all frames in the final merged file. One problem with the prior art methods for correcting the inaccuracy of timestamps lies in the application of the calculated difference. The manual calculation is performed once, and its result is applied to all the timestamps of the collected frames. As a result, inadvertent or unavoidable changes in the relative difference between the timestamps during data collection can go undetected. Other problems include the tendency of the prior art methods to be both error-prone and time-consuming.
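The weakness of a single adjustment can be shown with a short calculation. The sketch below assumes, purely for illustration, that sniffer B's clock starts 50 ms ahead of sniffer A's and gains a further 20 ms per second. The offset is measured once from a frame seen by both sniffers at the start of the capture and is then applied to every later timestamp. The function names and figures are hypothetical.

```python
def b_clock_ms(true_ms, offset_ms=50.0, drift_ms_per_s=20.0):
    """Timestamp written by sniffer B: a fixed offset plus drift that grows over time."""
    return true_ms + offset_ms + drift_ms_per_s * (true_ms / 1000.0)

# A frame seen by both sniffers at true time 0: A stamps 0.0, B stamps 50.0.
single_offset_ms = b_clock_ms(0.0) - 0.0

def residual_error_ms(true_ms):
    """Error left in B's timestamp after subtracting the one-time offset."""
    corrected = b_clock_ms(true_ms) - single_offset_ms
    return corrected - true_ms

# Residual error after 0, 1, 5, and 10 seconds of capture.
errors = {seconds: residual_error_ms(seconds * 1000.0) for seconds in (0, 1, 5, 10)}
for seconds, error in errors.items():
    print(f"after {seconds:2d} s: {error:6.1f} ms")
```

Under these assumptions the correction is exact only at the moment it was measured, and the residual error grows with the length of the capture.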
The use of multiple sniffers to track message flow across a network system creates yet another problem. Namely, the same data frame often traverses a single network to which more than one sniffer is attached. Since each network sniffer receives and stores each data frame, the result is duplicate frames stored by various network sniffers. Before analysis, all but one copy of each duplicated frame must be removed. In the prior art, the duplicates are identified and removed by hand, which introduces additional errors.
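One way to picture duplicate removal is the following minimal sketch. It is an assumption made for illustration, not a method stated in the text: two frames are treated as duplicates when they carry identical bytes, come from different sniffers, and have timestamps within a chosen window. The `Frame` type, the `drop_duplicates` function, and the 100 ms window are all hypothetical.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Frame:
    sniffer: str      # capture device that stored the frame
    stamp_ms: float   # timestamp written by that device
    data: bytes       # raw frame contents

def drop_duplicates(frames, window_ms=100.0):
    """Keep the earliest copy of each frame; drop identical copies that a
    different sniffer recorded within window_ms of the kept copy."""
    kept = []
    last_kept = {}  # digest of frame bytes -> (stamp_ms, sniffer) of the kept copy
    for frame in sorted(frames, key=lambda f: f.stamp_ms):
        digest = hashlib.sha256(frame.data).digest()
        previous = last_kept.get(digest)
        if (previous is not None
                and previous[1] != frame.sniffer
                and frame.stamp_ms - previous[0] <= window_ms):
            continue  # same bytes, other sniffer, close in time: a duplicate
        last_kept[digest] = (frame.stamp_ms, frame.sniffer)
        kept.append(frame)
    return kept

frames = [
    Frame("A", 10.0, b"request"),
    Frame("B", 12.0, b"request"),   # same frame seen by a second sniffer
    Frame("A", 20.0, b"reply"),
    Frame("A", 500.0, b"request"),  # a later, separate transmission
]
unique = drop_duplicates(frames)
print([(f.sniffer, f.stamp_ms) for f in unique])
```

The window comparison only works if the timestamps from different sniffers are already roughly aligned, so duplicate detection of this kind depends on the clock problem described earlier.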
What is needed is a method wherein the merge of collected data is performed automatically, with no manual intervention. The method should provide for an automatic calculation and adjustment of the difference in timestamps and recalculation of the difference as often as possible. The method should also provide a way to recognize and remove duplicate frames from the final merged file.
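As one possible reading of the recalculation requirement, and not a description of any specific method, the offset between two sniffers could be recomputed at every frame that both captured, with each later timestamp corrected using the most recent offset. In the sketch below, `make_corrector` and the sample values are hypothetical. The sample sync points are consistent with a clock that starts 50 ms ahead and gains 20 ms per second.

```python
import bisect

def make_corrector(sync_points):
    """sync_points: (b_stamp_ms, a_stamp_ms) pairs for frames captured by both
    sniffers, sorted by b_stamp_ms. Returns a function that maps a timestamp
    from sniffer B onto sniffer A's clock using the most recent offset."""
    b_times = [b for b, _ in sync_points]
    offsets = [b - a for b, a in sync_points]

    def correct(b_stamp_ms):
        # Index of the latest sync point at or before this timestamp;
        # timestamps earlier than the first sync point use the first offset.
        index = max(bisect.bisect_right(b_times, b_stamp_ms) - 1, 0)
        return b_stamp_ms - offsets[index]

    return correct

# Hypothetical duplicate frames seen at 0 s, 1 s, and 2 s on sniffer A's clock.
sync_points = [(50.0, 0.0), (1070.0, 1000.0), (2090.0, 2000.0)]
correct = make_corrector(sync_points)

# A frame stamped 1580.0 by B (true time 1500.0 under the assumed drift).
recalculated = correct(1580.0)   # uses the offset measured at 1 s
single_offset = 1580.0 - 50.0    # uses only the offset measured at 0 s
print(recalculated, single_offset)
```

With these figures the recalculated correction leaves a smaller error than the single offset, and the error shrinks further the more often a shared frame is available.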