This invention relates to the field of computer applications, and in particular to distributed applications that include communications among processors on a network.
With advances in networking technology, distributed applications continue to grow in popularity and in complexity. In a typical distributed application, a client device may initiate the application, which may then request data services from a remote server; that server may in turn request data or other processing from other remote servers. The processes executed at the servers may be specific components of the application residing at the servers, or they may be components provided by the servers and accessed by the application.
Users of an application are generally sensitive to performance and reliability issues associated with the application and, in a competitive market, will generally avoid slow or unreliable applications. Application developers are also sensitive to these issues, to ensure that their product remains competitive. Likewise, service providers are sensitive to these issues, to ensure that their service is not the cause of performance and reliability problems that may affect their customers.
Tools are available for assessing network traffic performance, as are tools for assessing processing performance. However, each addresses only one side of the problem. Network performance tools provide details regarding such factors as latency delay, transmission delay, queuing delay, and so on, but view the time expended at a processing element merely as ‘processing time’ or ‘non-network time’. Conversely, processing performance tools provide details regarding the time spent performing each of a variety of functions, but view the time consumed waiting for responses to requests and the like merely as ‘communication time’.
A typical approach for an application developer is to test and evaluate the application's overall performance by detecting when particular events occur and, from that information, determining the time consumed between events. Based on the time consumed between events, the developer attempts to optimize the performance of the application. However, because the processing of the application is distributed, the developer must attempt to collect the data at each processor, as well as across the various communication links among the processors. This task, even where feasible, is complicated by the fact that the application is generally run in an operational environment, where distinguishing the events of one application from those of other applications is often difficult.
Typically, the most comprehensive tool available to an application developer for network performance monitoring is a network trace system, such as the ACE system from OPNET Technologies, Inc., of Bethesda, Md. Such a system captures the data transmissions associated with an application across a network and presents the information as a data exchange diagram or as a Gantt chart that illustrates the time spent communicating the application messages between nodes on the network, as well as the time spent at each node. These visualizations show how much time is spent at each node, but provide no insight into the activities that consume that time. To analyze performance at the nodes of a network, a system analysis tool, such as Panorama from OPNET Technologies, can be used to determine which processes are consuming the most time and/or to identify performance anomalies as an application executes at the node. This conventional segregation of analysis tasks between network analysis and system analysis is poorly suited to analyzing the performance of distributed applications, which increasingly rely upon a proper balance of network and system capabilities and interactions.
It is an objective of this invention to provide a method and system for capturing application-related events across a network, as well as within the nodes/processors of the network. It is a further objective of this invention to provide a method and system for analyzing captured network and processor/system events and for producing an integrated view of the delays incurred as application-related messages are communicated and processed among the distributed nodes/processors of a network.
These objectives, and others, are achieved by a method and system that include a first capture system that captures communication events related to an application, and a second capture system that captures processing events related to the application. A visualization system analyzes the data captured by each of the capture systems, synchronizes and correlates the data, and presents an integrated display of these communication and processing events. In a preferred embodiment, the communicated messages include an identifier of the application, and the processing components also associate an identifier of the application with each recorded processing event. To facilitate an integrated display of the events, the visualization system synchronizes the recorded communication and processing events to a common time base.
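The correlation and synchronization steps just described can be illustrated with a small sketch. It is not the claimed implementation; it assumes that each capture system emits records carrying a source name, an application identifier, and a local timestamp in milliseconds, and that a per-source clock offset to the common time base has already been estimated. The names Event, to_common_time, integrated_timeline, and gaps, and the sample sources "probe" and "server-A", are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# One row of the integrated view: (common_time_ms, kind, source, label)
Row = Tuple[int, str, str, str]


@dataclass(frozen=True)
class Event:
    source: str     # capture point whose clock stamped the event (hypothetical names)
    kind: str       # "comm" for a captured message, "proc" for a processing event
    app_id: str     # application identifier carried in the message or attached by the component
    timestamp: int  # milliseconds, in the source's local clock
    label: str


def to_common_time(event: Event, offsets: Dict[str, int]) -> int:
    # Assumed convention: offsets[source] = common_time - local_time for that source.
    return event.timestamp + offsets[event.source]


def integrated_timeline(comm: List[Event], proc: List[Event],
                        app_id: str, offsets: Dict[str, int]) -> List[Row]:
    # Correlate: keep only events tagged with the application of interest.
    # Synchronize: shift every timestamp onto the common time base.
    rows = [(to_common_time(e, offsets), e.kind, e.source, e.label)
            for e in comm + proc if e.app_id == app_id]
    # Merge both captures into one time-ordered view.
    rows.sort(key=lambda row: row[0])
    return rows


def gaps(rows: List[Row]) -> List[int]:
    # Elapsed time between consecutive events in the integrated view.
    return [later[0] - earlier[0] for earlier, later in zip(rows, rows[1:])]


if __name__ == "__main__":
    offsets = {"probe": 0, "server-A": -500}  # server-A's clock assumed 500 ms ahead
    comm = [
        Event("probe", "comm", "app-1", 10_000, "request client->A"),
        Event("probe", "comm", "app-2", 10_100, "unrelated traffic"),
        Event("probe", "comm", "app-1", 10_300, "response A->client"),
    ]
    proc = [
        Event("server-A", "proc", "app-1", 10_550, "parse request"),
        Event("server-A", "proc", "app-1", 10_700, "query database"),
    ]
    timeline = integrated_timeline(comm, proc, "app-1", offsets)
    for row in timeline:
        print(row)
    print("gaps (ms):", gaps(timeline))
```

With the sample data, the unrelated app-2 message is dropped, and the two server-side processing events, once shifted by the assumed 500 ms offset, fall between the request and the response, giving gaps of 50, 150, and 100 ms. How the offsets are obtained (for example, from a clock-synchronization protocol or from events observed by both capture systems) is left open in this sketch.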
Throughout the drawings, the same reference numerals indicate similar or corresponding features or functions. The drawings are included for illustrative purposes and are not intended to limit the scope of the invention.