1. Field of the Invention
The present invention relates to the field of computer systems. More specifically, the present invention relates to tracing and analysis of network packets in a computer network.
2. Art Background
The ability to trace network packets in a distributed computer system network is very important. It is also important to be able to analyze the resulting network packet traces. Such analysis can be used to evaluate the performance of the network, or to debug applications running on various nodes in the network. The analysis can also be used to examine the correctness of a new protocol while it is being developed.
It is not particularly difficult to capture and analyze network packets in a distributed network connected using a shared media network, e.g., an Ethernet network. A packet analyzing computer can be connected to the shared medium and can monitor the network traffic through the shared medium. Because all of the network traffic is carried by the shared medium, the packet analyzer can be set to collect information regarding the network traffic, or to collect information regarding a portion of the network traffic that is of interest.
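The selective collection described above can be illustrated with a minimal Python sketch. It is not drawn from any particular analyzer: the `Packet` record, the `collect` function, and the node names are hypothetical, and the shared medium is modeled simply as a sequence of packets that the analyzer can observe in full.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet record; the field names are illustrative only."""
    timestamp: float
    src: str
    dst: str
    payload: bytes


def collect(
    traffic: Iterable[Packet],
    of_interest: Callable[[Packet], bool] = lambda p: True,
) -> List[Packet]:
    """Keep every packet seen on the medium that satisfies the predicate.

    With the default predicate, all traffic is collected.
    """
    return [p for p in traffic if of_interest(p)]


# On a shared medium, the analyzer observes every packet, whatever its endpoints.
medium = [
    Packet(0.001, "A", "B", b"syn"),
    Packet(0.002, "C", "D", b"data"),
    Packet(0.003, "B", "A", b"ack"),
]

everything = collect(medium)
# Collect only the portion of interest: traffic between nodes A and B.
between_a_and_b = collect(medium, lambda p: {p.src, p.dst} == {"A", "B"})
```

The sketch works only because a single observation point sees all of the traffic, which is the property that a non-shared media network lacks.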
There is no easy way, however, to capture and analyze the network packets in a distributed system connected through a non-shared media network, e.g., an Asynchronous Transfer Mode (ATM) network. A typical non-shared media network is configured using a "hub and spoke" topology wherein a central switch at the "hub" is used to establish separate point-to-point connections between individual pairs of nodes, with each node being located on a different "spoke". More complex networks are then formed by connecting together the central switches of two or more hubs. In this way, a point-to-point connection between nodes belonging to two different hubs can be established.
There are several possible ways to capture and analyze the network packets in a distributed system connected through a non-shared media network, but none of the current approaches is satisfactory.
One approach is to modify the operating system on one or more nodes so that the operating system will collect the data. This approach is difficult because the network may contain nodes that are using different operating systems. Each operating system has its own internal interface and structure that would have to be modified in an ad hoc fashion specific to that particular operating system. Furthermore, if one were to modify the operating system for a particular node, then one could only observe the traffic that is directed to that particular node, or traffic that originates from that particular node. Typically, however, one desires to collect information about traffic beyond the traffic of a single node. Under this approach, one would have to independently modify the operating system of each node of interest, retrieve the data collected at each node, and then merge the data manually to reconstruct a picture of what happened in real time. Besides being difficult to implement and maintain, this approach may not be practical because nodes of interest may be separated by miles, or even located on different continents.
Another approach is to modify one or more switches by adding software, hardware, or both, so that each modified switch will intercept and collect packets. This solution is also not practical because switches are manufactured by different vendors and are hard to modify. Typically, each vendor has its own proprietary software, and does not provide the software hooks that would be necessary to gain access to the desired data. Furthermore, a network can contain multiple switches manufactured by multiple vendors, with each switch of interest requiring modification. Moreover, when multiple switches are involved in a large network, one must collect data from the multiple switches and then splice the collected information together in order to form a picture of what is going on inside the network.
A final approach is to splice or tap into each wire of interest. This approach is also impractical. Furthermore, to study the traffic of more than one node, multiple taps are required. Moreover, when multiple taps are involved, one must collect data from the multiple taps and then splice the collected information together for analysis.
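Each of the approaches above ends with the same step: data gathered separately at several nodes, switches, or taps must be spliced into a single picture. A minimal Python sketch of that step follows. The `Record` type, the `merge_traces` function, and the collector names are hypothetical. The sketch also assumes that every collection point sorts its own trace by time and that all timestamps come from synchronized clocks, an assumption that may well not hold for nodes separated by large distances.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Record(NamedTuple):
    """Hypothetical trace record written by one collection point."""
    timestamp: float  # assumed to come from clocks synchronized across collectors
    collector: str    # node, switch, or tap that observed the packet
    summary: str


def merge_traces(*traces: Iterable[Record]) -> Iterator[Record]:
    """Splice per-collector traces into one timeline ordered by timestamp.

    Each input trace must already be sorted by timestamp.
    """
    return heapq.merge(*traces, key=lambda r: r.timestamp)


node_a = [
    Record(0.010, "node-a", "send request"),
    Record(0.050, "node-a", "recv reply"),
]
node_b = [
    Record(0.020, "node-b", "recv request"),
    Record(0.040, "node-b", "send reply"),
]

timeline = list(merge_traces(node_a, node_b))
```

Even in this idealized form, every trace must first be retrieved from its collection point, and any clock skew between collectors can reorder events in the merged timeline.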