1. Field of the Invention
The invention relates to diagnosing Transmission Control Protocol (TCP) performance problem situations.
2. Related Art
TCP is one of the most widely used transport protocols. Used both in the Internet and in many intranets, TCP carries HTTP (Hypertext Transfer Protocol) and CIFS (Common Internet File System) traffic, as well as NFS (Network File System) data. Although TCP is a robust protocol that provides reliable connection-oriented communication over a wide variety of networks and at a variety of speeds, the observed rate of data transfer may be lower than anticipated because (1) the data sender, the data receiver, or both may be poorly configured or overloaded, (2) a network or a portion of it may lack sufficient bandwidth (for example, a network that runs at gigabit rates at its edges may include a megabit link somewhere in the path between the data sender and the data receiver), (3) multiple packet losses may occur (due to congestion or other reasons) and require coarse-grained retransmission timeouts, or (4) other related causes.
Since most TCP implementations are not designed for easy debugging, various techniques have been designed and implemented to diagnose TCP-related problems. A first technique combines a packet capture mechanism, such as the Berkeley Packet Filter, with manual expert analysis of the captured low-level packet traces, so as to isolate abnormal protocol behavior and trace it to misconfigured or overloaded elements in a network path. Although this technique permits the analysis of specific transmissions, it is relatively inconvenient, costly and error-prone. A variant of this technique is embodied in a tool developed by LBL researchers called tcpanaly. Tcpanaly automatically analyzes a TCP implementation's behavior by inspecting packet filter traces of TCP activity. If a trace is found to be inconsistent with the TCP specification, tcpanaly may provide a diagnosis (if possible) or an indication of what specific activity is aberrant. Like other packet-driven systems, tcpanaly does not focus on the general dynamic behavior of a network, but rather on detecting packet filter measurement errors and on low-level details of how TCP algorithms handle corner conditions while performing congestion control and dealing with various forms of packet loss.
Other techniques, such as those found in commercial packet sniffer systems, include logic that analyzes aggregate TCP statistics of the kind reported by the UNIX netstat command. Unfortunately, these techniques are generally limited to a broad analysis of all the connections the system has ever seen. As such, they are not useful for detecting or diagnosing a particular defect in a specific connection between two systems.
Accordingly, it would be desirable to provide a method and system for detecting and diagnosing TCP-related problems. This is achieved in an embodiment of the invention in which an appliance system including auto-diagnosis logic can be coupled to a network and implement auto-diagnostic techniques for TCP.
The invention provides a method and system for detecting and analyzing performance defects in the dynamic operation of the TCP protocol. In a preferred embodiment, the invention comprises auto-diagnosis logic that can be implemented either in a variety of operating systems (such as Data ONTAP) or in an appliance-like auto-diagnosis module that is coupled to the TCP receiver, the TCP sender, or both.
In a first aspect of the invention, TCP events are sampled and a carefully maintained set of statistics on these events is kept. The granularity of the sampling and the time period sampled may be adjusted to meet the requirements of a particular system. These statistics can be used to diagnose defects on the sender side, the receiver side, or both.
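A minimal sketch of such sampling, assuming events arrive as (timestamp, event name, value) tuples with non-decreasing timestamps. The class `EventSampler` and its parameters `granularity_s` and `period_s` are hypothetical names chosen for illustration; they stand in for the adjustable sampling granularity and sampled time period described above, not for any interface of the invention.

```python
from collections import defaultdict, deque


class EventSampler:
    """Bucketed counters for TCP events over a sliding time period.

    granularity_s: width of each sampling bucket, in seconds (hypothetical knob).
    period_s: how much history to keep; older buckets are discarded.
    Assumes timestamps passed to record() never decrease.
    """

    def __init__(self, granularity_s: float, period_s: float):
        if granularity_s <= 0 or period_s < granularity_s:
            raise ValueError("need 0 < granularity_s <= period_s")
        self.granularity_s = granularity_s
        self.max_buckets = int(period_s // granularity_s)
        # Each entry: (bucket_index, {event_name: [count, total_value]})
        self.buckets = deque()

    def record(self, t: float, event: str, value: float = 0.0) -> None:
        idx = int(t // self.granularity_s)
        if not self.buckets or self.buckets[-1][0] != idx:
            self.buckets.append((idx, defaultdict(lambda: [0, 0.0])))
        stats = self.buckets[-1][1]
        stats[event][0] += 1
        stats[event][1] += value
        # Drop buckets that now fall outside the sampled time period.
        while self.buckets and self.buckets[0][0] <= idx - self.max_buckets:
            self.buckets.popleft()

    def totals(self, event: str):
        """Return (count, summed value) for an event over the retained period."""
        kept = [b[1][event] for b in self.buckets if event in b[1]]
        return sum(c for c, _ in kept), sum(v for _, v in kept)
```

For example, `EventSampler(1.0, 3.0)` keeps one-second buckets for the last three seconds. Coarser buckets reduce bookkeeping at the cost of temporal resolution, while a longer period smooths out transient spikes.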
Receiver-side TCP diagnostic techniques include (1) detecting the sender's retransmission timeouts, (2) evaluating the average size of packets being received, (3) determining whether the receiver acts as a computational or protocol bottleneck, and (4) performing other statistical evaluations of an incoming data stream.
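The first two receiver-side checks can be sketched as follows. This is an illustrative heuristic under stated assumptions, not the method of the invention: a sender retransmission timeout is suspected when a data segment that carries nothing beyond the highest sequence number already received (it fills a hole or duplicates data) arrives after a silence of at least `rto_gap_s`. The names `Segment`, `receiver_diagnosis`, `rto_gap_s` and `small_frac`, and their default values, are hypothetical. Sequence-number wraparound and tail losses (where the retransmitted segment is itself new data) are not handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    t: float      # arrival time in seconds
    seq: int      # first sequence number carried (no wraparound assumed)
    length: int   # payload bytes


def receiver_diagnosis(segments: List[Segment], mss: int,
                       rto_gap_s: float = 1.0,
                       small_frac: float = 0.5) -> dict:
    """Heuristic receiver-side checks with hypothetical thresholds."""
    highest_end = None  # one past the highest sequence number seen so far
    last_t = None
    suspected_rtos = 0
    payload_bytes = 0
    data_segments = 0
    for seg in segments:
        if seg.length == 0:
            continue  # pure ACKs carry no data
        end = seg.seq + seg.length
        # Old data arriving after a long silence suggests the sender
        # waited for a retransmission timeout before resending it.
        if (highest_end is not None and end <= highest_end
                and seg.t - last_t >= rto_gap_s):
            suspected_rtos += 1
        highest_end = end if highest_end is None else max(highest_end, end)
        last_t = seg.t
        payload_bytes += seg.length
        data_segments += 1
    avg = payload_bytes / data_segments if data_segments else 0.0
    return {
        "suspected_rtos": suspected_rtos,
        "avg_segment_bytes": avg,
        # Many segments far below the MSS waste per-packet overhead.
        "small_segments": data_segments > 0 and avg < small_frac * mss,
    }
```

In a trace where the segment covering bytes 2920 to 4379 is lost and resent about 1.5 seconds later, the late arrival is counted as one suspected timeout.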
Sender-side diagnostic techniques include (1) flagging excessive retransmission timeouts, (2) monitoring the average size of a transmitted packet, (3) evaluating whether the advertised window is large enough to account for the delay-bandwidth product of the network connecting the receiver and the sender systems, (4) performing various bottleneck checks, and (5) performing other statistical evaluations of an outgoing data stream.
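The window check rests on a simple relation: to keep a path busy, the window must cover at least the delay-bandwidth product, BDP = bandwidth × RTT / 8 bytes; otherwise throughput is capped near window / RTT. A minimal sketch of checks (1) to (3) follows, assuming the path bandwidth and round-trip time are already known. The function names and the thresholds `max_rto_rate` and `small_frac` are hypothetical illustration choices.

```python
from typing import List


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Delay-bandwidth product in bytes (bits in flight on the path / 8)."""
    return bandwidth_bps * rtt_s / 8


def sender_diagnosis(advertised_window: int, bandwidth_bps: float,
                     rtt_s: float, segments_sent: int, bytes_sent: int,
                     rto_count: int, mss: int,
                     max_rto_rate: float = 0.01,
                     small_frac: float = 0.5) -> List[str]:
    """Return human-readable findings; thresholds are hypothetical."""
    findings = []
    bdp = bdp_bytes(bandwidth_bps, rtt_s)
    if advertised_window < bdp:
        # At most one window can be sent per round trip.
        cap_bps = advertised_window * 8 / rtt_s
        findings.append(f"window {advertised_window} B < BDP {bdp:.0f} B; "
                        f"throughput capped near {cap_bps / 1e6:.1f} Mbit/s")
    if segments_sent and rto_count / segments_sent > max_rto_rate:
        findings.append(f"excessive RTOs: {rto_count} in {segments_sent} segments")
    if segments_sent and bytes_sent / segments_sent < small_frac * mss:
        findings.append("average segment size well below MSS")
    return findings
```

For a 100 Mbit/s path with a 50 ms RTT, the BDP is 625,000 bytes, so an unscaled 65,535-byte window limits the connection to roughly 10.5 Mbit/s regardless of link speed.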
In a second aspect of the invention, the results of the auto-diagnosis are aggregated using a database that includes known attributes of client systems. Examples of attributes include IP subnet number, OS type/version, last configuration change date, delay distance, route information, virtual LAN information, a historical summary of auto-diagnosis information, and other attributes that may be useful in aggregating auto-diagnosis information. Client systems with common problem areas and common attributes are grouped together for presentation to a system administrator.
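The grouping step can be illustrated with an in-memory dictionary standing in for the attribute database. This is a sketch under assumptions: `group_clients`, the attribute keys `"subnet"` and `"os"`, and the problem labels are hypothetical, and a real deployment would query whatever database holds the client attributes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_clients(diagnoses: Dict[str, List[str]],
                  attributes: Dict[str, Dict[str, str]],
                  keys: Tuple[str, ...] = ("subnet", "os")
                  ) -> Dict[tuple, List[str]]:
    """Group clients sharing a diagnosed problem and the chosen attributes.

    diagnoses maps client id -> list of problem labels from auto-diagnosis.
    attributes maps client id -> attribute name -> value.
    Missing attributes are grouped under "unknown".
    """
    groups = defaultdict(list)
    for client, problems in diagnoses.items():
        attrs = attributes.get(client, {})
        attr_key = tuple(attrs.get(k, "unknown") for k in keys)
        for problem in problems:
            groups[(problem,) + attr_key].append(client)
    # Sorted member lists give a stable report for the administrator.
    return {k: sorted(v) for k, v in groups.items()}
```

Each key, such as `("small_window", "10.0.1.0/24", "os-1.0")`, names one problem area shared by a set of similar clients, which is the unit presented to the administrator.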
In a preferred embodiment, the TCP auto-diagnosis logic can be run on-line or off-line. Although the auto-diagnosis logic is relatively non-disruptive in any reasonable implementation, off-line operation permits system analysis to be performed at non-critical times, for example when overall demand for computing resources is relatively low.