The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Computer networks comprising end stations and infrastructure elements, such as routers, switches, and bridges, benefit from management and monitoring. In a packet-switched network that uses Internet Protocol (IP) and Transmission Control Protocol (TCP), an essential management function is monitoring the amount of time required to transmit data from one point in the network to another. The time required for a data packet to move from an origination point to a destination point, and for a response packet from an application at the destination point to return to the origination point, is sometimes termed “round-trip time” (RTT). Delays that packets encounter in traversing such a path are sometimes termed “latency.”
Computer application software programs (“applications”) often use TCP as a transport protocol because TCP provides reliable and ordered delivery of messages. TCP is defined in documents published by the Internet Engineering Task Force, including, for example, IETF Request for Comments (RFC) 793 (1981). The reliability of TCP is favored in financial systems; for example, applications providing order execution, status reporting, and market data are commonly based upon TCP.
Network administrators at financial service providers, such as investment banks and brokerages, desire information indicating the performance of the providers' networks. Such networks are sometimes owned and operated by network service providers that enter into a service level agreement with the financial service providers. The service level agreement specifies minimum performance metrics that the network service provider is expected to meet.
In this context, the network administrators of the financial service providers are interested in having network performance data that can be used to verify whether the performance metrics are being met in actual network operation.
IETF RFC 1323 (1992) defines a “timestamp option” for TCP. When a network element uses the timestamp option, that network element inserts a timestamp value into each TCP segment that the network element sends to another TCP endpoint. The timestamp value represents an approximate time at which the TCP segment was formed or dispatched.
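As a minimal illustrative sketch, and not a description of any particular TCP implementation, the following Python code shows how the timestamp option could be located in the raw option bytes of a TCP header. The sketch relies on the option layout given in RFC 1323 (kind 8, length 10, a four-byte timestamp value followed by a four-byte echo reply field); the function name `find_timestamp_option` and the sample bytes are hypothetical and appear here only for illustration.

```python
import struct
from typing import Optional, Tuple

TCP_OPT_EOL = 0        # end of option list (single byte, no length field)
TCP_OPT_NOP = 1        # no-operation padding (single byte, no length field)
TCP_OPT_TIMESTAMP = 8  # RFC 1323 timestamp option, total length 10 bytes


def find_timestamp_option(options: bytes) -> Optional[Tuple[int, int]]:
    """Return (TSval, TSecr) from raw TCP option bytes, or None if absent.

    Hypothetical helper: `options` is assumed to hold only the option
    portion of a TCP header, already extracted by the caller.
    """
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == TCP_OPT_EOL:
            break
        if kind == TCP_OPT_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated: the length byte is missing
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed option; stop rather than guess
        if kind == TCP_OPT_TIMESTAMP and length == 10:
            # Two unsigned 32-bit integers in network byte order.
            tsval, tsecr = struct.unpack("!II", options[i + 2:i + 10])
            return tsval, tsecr
        i += length
    return None


# Illustrative input only: two NOP pads followed by a timestamp option.
sample_options = bytes([1, 1, 8, 10]) + struct.pack("!II", 1000, 900)
timestamps = find_timestamp_option(sample_options)
```

In this sketch the returned values are opaque counters taken from the sender's clock; nothing in the code assumes that the two endpoints share a common time base.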
The TCP timestamp option is intended to provide a way for TCP endpoints to monitor the amount of time that a packet takes to traverse from one endpoint to another endpoint for the purpose of adjusting back-off values or other parameters. In practice, the TCP timestamp option is inaccurate, because the clocks of the TCP endpoints are almost never synchronized or are subject to clock drift and other errors. While clock synchronization protocols exist, a particular TCP endpoint seeking to connect to another endpoint is never assured that the other endpoint is using the synchronization protocol. Clock synchronization protocols also add network traffic overhead and are not widely deployed.
Further, even if a sending TCP endpoint is configured to use the timestamp option, the receiving TCP endpoint may be configured not to use the timestamp option. Therefore, in practice the TCP timestamp option does not provide a reliable mechanism to measure end-to-end latency. An improved method to measure end-to-end latency is needed.
Still another problem with existing approaches is that they do not provide a way to determine latency or delay at the application or socket level. In past practice the TCP timestamp option has been monitored only within a TCP module. Such monitoring ignores the latency of moving packets from the TCP module up the network stack to an application, application processing delays, and the latency of moving packets back down from the application to the TCP module. The error is doubled because this impact is ignored at both endpoints of a connection. Thus, mere use of TCP timestamps in conventional practice in a TCP layer does not give an accurate view of all latency that an application experiences. IETF RFC 2564 provides an application management information base (MIB) for network elements that use Simple Network Management Protocol (SNMP), but RFC 2564 does not provide a way to measure end-to-end delay.
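The gap between a TCP-layer measurement and the latency that an application experiences can be illustrated with simple arithmetic. The following Python sketch is hypothetical: the class `EndpointDelays`, the function `application_latency_us`, and every numeric value are assumptions chosen only to show how the ignored delays add up at both endpoints, and they are not measurements from any real system.

```python
from dataclasses import dataclass


@dataclass
class EndpointDelays:
    """Hypothetical delays, in microseconds, that occur above the TCP
    module at one endpoint and are invisible to a TCP-layer measurement."""
    stack_up_us: int        # TCP module up to the application
    app_processing_us: int  # time spent inside the application
    stack_down_us: int      # application back down to the TCP module

    def hidden_us(self) -> int:
        return self.stack_up_us + self.app_processing_us + self.stack_down_us


def application_latency_us(tcp_layer_us: int,
                           local: EndpointDelays,
                           remote: EndpointDelays) -> int:
    """Latency seen by the application: the TCP-layer figure plus the
    delays hidden above the TCP module at each of the two endpoints."""
    return tcp_layer_us + local.hidden_us() + remote.hidden_us()


# Illustrative numbers only.
local = EndpointDelays(stack_up_us=20, app_processing_us=100, stack_down_us=20)
remote = EndpointDelays(stack_up_us=30, app_processing_us=200, stack_down_us=30)
tcp_layer_us = 500
total_us = application_latency_us(tcp_layer_us, local, remote)
unseen_us = total_us - tcp_layer_us
```

With these assumed values, the TCP-layer figure of 500 microseconds omits 400 microseconds of delay, 140 at one endpoint and 260 at the other.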
Prior practice also has not allowed for rapid or real-time analysis of latency issues. TCP timestamps could be written to a log and subjected to later log analysis, but such analysis usually occurs far too late to address network problems that caused the latency. An approach is needed that allows a network administrator to detect high latency and address network problems at about the same time as latency issues are detected.
Network infrastructure elements such as routers and switches are sometimes operated in a so-called “promiscuous mode” in which a first network element is logically interposed between a second network element and a third network element and monitors traffic originating from the second network element and directed toward the third network element. In promiscuous mode, the first network element examines packets that move from the second network element to the third network element, but the first network element does not delay or modify the packets. The first network element can report a copy of the observed traffic, data signatures of observed packets, or control plane events communicated between the second and third network elements, to a management station or a monitoring entity. By monitoring packets in both directions, the first network element can report bidirectional latency.
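One way a passive observer could derive a latency figure from traffic seen in both directions is to pair each data segment with the acknowledgment that later travels the opposite way, timing both against the observer's own clock. The following Python sketch assumes that approach; it is not a description of how any particular router or switch operates. The class `PassiveLatencyMonitor` and its method `observe` are hypothetical names, and the sketch ignores retransmissions, cumulative acknowledgments that cover several segments, and cleanup of stale entries.

```python
from typing import Dict, Optional, Tuple

# (source address, source port, destination address, destination port)
Flow = Tuple[str, int, str, int]


class PassiveLatencyMonitor:
    """Hypothetical observer that never delays or modifies packets; it
    only records when it saw them, using a single local clock."""

    def __init__(self) -> None:
        # Maps (flow, expected acknowledgment number) to observation time.
        self._pending: Dict[Tuple[Flow, int], float] = {}

    def observe(self, seen_at: float, src: str, sport: int, dst: str,
                dport: int, seq: int, ack: int,
                payload_len: int) -> Optional[float]:
        """Record one observed TCP segment.

        Returns the elapsed time between a previously observed data
        segment and this segment if this segment acknowledges exactly
        that data; otherwise returns None.
        """
        latency = None
        # This segment's acknowledgment number refers to data that
        # travelled in the opposite direction, from dst to src.
        opposite: Flow = (dst, dport, src, sport)
        sent_at = self._pending.pop((opposite, ack), None)
        if sent_at is not None:
            latency = seen_at - sent_at
        if payload_len > 0:
            forward: Flow = (src, sport, dst, dport)
            expected_ack = (seq + payload_len) % 2**32  # 32-bit sequence space
            # Keep the first observation so a retransmission does not
            # overwrite the original time.
            self._pending.setdefault((forward, expected_ack), seen_at)
        return latency


monitor = PassiveLatencyMonitor()
first = monitor.observe(10.0, "10.0.0.2", 40000, "10.0.0.3", 80,
                        seq=1, ack=1, payload_len=100)
second = monitor.observe(10.25, "10.0.0.3", 80, "10.0.0.2", 40000,
                         seq=1, ack=101, payload_len=0)
```

Because both observation times come from the observer's clock, the resulting figure does not depend on clock synchronization between the endpoints. It covers only the portion of the path between the observation point and the acknowledging endpoint.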