Traditionally, internet software has been developed in accordance with the principle that very limited functionality is provided in “the network.” Specifically, the network provides only unreliable forwarding of messages or packets. Following this model, additional functionality, such as reliable delivery and flow control, is implemented entirely at the communication end points without adding any functionality to the network.
FIG. 1 depicts a TCP connection between a first node 101 and a second node 102 following this model. The connection extends “directly” from the first node 101 to the second node 102 with only IP routers 103 in between. Routers do not maintain state information associated with constructs that reside above the network layer. Connection state, which is associated with the transport and session layers, is therefore maintained solely in the two connection end points, that is, at the first node 101 and the second node 102.
As internet technology has evolved, significant functionality has found its way into “the network.” For example, it is increasingly common for a client running a web browser to connect to an intermediate node rather than directly to a web server. Such intermediate nodes include, but are not limited to, SOCKS servers, firewalls, VPN (virtual private network) gateways, TCP routers or load balancers, caching proxies, and transcoding proxies used with resource-constrained clients such as handheld devices.
The number and types of such intermediate nodes will continue to increase as internet technology evolves to provide additional functionality. The performance of the intermediate nodes will increase in importance as the number of internet clients continues to grow rapidly. This growth will be fueled by large numbers of handheld and pervasive devices that will require intermediate nodes capable of supporting hundreds of thousands to millions of clients simultaneously.
FIG. 2 depicts this increasingly common scenario in which a first node 201 connects to a second node 202 via a set of intermediate nodes 203. Although not explicitly depicted in the figure, intermediate nodes typically communicate with each other as well as with the first node and the second node via routers.
Each intermediate node influences communication between the first node 201 and the second node 202. In many instances, the interface provided by an intermediate node to the first node 201 resembles or is identical to the interface provided by the second node 202 to the intermediate node. Similarly, the interface provided by an intermediate node to the second node 202 is typically similar to the interface provided by the first node 201 to the intermediate node. Because of this similarity of interfaces, the first node may not be able to determine whether it is connected to an intermediate node or to the second node. Similarly, there may be no way for the second node to determine whether it is connected to an intermediate node or to the first node. The similarity of interfaces allows any number of intermediate nodes to be placed between the first node 201 and the second node 202. This property is depicted in FIG. 3.
A key difference between a router and an intermediate node is that routers perform processing only at layers one through three of the ISO seven-layer model (the physical, data link and network layers), while intermediate nodes perform processing at, and possibly above, the fourth or transport layer. FIG. 4 depicts the processing performed by a router in terms of layers. The corresponding diagram for intermediate nodes is shown in FIG. 5.
We define a connection to be a bi-directional communication channel between a pair of connection end points such that information written to one end of the connection can be read from the other end. A connection includes two flows. A flow is a unidirectional communication channel between a pair of flow end points such that information written to the source flow end point can subsequently be read from the corresponding destination flow end point. A connection need not be supported by a connection-oriented protocol. All that is required is the identification of a pair of connection end points and the propagation of data between the end points. Connections and flows reside at the fifth or session layer in the ISO model. Because routers in general do not perform processing above layer three, routers generally do not perform processing explicitly related to connections. Intermediate nodes, however, generally do perform processing explicitly associated with connections.
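The definitions above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the names `Flow`, `ConnectionEndPoint` and `make_connection` are hypothetical, and an in-memory queue stands in for whatever protocol actually propagates the data.

```python
from collections import deque


class Flow:
    """Unidirectional channel: data written at the source end point
    can subsequently be read at the destination end point."""

    def __init__(self):
        self._buffer = deque()

    def write(self, data: bytes) -> None:
        self._buffer.append(data)

    def read(self) -> bytes:
        # Hypothetical convention: return b"" when nothing is pending.
        return self._buffer.popleft() if self._buffer else b""


class ConnectionEndPoint:
    """One end of a connection: writes go to one flow, reads come from the other."""

    def __init__(self, outgoing: Flow, incoming: Flow):
        self._outgoing = outgoing
        self._incoming = incoming

    def write(self, data: bytes) -> None:
        self._outgoing.write(data)

    def read(self) -> bytes:
        return self._incoming.read()


def make_connection():
    """Build a bi-directional connection out of two opposite flows."""
    a_to_b, b_to_a = Flow(), Flow()
    return ConnectionEndPoint(a_to_b, b_to_a), ConnectionEndPoint(b_to_a, a_to_b)
```

Note that nothing in this sketch depends on a connection-oriented protocol; it only requires a pair of end points and some means of moving data between them.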
In fact, intermediate nodes can be distinguished from routers in terms of connections. In FIG. 1, the first node 101 communicates with the second node 102 via a single connection 104, whereas in FIG. 2 communication between the two nodes takes place via multiple connections in series 204, 205, 206, 207, 208 and 209. The use of multiple connections provides each intermediate node with end points that can be used to influence communication between the first node 201 and the second node 202. However, intermediate nodes typically expend much of their resources simply moving data from one connection to another. In one common scenario, the intermediate node monitors the flow of information between the first node 201 and the second node 202 only until a request made by the first node 201 can be identified. In another common scenario, the intermediate node monitors the flow of information in only one direction. The performance and capacity of an intermediate node are therefore often determined by the efficiency with which it moves data between connections.
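The data-moving work described above can be illustrated with a minimal Python sketch of one flow being copied from one connection to the next. The function name `relay` and the `chunk_size` parameter are assumptions made for illustration; a real intermediate node would also handle the reverse flow, errors and concurrency.

```python
import socket


def relay(source: socket.socket, destination: socket.socket,
          chunk_size: int = 4096) -> int:
    """Copy one flow from `source` to `destination` until the sender
    closes its side. Returns the number of bytes moved."""
    total = 0
    while True:
        chunk = source.recv(chunk_size)   # read from the inbound connection
        if not chunk:                     # empty read: the peer finished sending
            break
        destination.sendall(chunk)        # write to the outbound connection
        total += len(chunk)
    return total
```

Every byte in this loop is read from one connection and then written to another, which is why the efficiency of this copy path tends to dominate the node's capacity.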
Having a series of connections between the first and second nodes can cause several undesirable side effects. Connections in series tend to deliver worse latency than a single connection and may also degrade throughput. Each node in a packet-switched network introduces some delay and imposes a throughput limit. The performance of a packet-switched network therefore relies on minimizing the delay introduced, and maximizing the throughput supported, by each node. This is accomplished, in part, by performing only minimal processing at each node. The packet forwarding performed by a router entails only a small amount of overhead, but the processing associated with connection end points performed by an intermediate node is significant.
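As a first-order illustration of this point (a simplification introduced here, not a model stated in the text), let $d_i$ be the delay introduced by node $i$ and $r_i$ its throughput limit. Both symbols are hypothetical, and link propagation delay is ignored. For a path through $n$ nodes:

```latex
D_{\text{path}} \approx \sum_{i=1}^{n} d_i ,
\qquad
R_{\text{path}} \le \min_{1 \le i \le n} r_i
```

Under this simplification, each added intermediate node increases the end-to-end delay by its own $d_i$, and the slowest node bounds the throughput of the whole path.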
The presence of multiple connections also alters the semantics of communication between the first node 201 and the second node 202 of FIG. 2. For example, with a single connection, the first node 201 is assured data has arrived at the second node 202 when it receives an acknowledgment. With multiple connections, the first node 201 may be led to believe data has arrived at the second node 202 when, in fact, it has only reached the first intermediate node.
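The change in acknowledgment semantics can be illustrated with a toy Python model. It is a sketch under simplifying assumptions: the `Node` class and its methods are hypothetical, and acknowledgments are modeled as plain return values rather than protocol messages.

```python
class Node:
    """Toy model of a node that terminates a connection and acknowledges data."""

    def __init__(self, name, next_hop=None):
        self.name = name
        self.next_hop = next_hop  # None means this node is the final destination
        self.received = []        # data that has actually arrived here for use
        self.pending = []         # data accepted but not yet forwarded

    def accept(self, data):
        """Take data from the local connection and acknowledge it immediately."""
        if self.next_hop is None:
            self.received.append(data)
        else:
            self.pending.append(data)
        return f"ACK from {self.name}"

    def forward_pending(self):
        """Push buffered data over the next connection in the series."""
        while self.pending:
            self.next_hop.accept(self.pending.pop(0))


second = Node("second node")
intermediate = Node("intermediate node", next_hop=second)

ack = intermediate.accept(b"payload")  # the sender sees an acknowledgment here...
delivered_at_ack_time = list(second.received)  # ...but nothing has reached the second node yet
intermediate.forward_pending()         # delivery to the second node happens later
```

In this model the acknowledgment only certifies that the data reached the nearest connection end point, which in a chain of connections is the first intermediate node.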