A computer communications network is a composite discrete event system (DES) made up of two classes of servers: links, which effect the actual transportation of data between source and destination end nodes; and intermediate nodes, which relay data between links, thereby effecting the concatenation of links to provide end-to-end transport of data. Other terms of art for an intermediate node include intermediate system and relay. This concatenation, which is generally referred to as routing or forwarding, may be static or dynamic. Static routing relies on pre-computed routing or forwarding tables that do not change even if there are changes in the state of the network's available links or intermediate nodes. Dynamic routing, in contrast, alters the routing or forwarding tables as feedback is received about changes in the state or topology of the network, possibly including information on the maximum available bandwidth or service rate of the network's links. When a network has been designed with redundant links, a principal advantage of dynamic routing is that it allows recovery from faults that might otherwise disable end-to-end transport of data.
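To make the static/dynamic distinction concrete, the following is a minimal Python sketch. It assumes a hop-count metric, a hypothetical helper `next_hops`, and a hypothetical four-node ring; none of these come from the text above. It computes node A's forwarding table once, as static routing would, and again after a link fault, as dynamic routing would, showing how a redundant link allows recovery.

```python
from collections import deque

def next_hops(links, source):
    """Hypothetical helper: build a hop-count forwarding table
    (destination -> next hop) for `source` by breadth-first search
    over the links that are currently available."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    table = {}
    frontier = deque()
    for nbr in sorted(adj.get(source, ())):
        table[nbr] = nbr              # direct neighbours are their own next hop
        frontier.append(nbr)
    visited = {source} | set(table)
    while frontier:
        node = frontier.popleft()
        for nbr in sorted(adj[node]):
            if nbr not in visited:
                visited.add(nbr)
                table[nbr] = table[node]  # inherit the first hop toward `node`
                frontier.append(nbr)
    return table

# Hypothetical four-node ring A-B-C-D-A: every pair has two disjoint paths.
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

static_table = next_hops(ring, "A")             # computed once, never updated
after_fault = [l for l in ring if l != ("A", "B")]  # link A-B suffers a fault
dynamic_table = next_hops(after_fault, "A")     # recomputed from feedback

print(static_table["B"])   # prints B: traffic is still sent onto the dead link
print(dynamic_table["B"])  # prints D: traffic is rerouted A-D-C-B
```

The static table keeps pointing at the failed link, while the recomputed table uses the otherwise redundant path through D.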
The operation of dynamic routing is typically based on the exchange of routing tables and/or topology state information among the intermediate nodes. The mechanism used to exchange this information is generally called a routing protocol. A routing protocol enables each intermediate node to construct a topology model of the network, either implicit or explicit. When the routing protocol exchanges routing tables, the topology model is implicit. With topology state routing, on the other hand, each intermediate node uses the topology state information received from the other intermediate nodes via the routing protocol to construct an explicit model of the current topology of the network. With either approach, it is through this exchange of feedback that each intermediate node synchronizes its topology model, implicit or explicit, with those of its peers, i.e., the other intermediate nodes. The synchronization attempts to prevent concatenation faults such as forwarding loops by ensuring that every intermediate node has the same topology model.
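The implicit case can be illustrated with a distance-vector exchange, in which nodes trade only tables of distances and never hold an explicit map of the network. The Python sketch below is a hypothetical, synchronous Bellman-Ford style model (the function name `distance_vector`, the link costs, and the round structure are all assumptions for illustration, not a specific protocol).

```python
INF = float("inf")

def distance_vector(links, rounds):
    """Hypothetical synchronous distance-vector exchange. Each node keeps only
    a table of distances to every destination (an implicit topology model) and
    updates it from copies of its neighbours' tables, Bellman-Ford style."""
    nodes = sorted({n for a, b, _ in links for n in (a, b)})
    cost = {}
    for a, b, c in links:
        cost[(a, b)] = cost[(b, a)] = c
    # Initially each node knows only that its distance to itself is zero.
    dv = {n: {m: (0 if m == n else INF) for m in nodes} for n in nodes}
    for _ in range(rounds):
        sent = {n: dict(t) for n, t in dv.items()}  # tables exchanged this round
        for n in nodes:
            for m in nodes:
                best = sent[n][m]
                for (x, y), c in cost.items():
                    if x == n:  # y is a neighbour of n
                        best = min(best, c + sent[y][m])
                dv[n][m] = best
    return dv

links = [("A", "B", 1), ("B", "C", 1), ("A", "C", 5)]
print(distance_vector(links, 1)["A"]["C"])  # prints 5: only the direct link is known
print(distance_vector(links, 2)["A"]["C"])  # prints 2: the path via B has been learned
```

After enough rounds of exchange, every node's table reflects the same underlying topology even though no node ever stores that topology explicitly.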
Using its topology model, each intermediate node then calculates, typically by some form of dynamic programming optimization, the shortest paths to all the other intermediate nodes. The result of this calculation is a shortest path first (SPF) spanning tree from which each intermediate node derives its forwarding or routing table. Because each intermediate node's topology model is the same as those of its peers, the SPF spanning trees, and hence the forwarding tables derived from them, are consistent. A consequence of employing spanning tree algorithms, however, is that in a network with N intermediate nodes and L links, generally only N−1 of the L links are used to carry traffic. The remaining links are idle and are used only if one of the N−1 links selected to carry traffic suffers a fault that causes the SPF spanning tree to be recalculated. A direct consequence of using a spanning tree to determine the forwarding or routing tables is that highly robust networks, with many redundant links, are necessarily highly inefficient, and highly efficient networks, with few if any unused links, are necessarily fragile and subject to interruption of service by faults.
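As a sketch of the SPF calculation, the Python code below uses Dijkstra's algorithm (a common choice, assumed here rather than stated in the text) on a hypothetical six-link, four-node network. The tree rooted at node A contains exactly N−1 = 3 links, and the forwarding table is read off the tree by walking each path back to the root.

```python
import heapq

def spf_tree(links, root):
    """Dijkstra shortest-path-first from `root`; returns {node: parent}."""
    adj = {}
    for a, b, c in links:
        adj.setdefault(a, []).append((b, c))
        adj.setdefault(b, []).append((a, c))
    dist = {root: 0}
    parent = {}
    heap = [(0, root)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, c in adj[u]:
            nd = d + c
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return parent

def forwarding_table(parent, root):
    """Derive destination -> next hop by walking each tree path back to root."""
    table = {}
    for dest in parent:
        hop = dest
        while parent[hop] != root:
            hop = parent[hop]
        table[dest] = hop
    return table

# Hypothetical network: N = 4 nodes, L = 6 links (ring plus two diagonals).
links = [("A", "B", 1), ("B", "C", 1), ("C", "D", 1),
         ("D", "A", 1), ("A", "C", 3), ("B", "D", 3)]
parent = spf_tree(links, "A")
tree_links = {frozenset((v, p)) for v, p in parent.items()}
print(len(tree_links), "of", len(links), "links carry A's traffic")  # 3 of 6
print(forwarding_table(parent, "A"))  # {'B': 'B', 'D': 'D', 'C': 'B'}
```

In this example the diagonals A-C and B-D, and the ring link C-D, carry none of A's traffic unless a tree link fails.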
By design, topology-adaptive routing protocols maintain only connectivity, leaving routing tables unchanged unless a fault or other change affects the bandwidth or service rate of a component intermediate node or link. In particular, topology-adaptive routing protocols will continue routing traffic through congested, heavily loaded parts of the network, even if idle links are available, so long as the bandwidth of the links in the SPF spanning tree remains unchanged. In fact, topology-adaptive routing protocols that rely on implicit or explicit spanning trees of necessity create bottlenecks. This may be acceptable as long as computer networks transport only time-insensitive traffic such as bulk file transfers, email, or most web pages. However, computer networks increasingly carry large amounts of time-sensitive traffic such as voice and video, both of which require highly consistent and predictable service from the network. Widely varying delays or latencies (the variation is known as jitter) militate against transporting multimedia and other jitter-intolerant traffic over packet-switched networks.
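Jitter can be quantified in several ways. One widely used estimator is the interarrival jitter of RFC 3550, section 6.4.1, which smooths the change in one-way transit time between consecutive packets with a gain of 1/16. The Python sketch below follows that formula but, as a simplifying assumption, uses millisecond timestamps rather than RTP timestamp units; the packet timings are hypothetical.

```python
def rfc3550_jitter(send_times, recv_times):
    """Running interarrival jitter in the style of RFC 3550 section 6.4.1:
    J += (|D| - J) / 16, where D is the difference in transit time
    (receive time minus send time) between consecutive packets."""
    jitter = 0.0
    prev_transit = None
    for s, r in zip(send_times, recv_times):
        transit = r - s
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter

sends = [0, 20, 40, 60]                  # one packet every 20 ms (hypothetical)
steady = [50, 70, 90, 110]               # constant 50 ms delay
erratic = [50, 110, 90, 150]             # delay alternates between 50 and 90 ms

print(rfc3550_jitter(sends, steady))     # prints 0.0
print(rfc3550_jitter(sends, erratic))    # prints 7.041015625, rising toward 40
```

Both traces have similar average delay, but only the erratic one registers jitter, which is what degrades voice and video playback.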
The challenge of controlling performance variables such as response time, jitter, and throughput, generally referred to collectively as Quality of Service (QoS), is that knowledge of topology alone is inadequate. As is well known from queueing theory, bandwidth or service rate is just one of several variables that determine the performance of a discrete event system (DES), i.e., a queueing system. Other variables include the traffic arrival rate(s), the storage available for queued traffic, and the scheduling discipline(s) that determine the order in which incoming requests for service, i.e., packets awaiting forwarding, are executed. Topology-adaptive routing protocols deliberately ignore these other variables when calculating the forwarding tables to be used by the intermediate nodes in a computer network. The forwarding tables calculated as part of the routing process are optimized with respect to topology and/or bandwidth, but no attempt is made to include traffic state information or similar data about congestion in the network.
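The point that bandwidth alone does not determine performance can be shown with two textbook single-server models: the M/M/1 queue, whose mean time in system is 1/(μ − λ), and the M/M/1/K queue, whose loss probability depends on the storage K. The Python sketch below assumes these Poisson-arrival, exponential-service models and hypothetical rates; real links need not follow them.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean time in system (waiting plus service) for an M/M/1 queue: 1/(mu - lambda)."""
    if arrival_rate >= service_rate:
        return float("inf")  # unstable: the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)

def mm1k_blocking(arrival_rate, service_rate, capacity):
    """Probability that an arrival is lost in an M/M/1/K queue holding at most
    `capacity` packets, including the one in service."""
    rho = arrival_rate / service_rate
    if rho == 1.0:
        return 1.0 / (capacity + 1)
    return (1 - rho) * rho**capacity / (1 - rho**(capacity + 1))

# Two hypothetical links with the SAME service rate (1000 packets/s)
# but different offered loads.
mu = 1000.0
print(mm1_response_time(100.0, mu))   # about 0.00111 s on the lightly loaded link
print(mm1_response_time(900.0, mu))   # 0.01 s on the heavily loaded link
print(mm1k_blocking(900.0, mu, 10))   # about 0.051 loss with 10 buffers
```

A topology-only metric would rate both links identically, yet their mean delays differ by a factor of nine.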
Part of the reason routing protocols in use today do not include traffic or queueing state information is the negative experience with several early routing protocols that attempted to do so by adding traffic-related data to the state information exchanged by intermediate nodes, most notably the ARPANET's Gateway to Gateway Protocol (GGP). The GGP used as its routing optimality metric the estimated delay, as measured by the queue length on outgoing links, which each intermediate node included in its routing updates. The inclusion of such traffic state information, however, caused major instability, because the state information changed rapidly with volatile queue dynamics. The result was large oscillations in routing: attempts to route traffic away from regions of congestion toward less utilized links overshot, so that traffic bounced between regions of the network as each region alternately went from idle to congested and back again. Because of this phenomenon, subsequent routing protocols have avoided using dynamic traffic or queueing state information in their routing.
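The overshoot described above can be reproduced with a deliberately simple toy model. It is a hypothetical illustration, not a model of GGP itself: two parallel links each serve 6 packets per period, 10 packets arrive per period, and all traffic follows whichever link advertised the shorter queue in the most recent update.

```python
def simulate_queue_routing(periods, arrivals=10, service=6):
    """Toy model of queue-length routing on two parallel links. Each period,
    ALL traffic is sent over the link whose queue was shorter in the last
    routing update, so the chosen link congests and the route flips."""
    queues = [0, 0]
    chosen_history = []
    for _ in range(periods):
        advertised = list(queues)  # routing update carries current queue lengths
        chosen = 0 if advertised[0] <= advertised[1] else 1
        chosen_history.append(chosen)
        for i in (0, 1):
            offered = arrivals if i == chosen else 0
            queues[i] = max(0, queues[i] + offered - service)
    return chosen_history, queues

history, final = simulate_queue_routing(6)
print(history)  # prints [0, 1, 0, 1, 0, 1]: traffic bounces between the links
print(final)    # prints [0, 4]
```

By contrast, a fixed split of 5 packets per period on each link would keep both queues empty, since each link can serve 6; the oscillation comes entirely from reacting to volatile queue state.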
Another disadvantage of the prior-art traffic-enhanced routing approach has been the very substantial share of the network's total bandwidth consumed by the routing protocol's state updates, whose frequency is determined by the volatility of the state information they carry. Topology changes (due to links and/or intermediate nodes suffering faults) occur many orders of magnitude less frequently than changes in the size of a link's queue. Exacerbating this is the fact that, as is well known from sampling theory (the Nyquist-Shannon sampling theorem), adequately representing the dynamics of a varying signal requires sampling at more than twice the frequency of its highest frequency component. Because queue sizes change with each arrival and/or each departure, the frequency of change can be arbitrarily high. Sampling rates adequate to capture the queueing dynamics generate vastly more routing protocol update traffic than purely topology-adaptive routing protocols do.
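Both halves of this argument reduce to simple arithmetic, sketched below in Python. The aliasing formula gives the apparent frequency of a sinusoid sampled too slowly; the overhead formula counts only update traffic originated by each node, ignoring flooding copies. All node counts, message sizes, and rates are hypothetical; the 1800-second interval corresponds to a 30-minute periodic refresh such as OSPF's LSRefreshTime.

```python
def aliased_frequency(signal_hz, sample_hz):
    """Apparent frequency of a sinusoid at signal_hz when sampled at sample_hz."""
    return abs(signal_hz - sample_hz * round(signal_hz / sample_hz))

def update_bits_per_second(nodes, update_bytes, updates_per_second):
    """Routing-update traffic originated network-wide (before flooding copies)."""
    return nodes * update_bytes * 8 * updates_per_second

# A queue oscillating at 50 Hz, sampled below and above the Nyquist rate.
print(aliased_frequency(50, 60))    # prints 10: undersampling shows a false 10 Hz swing
print(aliased_frequency(50, 120))   # prints 50: sampling above 100 Hz preserves it

# Hypothetical network of 100 nodes sending 100-byte updates.
topology_rate = update_bits_per_second(100, 100, 1 / 1800)  # one refresh per 30 min
queue_rate = update_bits_per_second(100, 100, 250)          # 250 samples per second
print(round(topology_rate, 1))      # prints 44.4 (bits/s)
print(queue_rate)                   # prints 20000000 (bits/s)
```

Under these assumptions the queue-tracking updates are roughly 450,000 times more voluminous than periodic topology refreshes.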
Because of these disadvantages, attempts to generate traffic-adaptive paths are today largely limited to manual efforts, in which human operators or network engineers estimate traffic patterns and lay out routes that avoid congested regions. This approach is widely recognized as neither accurate nor scalable: large networks can require tens of thousands of paths active at any given time, and generating these is simply beyond the capabilities of current or future network staffing.
Accordingly, a need exists for a method and system for automatically, without manual effort, controlling quality of service parameters, including response time, jitter, throughput, and utilization, independently of topology. A further need exists for a method and system which meets the above need and which automatically, without manual effort, generates paths that minimize delay by avoiding congestion to the greatest extent possible without incurring unstable routing dynamics, especially large oscillations in routing. A further need exists for a method and system which meets the above need and which is independent of the mix of traffic and protocols used in the computer communications network. A further need exists for a method and system which meets the above need without requiring modification of the hardware or software of the intermediate nodes in computer networks. A further need exists for a method and system which meets the above need without requiring proprietary protocols. A further need exists for a method and system which meets the above need without consuming excessive amounts of the network's bandwidth. A further need exists for a method and system which meets the above need without excessive computation and is therefore tractable for real-time, on-line optimization. A further need exists for a method and system which meets the above need and which utilizes a large percentage of the links in the network. A further need exists for a method and system which meets the above need and which can be used by content-caching applications to determine the optimal locations for content caches to which web or similar requests can be redirected. A further need exists for a method and system which meets the above need and which can be used to provide input on traffic and utilization patterns and trends to capacity planning tools.
A further need exists for a method and system which meets the above need and which can be used to identify, for a bandwidth trading tool, the links and/or intermediate nodes of a computer communications network that at certain times have either a deficit or a surplus of bandwidth, so that the tool can either buy additional bandwidth or make the surplus capacity available for resale to carry third-party traffic.