The present invention relates to a method for the management of inter-domain traffic over the Internet. In particular, the present invention relates to methods for selecting the most efficient and uncongested paths for inter-domain traffic.
A major challenge for inter-domain traffic engineering is the level of uncertainty an Internet Service Provider (ISP) faces when selecting paths for Internet traffic. With many potentially competing ISPs, global coordination of Internet traffic to select the best path and minimize congestion can be difficult. Moreover, a single ISP has only limited knowledge of user demand, available network resources, and the routing policies of its peers.
One of the major problems encountered in inter-domain traffic engineering is appropriately reacting to congestion, losses, and/or delays experienced by traffic between a home network and other domains. This problem is encountered in a home network with multiple links into another domain, such as a large ISP with multiple peering links to another backbone, or a smaller local network multi-homed to two or more backbones. In such cases, the company paying for these links may find that, under normal policies, traffic exiting the home network through one of the links is experiencing congestion, while other links are under-utilized. In some cases, the congestion experienced by traffic leaving or entering the home network may actually occur outside its domain. In order to improve the performance experienced by the connections generating the associated traffic, the ISP operator needs to find a way to alleviate the congestion, losses, and/or delays.
In simple cases, where the congestion is on the home network's egress link, congestion may be detected via Simple Network Management Protocol (SNMP) traffic measurements. SNMP is a set of protocols for managing complex networks which was first used in the 1980s. SNMP works by sending messages, called protocol data units (PDUs), to different parts of a network. SNMP-compliant devices, called agents, store data about themselves in Management Information Bases (MIBs) and return this data to the SNMP requesters. A user can determine an alternate route for the traffic by using SNMP to measure the available capacity on the alternate egress links. Under such circumstances, there are any number of approaches to the problem of load balancing the traffic. The only real problem is how the chosen approach is implemented, because typical inter-domain routing using Border Gateway Protocol (BGP) is a relatively coarse mechanism for traffic engineering.
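By way of illustration only, the capacity measurement described above can be sketched as follows. The sketch assumes that two readings of an interface octet counter (for example, `ifOutOctets` from the standard interfaces MIB) have already been retrieved for each egress link. No SNMP query is performed here, and the function names, link names, and sample values are hypothetical.

```python
def link_utilization(octets_start, octets_end, interval_s, link_speed_bps,
                     counter_bits=32):
    """Fraction of link capacity used between two octet-counter samples."""
    # Modular subtraction tolerates a single counter wrap between samples.
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    bits_per_second = delta_octets * 8 / interval_s
    return bits_per_second / link_speed_bps


def least_utilized_link(samples):
    """Return the name of the link with the lowest measured utilization."""
    return min(samples, key=lambda name: link_utilization(*samples[name]))


# Hypothetical samples per egress link:
# (start counter, end counter, sampling interval in seconds, link speed in bit/s)
egress_samples = {
    "link_1": (1_000, 337_501_000, 300, 10_000_000),
    "link_2": (1_000, 37_501_000, 300, 10_000_000),
}

utilizations = {name: link_utilization(*s) for name, s in egress_samples.items()}
best_link = least_utilized_link(egress_samples)
print(utilizations, best_link)
```

With these assumed values, the first link is heavily loaded and the second is lightly loaded, so the second would be the candidate for carrying rerouted traffic. Such a measurement only covers links inside the home network.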
A variety of metrics can also be used when a user is trying to determine the best path to use for sending traffic over a network. Some routing protocols, such as Routing Information Protocol (RIP), use only one metric, namely hop count. Other routing protocols, such as Interior Gateway Routing Protocol (IGRP), use a combination of metrics. The metrics most commonly used are: (1) hop count: the number of routers that a packet must go through to reach its destination; (2) bandwidth: the data capacity of a link; (3) delay: the length of time to move the packet from the source to destination; (4) load: the amount of activity on a network resource; (5) reliability: the error rate of each network link; (6) ticks: the delay on a data link using IBM PC clock ticks; and (7) cost: an arbitrary value assigned by an administrator. The best route depends on the metrics and metric weightings used to make the calculation. For example, one routing protocol might use the number of hops and the delay, but might weigh the delay more heavily in the calculation. Thus, a route having more hops and shorter delays may be less expensive than a route having fewer hops and longer delays. Paths that are expensive to use are usually avoided. Such metrics are a useful tool, but they cannot accurately account for traffic in other domains.
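The effect of metric weighting can be illustrated with a minimal sketch. The weighted-sum formula, the weights, and the two routes below are assumptions chosen for illustration and do not reproduce the composite metric of any particular routing protocol.

```python
def route_cost(hops, delay_ms, hop_weight=1.0, delay_weight=1.0):
    """Weighted sum of hop count and delay; a lower cost is preferred."""
    return hop_weight * hops + delay_weight * delay_ms


# Hypothetical routes to the same destination.
route_a = {"hops": 6, "delay_ms": 20.0}   # more hops, shorter delay
route_b = {"hops": 3, "delay_ms": 45.0}   # fewer hops, longer delay

# Delay weighted more heavily than hop count.
cost_a = route_cost(**route_a, hop_weight=1.0, delay_weight=2.0)   # 46.0
cost_b = route_cost(**route_b, hop_weight=1.0, delay_weight=2.0)   # 93.0

# Hop count weighted more heavily than delay: the preference reverses.
alt_cost_a = route_cost(**route_a, hop_weight=10.0, delay_weight=0.1)
alt_cost_b = route_cost(**route_b, hop_weight=10.0, delay_weight=0.1)

print(cost_a < cost_b, alt_cost_a < alt_cost_b)
```

Under the first weighting the route with more hops is the less expensive one; under the second weighting the route with fewer hops is preferred, which shows how strongly the selected path depends on the chosen weights.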
Leaving aside the practical problems of implementing a method for balancing the traffic load, there is a more fundamental problem. In some cases, the congestion experienced by traffic leaving or entering the home network may actually occur outside the domain so that the operator is unable to make direct observations of the traffic along different paths. Indeed, the operator's knowledge is limited to the home network and he or she does not necessarily know the routing, topology, or capacity of the network that the traffic traverses. However, the operator can often still detect congestion in the network using active probes, and the operator may want to take remedial actions to alleviate or avoid this congestion.
If the operator chooses to alleviate or avoid congestion that is detected on the network, he or she must balance loads across multiple paths without knowledge of the available capacity and the details of the paths themselves. The only information available to the operator consists of coarse measurements of statistics, such as loss rate and Round Trip Time (RTT), which can be used to infer congestion. In addition, the operator may be able to use some of the more recently developed techniques to estimate bottleneck bandwidth, for example, an approach based on network tomography. These mechanisms are unlikely to give an entirely accurate picture of the paths in question, but can narrow down the range of possibilities.
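One simple way such coarse measurements might be used to infer congestion is sketched below. The sketch assumes that active probes have been sent along a path and that the RTT of each answered probe has been recorded. The function name, the two thresholds, and the rule of comparing the median RTT against the minimum observed RTT are illustrative assumptions and are not prescribed by the foregoing description.

```python
from statistics import median


def looks_congested(rtts_ms, probes_sent, loss_threshold=0.01,
                    rtt_inflation_threshold=1.5):
    """Infer congestion on a path from probe loss and RTT inflation.

    rtts_ms holds one RTT (in milliseconds) per answered probe.
    Returns (congested, loss_rate, rtt_inflation).
    """
    if probes_sent <= 0:
        raise ValueError("probes_sent must be positive")
    loss_rate = 1.0 - len(rtts_ms) / probes_sent
    if not rtts_ms:
        # Every probe was lost; no RTT information is available.
        return True, loss_rate, None
    # The smallest RTT approximates the delay of the path without queueing.
    baseline = min(rtts_ms)
    inflation = median(rtts_ms) / baseline
    congested = (loss_rate > loss_threshold
                 or inflation > rtt_inflation_threshold)
    return congested, loss_rate, inflation


# Hypothetical probe results for two paths, five probes each.
quiet_path = looks_congested([20, 21, 20, 22, 21], probes_sent=5)
busy_path = looks_congested([20, 45, 50, 48, 52], probes_sent=5)
print(quiet_path, busy_path)
```

Such an inference indicates only that a path appears congested; it does not reveal where along the path the congestion occurs or how much capacity remains.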
A further, and even more difficult, problem is that multiple operators may decide to use similar approaches to avoid congestion and may be simultaneously rerouting their traffic. In this case, they will play a kind of “game” against each other to optimize the utilization of the external domain, not knowing that other operators are changing traffic paths. Based on past experience, it has been found that such “games” (where individuals optimize their own utility) do not always lead to globally optimal behavior.
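A toy example, offered only as an illustration, shows how such uncoordinated rerouting can fail to settle. It assumes two operators with equal demand, two external paths of equal capacity, and a rule under which each operator moves all of its traffic to the other path whenever the path it is using is over capacity. All names and numbers are hypothetical.

```python
def simulate_uncoordinated(rounds=4, capacity=10, demand=8):
    """Two operators each reroute on their own when their path is overloaded.

    Returns the per-round load on paths "A" and "B".
    """
    choice = ["A", "A"]          # both operators start on path A
    history = []
    for _ in range(rounds):
        load = {"A": 0, "B": 0}
        for path in choice:
            load[path] += demand
        history.append(dict(load))
        # Each operator sees only that its own path is congested and switches,
        # unaware that the other operator is making the same decision.
        choice = [
            ("B" if path == "A" else "A") if load[path] > capacity else path
            for path in choice
        ]
    return history


history = simulate_uncoordinated()
worst_load = max(max(load.values()) for load in history)
print(history, worst_load)
```

In this toy setting both operators switch paths together in every round, so one path always carries 16 units against a capacity of 10, whereas placing one operator on each path would carry 8 units per path without congestion.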
Accordingly, there is a need for a method which allows the network operator to ease congestion and to balance the loads on the network paths. More specifically, there is a need for a method which allows the operator to estimate, with reasonable accuracy, the amount of traffic across different paths in a network and to reroute some of the traffic to uncongested network paths.