In IP networks, resource management protocols on the data path have been investigated in recent years as a way to ensure quality of service (QoS). Such protocols are responsible for ensuring that the resource needs of data flows arriving at the edge of a network domain or autonomous system are met, and for providing the interior nodes of the domain with information about the future path of each flow. This enables each interior node to make a local admission control decision. A flow is usually admitted into a network domain only if all interior nodes on its path have admitted it, and it is admitted end-to-end only if all intermediate domains have made a positive admission decision. Except in purely measurement-based admission control, admitting a flow also requires reserving resources in all interior nodes.
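The admission rule above is a conjunction at two levels: every interior node in a domain, then every domain on the path. A minimal sketch, assuming each node's local decision can be modelled as a function of the requested bandwidth; `capacity_check` and the bandwidth figures are hypothetical, and the sketch does not perform the resource reservation that a real admission also requires.

```python
from typing import Callable, Sequence

# Hypothetical model: a node's local admission check takes the requested
# bandwidth and returns True if the node can accept the flow.
NodeCheck = Callable[[float], bool]


def capacity_check(free_bandwidth: float) -> NodeCheck:
    """Build a simple local check: admit if enough unreserved bandwidth remains."""
    return lambda requested: requested <= free_bandwidth


def domain_admits(path: Sequence[NodeCheck], bandwidth: float) -> bool:
    """A domain admits a flow only if every interior node on its path admits it."""
    return all(check(bandwidth) for check in path)


def end_to_end_admits(domains: Sequence[Sequence[NodeCheck]], bandwidth: float) -> bool:
    """A flow is admitted end-to-end only if every domain on its path admits it."""
    return all(domain_admits(path, bandwidth) for path in domains)


domain_a = [capacity_check(100.0), capacity_check(50.0)]
domain_b = [capacity_check(20.0)]
print(end_to_end_admits([domain_a, domain_b], 10.0))  # True
print(end_to_end_admits([domain_a, domain_b], 30.0))  # False: the node in domain_b refuses
```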
Integrated Services (IntServ) is one architecture adopted to ensure QoS for real-time and non-real-time traffic in the Internet. The Internet Engineering Task Force (IETF) has specified the Resource ReSerVation Protocol (RSVP) for reserving resources in IP routers in RFC 2205. Each router along the data path stores “per-flow” reservation states. These are “soft” states, which must be refreshed by periodic refresh messages: if a reservation state is not refreshed, the state and its resources are removed after a time-out period. Reservations can also be removed explicitly by tear-down messages. Because RSVP messages always follow the data path, RSVP can operate alongside standard routing protocols; if traffic is re-routed, the refresh messages install reservations on the new data path.
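The soft-state behaviour described above (refresh restarts a timer, expiry removes the state, explicit tear-down removes it at once) can be sketched as a small table. This is a minimal illustration, assuming caller-supplied timestamps and a fixed timeout; it does not reproduce how RSVP derives its actual state lifetime from the refresh period, and `SoftStateTable` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class SoftStateTable:
    """Per-flow reservation states that expire unless refreshed (RSVP-style soft state).

    Times are plain floats in seconds supplied by the caller, so the sketch does not
    depend on a real clock. The timeout value is an illustrative assumption.
    """
    timeout: float
    expiry: dict = field(default_factory=dict)     # flow_id -> time at which the state expires
    bandwidth: dict = field(default_factory=dict)  # flow_id -> reserved bandwidth

    def refresh(self, flow_id: str, bandwidth: float, now: float) -> None:
        """Install or refresh a reservation; each refresh restarts its timer."""
        self.bandwidth[flow_id] = bandwidth
        self.expiry[flow_id] = now + self.timeout

    def tear_down(self, flow_id: str) -> None:
        """An explicit tear-down message removes the state immediately."""
        self.expiry.pop(flow_id, None)
        self.bandwidth.pop(flow_id, None)

    def expire(self, now: float) -> list:
        """Remove every state that was not refreshed in time; return the removed flow ids."""
        stale = [flow_id for flow_id, t in self.expiry.items() if t <= now]
        for flow_id in stale:
            self.tear_down(flow_id)
        return stale

    def reserved(self) -> float:
        return sum(self.bandwidth.values())


table = SoftStateTable(timeout=3.0)
table.refresh("f1", 1.0, now=0.0)
table.refresh("f2", 2.0, now=0.0)
table.refresh("f1", 1.0, now=2.0)   # only f1 is refreshed
print(table.expire(now=3.0))        # ['f2']
print(table.reserved())             # 1.0
```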
In large networks the number of flows, and therefore the number of reservation states, is high. This makes storing and maintaining per-flow states in each router problematic. Another architecture, Differentiated Services (DiffServ), has therefore been proposed to provide QoS in large-scale networks, and this is described in RFC 2475. In the DiffServ architecture, services are offered on an aggregate rather than a per-flow basis, in order to allow scaling up to larger networks. As much of the per-flow state as possible is pushed to the edges of the network, and routers offer different services to these aggregates.
The service differentiation is achieved using the Differentiated Services (DS) field in the IP header. Packets are classified into Per-Hop Behaviour (PHB) groups at the edge nodes of the DiffServ network, and DiffServ routers handle each packet according to the PHB indicated by the DS field in its IP header. The DiffServ architecture does not provide any means for devices outside the domain to dynamically reserve resources or receive indications of network resource availability. In practice, service providers rely on subscription-time Service Level Agreements (SLAs) that statically define the parameters of the traffic that will be accepted from a customer.
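Reading and rewriting the DS field is plain bit manipulation: RFC 2474 places the six-bit DSCP in the upper bits of the former IPv4 TOS octet, and RFC 3168 uses the lower two bits for ECN. The sketch below assumes that layout; EF = 46 (RFC 3246) and AF11 = 10 (RFC 2597) are standard code points, while the `PHB_BY_DSCP` table and the helper names are hypothetical.

```python
def dscp_from_ds_byte(ds_byte: int) -> int:
    """Extract the 6-bit DSCP from the 8-bit DS field (upper six bits)."""
    if not 0 <= ds_byte <= 0xFF:
        raise ValueError("DS field is a single octet")
    return ds_byte >> 2


def ds_byte_with_dscp(ds_byte: int, dscp: int) -> int:
    """Re-mark the DSCP while preserving the two ECN bits."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP is a 6-bit value")
    return (dscp << 2) | (ds_byte & 0x03)


# Illustrative PHB table for an interior router (hypothetical mapping).
PHB_BY_DSCP = {46: "EF", 10: "AF11", 0: "default"}


def phb_for(ds_byte: int) -> str:
    # RFC 2474: unrecognised code points should be treated as the Default PHB.
    return PHB_BY_DSCP.get(dscp_from_ds_byte(ds_byte), "default")


print(phb_for(0xB8))                 # EF
print(ds_byte_with_dscp(0xB9, 10))   # 41 (AF11 with the ECN bit 01 kept)
```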
The IETF Next Steps in Signaling (NSIS) Working Group is currently working on a protocol to meet the new signalling requirements of today's IP networks. The QoS signalling application protocol of NSIS is fundamentally similar to RSVP but has several new features, one of which is support for different QoS Models. One of the QoS Models under specification is Resource Management in DiffServ (RMD). RMD defines scalable admission and congestion control methods for DiffServ networks in which interior nodes inside a domain hold aggregated states rather than per-flow state information. For example, interior nodes may know the aggregated reserved bandwidth rather than each flow's individual reservation. Like RSVP, RMD uses soft states, and explicit release of resources is also possible.
The “stateless” domain property means that, in the domain, the interior nodes do not maintain per-flow state information, only aggregated states (e.g., per-class). However, even in stateless domains, the ingress and egress edges are stateful nodes. In RMD, an end-to-end reservation is divided into “per-domain” reservation (between stateful edge nodes) and “per-hop” reservation (local reservation inside the domain).
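The split between stateful edges and reduced-state interior nodes can be illustrated with two small classes: interior nodes keep one bandwidth counter per traffic class, while the ingress edge keeps the per-flow records. This is a minimal sketch under simplifying assumptions: direct method calls stand in for signalling messages, the immediate rollback stands in for explicit release messages, and all class names and capacities are hypothetical.

```python
from collections import defaultdict


class InteriorNode:
    """Reduced-state interior node: one reserved-bandwidth counter per class, no flow ids."""

    def __init__(self, capacity_per_class: dict):
        self.capacity = dict(capacity_per_class)
        self.reserved = defaultdict(float)

    def reserve(self, phb: str, bandwidth: float) -> bool:
        if self.reserved[phb] + bandwidth > self.capacity.get(phb, 0.0):
            return False
        self.reserved[phb] += bandwidth
        return True

    def release(self, phb: str, bandwidth: float) -> None:
        self.reserved[phb] = max(0.0, self.reserved[phb] - bandwidth)


class EdgeNode:
    """Stateful ingress edge: keeps the per-flow view that interior nodes lack."""

    def __init__(self):
        self.flows = {}  # flow_id -> (phb, bandwidth)

    def admit(self, flow_id: str, phb: str, bandwidth: float, path: list) -> bool:
        accepted = []
        for node in path:
            if not node.reserve(phb, bandwidth):
                for n in accepted:  # undo the per-hop reservations already made
                    n.release(phb, bandwidth)
                return False
            accepted.append(node)
        self.flows[flow_id] = (phb, bandwidth)
        return True


n1, n2 = InteriorNode({"EF": 10.0}), InteriorNode({"EF": 5.0})
edge = EdgeNode()
print(edge.admit("a", "EF", 4.0, [n1, n2]))  # True
print(edge.admit("b", "EF", 2.0, [n1, n2]))  # False: n2 would exceed 5.0
print(n1.reserved["EF"])                     # 4.0 (rolled back)
```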
All practical resource reservation protocols (RSVP/NSIS/RMD) rely on the routing protocols to assign a path for the incoming flow. The protocol message is routed along the same path that the regular user packets will use after a positive admission decision, which is why the reservation along this path is valid. However, the opposite relationship is generally not true: routing protocols (e.g. Open Shortest Path First (OSPF) and Intermediate System to Intermediate System (IS-IS) within a domain, or the Border Gateway Protocol (BGP) between domains) do not rely on the reservation protocol when determining the paths.
Thus, when an existing link or node fails, the routing protocols calculate a new path based on their own optimization criteria and own metrics (e.g. choosing the lowest cost path) without recourse to the reservation protocols. As a result, traffic may easily be re-routed to a path that is already occupied (i.e., where there was no reservation for the re-routed flows) leading to severe congestion.
FIG. 1 illustrates how congestion may occur when a flow is re-routed. An exemplary IP DiffServ domain 10 has two ingress edge nodes 1, 2, one egress edge node 3, and two core routers 4, 5. A first data flow 6 passes from ingress node 1, via core router 4, to egress node 3. A second data flow 7 passes from ingress node 2, via core router 5, to egress node 3. If router 5 fails, the second data flow 7 is re-routed via router 4. Since router 4 is already handling the first data flow 6, this causes congestion in that router.
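The FIG. 1 scenario reduces to simple load arithmetic once the routing protocol has moved flow 7. A minimal sketch, assuming hypothetical rates of 60 Mbit/s per flow and a capacity of 100 Mbit/s on each router's outgoing link towards egress node 3; none of these figures come from the text.

```python
def load_after_failure(routes: dict, rates: dict, failed: int, reroute: dict) -> dict:
    """Offered load per router after `failed` goes down and its flows are moved."""
    load = {}
    for flow, router in routes.items():
        if router == failed:
            router = reroute[flow]  # the routing protocol picks the new path
        load[router] = load.get(router, 0.0) + rates[flow]
    return load


routes = {6: 4, 7: 5}           # flow 6 via router 4, flow 7 via router 5
rates = {6: 60.0, 7: 60.0}      # hypothetical Mbit/s
capacity = {4: 100.0, 5: 100.0}
reroute = {7: 4}                # no reservation exists for flow 7 on router 4

load = load_after_failure(routes, rates, failed=5, reroute=reroute)
overload = {r: max(0.0, l - capacity[r]) for r, l in load.items()}
print(load)      # {4: 120.0}
print(overload)  # {4: 20.0}
```

The 20 Mbit/s of excess is exactly the quantity the congestion handling mechanisms discussed below must detect and signal.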
The resource management protocol must quickly remove the potential congestion and re-establish the reservations of the re-routed flows on the new path. Where all interior nodes hold per-flow reservation states (as in RSVP), the interior node that re-routes the traffic can re-initialise the reservations of the re-routed flows quickly after the path change. In RSVP, this is called Local Repair. Local Repair starts in the node that re-routes the traffic: using its per-flow database, this node sends out reservation messages for all re-routed flows onto the new path, immediately trying to reserve their bandwidth. If a reservation is not successful, the excess flows are terminated.
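Local Repair depends on the re-routing node knowing every affected flow and its bandwidth, which is what the per-flow database provides. A minimal sketch, assuming a greedy re-reservation in priority order; the priority field and the ordering policy are assumptions, since the text only states that excess flows are terminated.

```python
from dataclasses import dataclass


@dataclass
class FlowState:
    flow_id: str
    bandwidth: float
    priority: int  # assumed convention: higher value = more important


def local_repair(rerouted: list, free_bandwidth_on_new_path: float):
    """Re-reserve re-routed flows on the new path; return (kept, terminated) flow ids."""
    kept, terminated = [], []
    free = free_bandwidth_on_new_path
    for flow in sorted(rerouted, key=lambda f: f.priority, reverse=True):
        if flow.bandwidth <= free:
            free -= flow.bandwidth
            kept.append(flow.flow_id)
        else:
            terminated.append(flow.flow_id)  # reservation failed: excess flow is torn down
    return kept, terminated


flows = [FlowState("a", 30.0, 1), FlowState("b", 50.0, 3), FlowState("c", 30.0, 2)]
print(local_repair(flows, 70.0))  # (['b'], ['c', 'a'])
```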
However, with a reduced-state protocol (e.g. RMD), resolving such congestion is non-trivial, since session data is only available at the edge nodes of the domain. Interior nodes can report the overload to the egress edge nodes using packet marking, but cannot re-establish the reservation states. When egress nodes receive information describing the congestion, they must inform their stateful ingress peers by sending notification messages upstream. The ingress nodes can then choose between alternative solutions to resolve the situation, since they are the ultimate sources of QoS guarantees in the domain. Possible solutions include pre-configured combinations of methods such as admission denial for new flows, or deletion of QoS guarantees for congested or low-priority flows. The signalling procedures required to implement this are shown in FIG. 2.
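One possible pre-configured ingress policy combines the two methods named above: stop admitting new flows, then remove QoS guarantees from the lowest-priority affected flows until the reported excess is covered. The sketch below is hypothetical throughout (`CongestionNotification`, `Ingress`, the priority field and the ordering); it only illustrates how an ingress could act on an egress notification.

```python
from dataclasses import dataclass


@dataclass
class CongestionNotification:
    """Hypothetical upstream message from an egress node to one ingress peer."""
    egress: str
    affected_flows: list       # flow ids that crossed the congested hop
    excess_bandwidth: float


@dataclass
class Ingress:
    reservations: dict         # flow_id -> (bandwidth, priority); lower priority goes first
    accepting_new_flows: bool = True

    def handle(self, note: CongestionNotification) -> list:
        """Deny new flows, then drop guarantees of low-priority affected flows until
        the reported excess is covered. Returns the flows whose guarantees were removed."""
        self.accepting_new_flows = False
        candidates = sorted(
            (f for f in note.affected_flows if f in self.reservations),
            key=lambda f: self.reservations[f][1],
        )
        removed, freed = [], 0.0
        for flow_id in candidates:
            if freed >= note.excess_bandwidth:
                break
            freed += self.reservations.pop(flow_id)[0]
            removed.append(flow_id)
        return removed


ingress = Ingress({"a": (10.0, 1), "b": (10.0, 5), "c": (10.0, 2)})
note = CongestionNotification("E3", ["a", "b", "c", "x"], excess_bandwidth=15.0)
print(ingress.handle(note))         # ['a', 'c']
print(ingress.accepting_new_flows)  # False
```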
All congestion handling approaches in stateless QoS domains must transmit two pieces of information to the egress nodes:
1. The identification of the flows towards a given egress node that are affected by the congestion. This enables the egress node to decide which flows' reservations can be torn down.
2. The congestion metric (i.e. the excess bandwidth that cannot be supported in the long term).
In current specifications, the first requirement (identifying affected flows) is met by marking all data packets that pass a congested core node with a pre-allocated value of the packet classification field in the packet header, known as a Differentiated Services Code Point (DSCP). The DSCP used for this purpose is called AFFECTED DSCP. Where an egress node receives packets for its flows through many core nodes, not all of which are congested, the AFFECTED DSCP enables it to identify the flows that contribute to the overload, using per-flow flags set according to the Type of Service (TOS) field.
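The AFFECTED marking can be sketched in two steps: a congested core node re-marks every packet it forwards, and the egress sets a flag for each flow that delivered at least one AFFECTED packet. The concrete code point is a hypothetical choice (RFC 2474 reserves DSCPs of the form xxxx11 for experimental or local use), and the `Packet` type is an assumption.

```python
from dataclasses import dataclass, replace

# Hypothetical local-use code point; not a standardised value.
AFFECTED_DSCP = 0b000111


@dataclass(frozen=True)
class Packet:
    flow_id: str
    dscp: int
    size: int  # bytes


def core_forward(packets, congested: bool) -> list:
    """A congested core node re-marks every packet it forwards with AFFECTED_DSCP."""
    if not congested:
        return list(packets)
    return [replace(p, dscp=AFFECTED_DSCP) for p in packets]


def egress_affected_flows(packets) -> set:
    """The egress flags every flow for which at least one AFFECTED packet arrived."""
    return {p.flow_id for p in packets if p.dscp == AFFECTED_DSCP}


via_congested = core_forward([Packet("f1", 46, 200), Packet("f2", 46, 200)], congested=True)
via_idle = core_forward([Packet("f3", 46, 200)], congested=False)
print(sorted(egress_affected_flows(via_congested + via_idle)))  # ['f1', 'f2']
```

Because re-marking overwrites the original DSCP, a real domain would need a separate AFFECTED code point per PHB, which is one source of the DSCP cost discussed later in this section.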
One method of meeting the second requirement (conveying the congestion metric to the egress nodes) is to re-mark data packets so that they indicate the excess bandwidth value. This can be done using another pre-allocated DSCP, called ENCODED DSCP. The encoding might be a byte-to-byte correspondence between excess bandwidth and marked bandwidth, or there might be a pre-configured domain-wide multiplier, so that one ENCODED DSCP-marked packet received at an egress node represents more overload bandwidth than its actual packet size.
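The encoding can be expressed as: marked bytes = excess bytes / multiplier at the core node, and decoded overload = marked bytes × multiplier at the egress. A minimal sketch, assuming a greedy choice of whole packets within one measurement interval; the helper names are hypothetical. Note how the larger multiplier overshoots because only whole packets can be marked.

```python
def bytes_to_encode(excess_bytes: float, multiplier: float) -> float:
    """Marked bytes needed for the egress to decode `excess_bytes` of overload."""
    return excess_bytes / multiplier


def mark_encoded(packet_sizes, excess_bytes, multiplier) -> list:
    """Greedily pick packets to re-mark with the ENCODED DSCP in one interval.

    Returns the indices of the packets to re-mark. multiplier == 1 is the
    byte-to-byte correspondence; a larger multiplier needs fewer marked bytes.
    """
    target = bytes_to_encode(excess_bytes, multiplier)
    marked, total = [], 0
    for i, size in enumerate(packet_sizes):
        if total >= target:
            break
        marked.append(i)
        total += size
    return marked


def decode_overload(marked_sizes, multiplier):
    """What the egress reconstructs from the ENCODED bytes it received."""
    return sum(marked_sizes) * multiplier


sizes = [500] * 10
for m in (1, 4):
    idx = mark_encoded(sizes, excess_bytes=3000, multiplier=m)
    print(m, len(idx), decode_overload([sizes[i] for i in idx], m))
# 1 6 3000
# 4 2 4000
```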
Thus packets marked with either the AFFECTED or the ENCODED DSCP can be distinguished from other traffic and measured at egress nodes. When multiple congestion events are handled using this model, every core node must measure the rate of data entering the node, together with the rate of AFFECTED DSCP-marked data entering the node, on a per-egress-interface basis. These measurements make it possible to re-mark packets with the appropriate overload level, regardless of whether the current core node drops packets.
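The per-egress-interface measurement above needs two counters per outgoing interface: all incoming bytes and the subset already carrying the AFFECTED DSCP. A minimal sketch, assuming fixed-length measurement intervals; the class name, interval length and code point value are hypothetical, and how a node turns these rates into its re-marking decision is not modelled here.

```python
from collections import defaultdict


class InterfaceMeter:
    """Per-egress-interface byte counters for one measurement interval."""

    def __init__(self, affected_dscp: int, interval_s: float):
        self.affected_dscp = affected_dscp
        self.interval_s = interval_s
        self.total = defaultdict(int)     # out_interface -> bytes entering the node
        self.affected = defaultdict(int)  # out_interface -> AFFECTED-marked bytes

    def observe(self, out_interface: str, dscp: int, size: int) -> None:
        self.total[out_interface] += size
        if dscp == self.affected_dscp:
            self.affected[out_interface] += size

    def rates(self, out_interface: str):
        """Return (total_rate, affected_rate) in bytes/s and reset that interface."""
        total = self.total.pop(out_interface, 0) / self.interval_s
        affected = self.affected.pop(out_interface, 0) / self.interval_s
        return total, affected


meter = InterfaceMeter(affected_dscp=7, interval_s=2.0)
meter.observe("if1", 7, 1000)
meter.observe("if1", 0, 1000)
meter.observe("if2", 0, 500)
print(meter.rates("if1"))  # (1000.0, 500.0)
print(meter.rates("if2"))  # (250.0, 0.0)
```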
There are a number of problems with the method described above for solving congestion.
One problem relates to the processing overhead required for per-flow congestion notifications. Using the method described, egress nodes instruct their ingress peers to delete reservations on a per-flow basis. If the number of these flows is large (e.g. several hundred), the resulting bursts of API calls between the signalling and transport layers can delay upstream congestion advertisement. Furthermore, when these messages reach their destination ingress node, they generate processing overhead there. The signalling messages also load the already congested network with extra traffic, and might be dropped if they do not have dedicated bandwidth; such drops make the congestion resolution inaccurate.
Another problem relates to transmitting the congestion metric by re-marking the TOS field. The concept is accurate, but becomes demanding when used generally to support multiple congested core nodes on the downstream path. Two extra DSCPs are required for each PHB, to distinguish packets that passed the congested core node (re-marked with AFFECTED DSCP) from packets used in the metric encoding process (re-marked with ENCODED DSCP). Thus ten or more extra DSCPs may be required (for example, with five or more PHBs), and they are used only for congestion handling in the stateless domain.
The method described above can signal a high volume of overload (i.e. overload exceeding the link capacity) quickly by specifying a multiplier, so that each received byte in marked packets represents more than one byte of congested bandwidth. However, this multiplier must be configured domain-wide, and changing it consistently in all nodes in real time might be a problem. Furthermore, congestion handling granularity becomes coarser as the multiplier is increased. Moreover, because marked packets receive the same or worse QoS guarantees than normal data packets, marked packets may also be dropped, and because of the multiplier, each lost packet leaves [packet size]×[multiplier] bytes of congestion bandwidth unhandled.
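Both drawbacks follow from the fact that marking is done in whole packets: the signalled overload is quantised to steps of packet size × multiplier, and each dropped marked packet loses one such step. A minimal worked sketch, assuming a uniform packet size of 500 bytes and an excess of 3000 bytes; both figures are illustrative.

```python
import math


def signalled_overload(excess_bytes: float, packet_size: int, multiplier: int) -> int:
    """Overload decoded at egress when whole packets are marked (rounded up to a step)."""
    step = packet_size * multiplier  # the congestion handling granularity
    return math.ceil(excess_bytes / step) * step


def unhandled_after_loss(lost_marked_packets: int, packet_size: int, multiplier: int) -> int:
    """Overload that stays unhandled when marked packets are dropped on the way."""
    return lost_marked_packets * packet_size * multiplier


for m in (1, 4, 16):
    print(m, signalled_overload(3000, 500, m), unhandled_after_loss(1, 500, m))
# 1 3000 500
# 4 4000 2000
# 16 8000 8000
```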
In addition, the egress nodes must measure the marked bytes, which means that they must also measure the size of every passing packet. This is difficult to implement, and the result depends on the accuracy of the measurements.
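At the egress, decoding means inspecting every packet, summing the sizes of the ENCODED-marked ones per ingress peer, and scaling by the domain-wide multiplier. A minimal sketch, assuming packets are observed as `(ingress, dscp, size_bytes)` tuples over one measurement interval and a hypothetical ENCODED code point of 11; the per-ingress grouping is an assumption based on the egress knowing each flow's ingress peer.

```python
from collections import defaultdict


def egress_overload_estimate(packets, encoded_dscp: int, multiplier: int) -> dict:
    """Per-ingress overload estimate: sum of ENCODED-marked bytes times the multiplier."""
    marked = defaultdict(int)
    for ingress, dscp, size in packets:  # every packet must be inspected and sized
        if dscp == encoded_dscp:
            marked[ingress] += size
    return {ingress: b * multiplier for ingress, b in marked.items()}


observed = [("I1", 11, 500), ("I1", 0, 1500), ("I2", 11, 200), ("I1", 11, 300)]
print(egress_overload_estimate(observed, encoded_dscp=11, multiplier=4))
# {'I1': 3200, 'I2': 800}
```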
Another problem relates to transmitting the congestion metric in refresh reserve messages. Refresh reserve messages are sent edge-to-edge between stateful ingress and egress peers in an RMD domain, so their transmission is transparent to the core routers. However, as specified in the NSIS RMD draft, intra-domain refresh messages are also sent to the core nodes, at a higher frequency. The draft suggests 30 seconds for edge-to-edge refreshes and about 10 seconds for intra-domain refreshes.
These refresh times are domain-wide, pre-configured values. This causes problems on a link where the admitted flow count (the number of flows with active reservations on their signalling path) is low (e.g. one or two) and a congestion event occurs while the re-routed flow count is also low. In this situation, refresh messages can only convey the congestion metric within tens of seconds, which results in long congestion handling times.
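The dependence on flow count can be quantified with a simple model: if each of n flows refreshes with period T and an independent, uniformly distributed phase, the wait until the first refresh crosses the congested link is the minimum of n uniform variables on [0, T], whose mean is T / (n + 1); the worst case is still T. This timing model is an assumption for illustration, not part of the RMD specification.

```python
def expected_first_refresh_delay(refresh_period_s: float, flow_count: int) -> float:
    """Mean wait for the first of `flow_count` refreshes, assuming independent uniform phases."""
    if flow_count < 1:
        raise ValueError("need at least one flow")
    return refresh_period_s / (flow_count + 1)


# Using the suggested 10 s intra-domain refresh period:
for n in (1, 2, 100):
    print(n, round(expected_first_refresh_delay(10.0, n), 2))
# 1 5.0
# 2 3.33
# 100 0.1
```

With one or two flows the expected delay stays at several seconds, and a single unlucky phase can push it to the full refresh period.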
It would therefore be desirable to solve congestion events faster than is possible using the systems described above.