The present invention relates to the field of data communication networks.
The following paragraphs give definitions of terms relevant to this document:
Physical Link: A single point-to-point serial transmission link between two nodes in a network (such as between two routers or between a router and a host machine).
Physical Output Port: The output port of a switch, such as a router, that supports at least one physical link.
Logical Link: A point-to-point traffic path between two switches that is composed of multiple parallel physical links and appears from a routing point of view to be one link.
Logical Output Port: The collection of physical output ports that support the physical links of a logical link.
Logical Pathway: A pathway internal to a switch connecting an input port to an output port.
Internet Protocol (IP): A library of routines called on by various network communications applications. IP is responsible for transporting packets of data from node to node. It forwards each packet based on a four-byte destination address (IP number).
Switch: The term switch refers to a single router or packet switch within a communications network. Alternatively, it can refer to a contained network with a determined population of inputs and outputs.
A typical data communication network operates in a connectionless mode whereby there is no negotiation between the transmitter, the receiver, and the network with regard to the type or quantity of traffic that is to be sent. The transmitter simply sends the traffic on the network, and relies on the network components to deliver that traffic to the receiver accurately. These network components consist typically of routing nodes (also known as routers or switches) joined by physical links. The main function of the routing nodes is to direct incoming packets to the appropriate outgoing links. In the event of too much traffic arriving for an outgoing link, the router applies specific policies to decide which traffic is forwarded and which is discarded. It is important that these policies are not subverted by arbitrary loss of the forwarded traffic as it moves to the next point that implements the management policies. The term non-lossy, as applied to a router, implies that any traffic taken from an input port will be delivered without loss to the output port. As applied to a network, the term non-lossy implies that no traffic is lost between one routing node and the next routing node on the particular traffic path. Consequently, in the case of a non-lossy fabric the input port to a router or routing node has full control over which traffic gets discarded when congestion occurs.
Narrowing the focus to communication network applications that have adopted the Internet Protocol, it is important to note that traffic on the Internet is growing very fast. Not only is it expected that within a short time routes within the network will need multiple physical links to support higher transmission rates, but also that there will exist the necessity for bandwidth allocation to different classes of traffic, perhaps for a particular customer or a class of customer. Therefore, the general architecture for future IP-layer large switches will have the traffic buffered at many inputs while waiting for transfer to an output, where the outgoing link will most likely be a logical link consisting of multiple physical links. Indeed, future implementations of routing networks will have input ports connected to output ports that are geographically remote, and where those ports are connected by wide area non-lossy fabrics.
A particularly important objective to achieve within these future IP-layer networks will be the efficient management of bandwidth allocation. In other words, the network must ensure that the bandwidth available on an outgoing link be efficiently distributed between all traffic being routed through the switch fabric.
One existing solution to this problem enforces a given bandwidth allocation for a traffic class through rate control exerted at the egress ports of the network. Output buffering is provided to allow for the mismatch between aggregate input rates and the assigned output rate. The output buffers take traffic from every input port and schedule the output of the various classes based on their allocation.
The problem with egress-based control of bandwidth is that, ideally, the output would take traffic from all ports as soon as it arrives. This requires that the output port receive traffic at a rate equal to the maximum sum of all the input rates. For large values of N (number of input ports) and high input bandwidth rates, this is not economically sound and lower transfer rates are used. This in turn requires that the output port be selective in what traffic it transfers. In particular, the output port will give preference to traffic whose bandwidth allocation has not been satisfied and delay transferring traffic that cannot currently be sent. This normally requires that some bandwidth be consumed in allowing output ports to discover the traffic status of input ports. The output buffered model is further complicated when multi-link trunks (logical links) are employed and the bandwidth allocation must be satisfied over the total bandwidth of the logical output port.
The background information herein clearly shows that there exists a need in the industry to provide a method for improving the management of IP-layer bandwidth allocation within a non-lossy data communication network arrangement.
An object of this invention is to provide a novel switch device capable of controlling the transport of data units, such as IP data packets, between the input ports and the output ports of the switch to limit the possibility of congestion that can arise at the output port level of the switch.
Another object of this invention is to provide a method for controlling the data units transport process in a switch to reduce the risk of internal congestion.
Another object of this invention is to provide a novel multi-node data transmission device, capable of transporting data units, such as IP data packets, and capable of effecting inter-node negotiation for managing the transport of data units on a common data transmission pathway interconnecting the nodes.
Another object of this invention is to provide a method for transmitting data units over a multi-node data transmission device, by effecting inter-node negotiation.
As embodied and broadly described herein, the invention provides a switch for processing data units, said switch including:
a plurality of input ports, each input port capable of receiving data units;
a plurality of output ports, each output port capable of releasing data units from said switch;
a switch fabric capable of selectively establishing a plurality of logical pathways between said input ports and said output ports, each logical pathway connecting a certain input port to a certain output port, whereby a data unit received at the certain input port can be transported to the certain output port on the logical pathway;
a plurality of bandwidth control mechanisms for regulating the transport of data units in said switch, each bandwidth control mechanism being associated with a different logical pathway established through said switch fabric.
In a most preferred embodiment, the switch as defined in general terms above can be implemented as a router. Such a router forms a node in a network and it is used to receive data packets at input ports, analyze each packet to determine its destination and, through a routing table, select the output port through which the data packet is to be released so it can reach its intended destination. To reduce the likelihood of congestion, the router controls the release of data packets received at its input ports to the switch fabric independently for each logical pathway that can be established in the switch fabric. More specifically, when a logical pathway is established through the switch fabric a system of queues is set up and associated with that logical pathway such that the rate of data packets released into the switch fabric follows established bandwidth limits. By independently controlling the transport of data packets on every logical pathway, the aggregate data input rates to the switch fabric can be controlled so as not to exceed the limits of the assigned rates on outgoing links from the switch, thus avoiding traffic congestion on these links.
In a specific example, each input port of the router is provided with an independent controller that is capable of managing the data packets that arrive at that input port for release over a number of logical pathways that can be potentially enabled, connecting that input port to several output ports. Each controller includes a processor and a memory connected to the processor through a data bus. The memory holds the program element that includes instructions executed by the processor to provide the intended functionality. The memory is also capable of storing data on a temporary basis on which the processor operates during the execution of the program. In addition, the memory supports the creation of one or more queues that control the rate at which data packets are released in the switch fabric.
When a certain data packet is received at an input port, the local controller determines first the destination of the packet. This is done by reading the destination address field of the data packet. Once the address is determined, the output port through which the data packet is to be released is found by consulting a routing table mapping destination addresses with output ports. The routing table is a resource common to all the input ports. One possibility of implementation is to store the routing table in a central location. Alternatively, the routing table may also be stored locally, in the memory of each input port controller.
Once the output port through which the data packet is to be released is determined, the logical pathway through the switch fabric over which the data packet is to be transported towards the output port is known. The parameters that identify the logical pathway are simply the end points of the pathway, namely the identification of the input port and the identification of the output port. The data packet is then passed to the bandwidth control mechanism associated with this particular logical pathway. The bandwidth control mechanism includes at least one queue that receives the data packet and requests release of the data packet from the queue to the switch fabric at a rate determined to remain within the bandwidth limit allocated to this logical pathway. This allocation can be determined by consulting a table which maps each logical pathway with a bandwidth limit that the pathway should not exceed. This table is a setting of the router.
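By way of illustration only, and not as part of the claimed subject matter, the lookup described above can be sketched in Python. All table contents, names, and values below are hypothetical; the point is simply that the logical pathway is identified by its end points and mapped to a configured bandwidth limit.

```python
# Hypothetical routing table: destination prefix -> output port number.
ROUTING_TABLE = {"10.0.0.0/8": 2, "192.168.0.0/16": 3}

# Hypothetical configuration table: (input port, output port) -> bits/s limit.
BANDWIDTH_LIMITS = {(0, 2): 40_000_000, (0, 3): 25_000_000}

def classify(input_port, dest_prefix):
    """Return the logical pathway for a packet and its allocated limit.

    The pathway is identified simply by its end points: the input port
    where the packet arrived and the output port found in the routing table.
    """
    output_port = ROUTING_TABLE[dest_prefix]
    pathway = (input_port, output_port)
    return pathway, BANDWIDTH_LIMITS[pathway]
```

The packet would then be handed to the bandwidth control mechanism (queue) associated with the returned pathway.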
The rate at which the bandwidth control mechanism requests release of the data packets from the queue is determined by effecting an accounting operation that calculates the average bandwidth used over time by the queue. The computed average bandwidth usage value is then compared to the bandwidth limit allocated to the particular logical pathway. If the bandwidth limit is exceeded, then the queue simply stops requesting release of the packets. This provides a local bandwidth control function, allowing the bandwidth usage over a particular logical pathway to be managed.
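The accounting operation above can be sketched as follows. The specification does not fix a particular averaging method, so this sketch assumes a simple fixed-interval average; the class and parameter names are illustrative only.

```python
class PathwayQueue:
    """Sketch of a per-pathway bandwidth control mechanism.

    Assumes average bandwidth is computed over fixed accounting intervals;
    the actual averaging method is an implementation choice.
    """

    def __init__(self, limit_bps, interval_s=0.1):
        self.limit_bps = limit_bps        # allocated fraction for this pathway
        self.interval_s = interval_s      # accounting interval length
        self.bits_this_interval = 0

    def account(self, packet_bytes):
        """Record a packet released into the switch fabric."""
        self.bits_this_interval += packet_bytes * 8

    def may_request_release(self):
        """Stop requesting service once the average exceeds the limit."""
        avg_bps = self.bits_this_interval / self.interval_s
        return avg_bps <= self.limit_bps

    def new_interval(self):
        """Reset the accounting; requesting resumes if usage fell back."""
        self.bits_this_interval = 0
```

Once the average falls back below the assigned level, `may_request_release` returns true again and the queue resumes issuing request signals.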
The bandwidth control mechanism described above may be adapted to accommodate traffic having bandwidth usage priority levels. For example, an input port may be receiving either C1 class traffic (minimum guaranteed bandwidth fraction without overflow) or C2 class traffic (minimum guaranteed fraction with possibility of controlled overflow). Each logical pathway joining one input port to one output port within the switch fabric has a minimum allocated fraction of the overall bandwidth value for that particular output port. The C1 class traffic can use all of its allocated fraction but is never allowed to exceed that fraction. In contrast, traffic of the C2 class can use its fraction entirely and overflow is possible, but it is controlled so as not to exceed a certain maximum allocated fraction of the egress bandwidth value also assigned to the logical pathway. The sum of all bandwidth fractions allocated to different logical pathways terminating in a single output port should not exceed the total reserved bandwidth value for that output port.
In a specific example, consider first a situation where a certain input port of the router receives only C1 class traffic. The queue for traffic of the C1 class is controlled such that data packets are released to the switch fabric at a rate that parallels the rate at which data packets enter the queue. This relationship holds true while the average bandwidth usage is below or equal to the bandwidth fraction allocated to the logical pathway receiving the data packets, during which time traffic is released with high priority. When this fraction is exceeded, overflow occurs and the queue stops requesting service. This means that the control mechanism stops sending request messages to the switch fabric controller to seek its authorization to release packets in the switch fabric. As such, the bandwidth control mechanism implements a self-regulating function to prevent the logical link from using more than its share of the available bandwidth. Obviously, when the queue stops requesting service this may cause packets to be dropped if the queue overflows at its input side. Once the accounting operation determines that the average bandwidth usage is below the assigned level, the bandwidth control mechanism resumes the issuance of request signals to the switch fabric for releasing data packets.
In the situation where C2 class traffic is received at the input port the operation of the queue is somewhat different. The linear relationship between the input rate and the rate at which high priority request signals are sent to the switch fabric controller holds until the reserved bandwidth fraction is reached. However, when the reserved bandwidth fraction is exceeded and overflow occurs, the queue does not stop requesting but rather sends requests with low priority status. The switch fabric controller recognizes the low priority status and will allow the release of a low priority data packet only when there are no other high priority data packets to send to the same physical output port.
Each request signal sent from a certain bandwidth control mechanism identifies the logical pathway that is associated with the bandwidth control mechanism and the mode in which the queue is operating, either high or low. Assume that the switch fabric controller receives requests from two bandwidth control mechanisms associated with different logical pathways, both pathways converging toward the same output port. If the configuration table for each logical pathway is accurately set there can be no possibility of overflow, because that table assigns a certain fraction of the available bandwidth at the output port to each link. The bandwidth control mechanism of each link has a self-regulating function, thus enforcing the bandwidth limit at the level of each logical pathway. This implies that when a signal is issued to request the release of a high priority packet, the switch fabric should always be able to accept that request. A request can be denied only if the packet to be released is low priority.
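The arbitration rule applied by the switch fabric controller can be sketched as below. This is an illustrative model only (one grant per output port per cycle is an assumption; function and field names are hypothetical): high priority requests win, and a low priority request is granted only when no high priority request targets the same output port.

```python
def arbitrate(requests):
    """Pick at most one winning request per output port.

    `requests` is a list of ((input_port, output_port), priority) tuples,
    where priority is "high" or "low". High priority always wins over low;
    a low priority request is granted only in the absence of a competing
    high priority request for the same output port.
    """
    winners = {}  # output_port -> winning (pathway, priority)
    for (inp, out), prio in requests:
        best = winners.get(out)
        if best is None or (prio == "high" and best[1] == "low"):
            winners[out] = ((inp, out), prio)
    return winners
```

Under this rule the controller needs no per-pathway accounting of its own; the self-regulating queues at the input ports already guarantee that high priority requests never oversubscribe an output port.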
The arrangement described above avoids congestion at the switch fabric level when the sum of the bandwidth fractions assigned to respective logical pathways does not exceed the total bandwidth an outgoing link from the switch can accommodate.
Another advantage of this system is that the switch fabric controller that is responsible for regulating the entry of the data packets from various input ports, based on requests issued by respective bandwidth control mechanisms, is of simple construction. It suffices to design the switch fabric controller to recognize different priority requests and accept the high priority requests, while accepting low priority requests only when there are no high priority requests to meet.
As embodied and broadly described herein, the invention also provides a method for controlling the transport of data units in a switch, said switch comprising:
a plurality of input ports, each input port capable of receiving data units;
a plurality of output ports, each output port capable of releasing data units;
a switch fabric capable of establishing a plurality of logical pathways between said input ports and said output ports, each logical pathway connecting a certain input port to a certain output port, whereby a data unit received at the certain input port can be transported to the certain output port on the logical pathway;
said method comprising the step of controlling bandwidth usage of each logical pathway independently of the others.
As embodied and broadly described herein, the invention provides a switch for processing data units, said switch including:
a plurality of input ports, each input port capable of receiving data units;
a plurality of output ports, each output port capable of releasing data units;
a switch fabric capable of establishing a plurality of logical pathways between said input ports and said output ports, each logical pathway connecting a certain input port to a certain output port, whereby a data unit received at the certain input port can be transported to the certain output port on the logical pathway;
means responsive to the establishment of a logical pathway through said switch fabric for enabling a bandwidth control mechanism to regulate bandwidth usage of the logical pathway.
As embodied and broadly described herein, the invention also comprises a method for managing the transport of data units in a switch, said switch comprising:
a plurality of input ports, each input port capable of receiving data units;
a plurality of output ports, each output port capable of releasing data units;
a switch fabric capable of selectively establishing a plurality of logical pathways between said input ports and said output ports, each logical pathway connecting a certain input port to a certain output port, whereby a data unit received at the certain input port can be transported to the certain output port on the logical pathway;
said method comprising the step of enabling a bandwidth control mechanism to regulate bandwidth usage of a certain logical pathway in response to establishment of the certain logical pathway through said switch fabric.
As embodied and broadly described herein, the invention further provides a multi-node data transmission device for transporting data units, said device including:
a first node and a second node;
a data transmission link interconnecting said nodes, said data transmission link defining first and second ring-shaped paths, each path permitting the transport of data from one node to another node;
each of said first and second nodes being capable of introducing data units in one of said paths for the transport of the data to the other one of said nodes;
each of said first and second nodes being capable of releasing data units received on at least one of said paths;
one node being responsive to a control message issued by the other node to regulate the introduction of data in one of said paths in dependence on the contents of said control message.
Preferably, the multi-node data transmission device as defined above can be used as a data transmission switch. Such a switch can be a simple router or it can be implemented as a contained network. For the sake of simplicity, the following description will make reference to a router, it being understood that the invention is not limited to this form of implementation.
The router typically includes input ports at which data units, such as IP data packets, are received. After processing, those data packets are released through the output ports of the router. In a most preferred form of construction, an input port/output port pair forms a node. In a specific example, if the router includes three input ports and three output ports, this arrangement will create three nodes.
The nodes are connected by physical links that establish a double counter-rotating ring architecture. More specifically, such architecture has two ring-shaped paths that carry data in opposite (counter-rotating) directions. Most preferably, different physical links are used to support the two ring-shaped paths. However, it is also possible to implement the two ring-shaped paths over the same physical link.
An advantage of the double ring path arrangement is to provide an alternate routing capability should one path fail. In addition, this arrangement creates shorter routes between nodes. For instance, in a device using three nodes, say A, B and C, node A, desirous of sending a control message to node C, has two possibilities. The first one is to use the first path, which imposes on the data a direction of travel A→B→C. This path is the longer one since the data must pass through node B. On the other hand, if the data is sent over the second path, imposing a direction of travel A→C→B, the data will reach its destination faster. This translates into a faster response time of the system and less bandwidth usage over certain inter-node sections of the paths.
The choice of the path over which data can be sent from one node to another can be made on the basis of the relative positions of the nodes. A simple implementation is to provide each node with a table that maps originating node/destination node pair with the corresponding path that establishes the shortest route. In the example of the three node structure mentioned earlier, the table contains three entries, each entry associating an originating node/destination node pair (AB, AC and BC) and a corresponding path over which the data is to be sent. Thus, when a node has data to send, either a control message or data packets, the table is consulted and the entry corresponding to the destination node found. The path to be used for the data transmission is then determined.
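The table lookup described above can be sketched as follows. This is a hypothetical illustration for the three-node example in the text; the ring names and the choice of direct entries are assumptions, and the reverse direction of an entry simply uses the other, counter-rotating ring.

```python
# Hypothetical shortest-route table for nodes A, B, C on two
# counter-rotating rings (ring1 travels A->B->C->A, ring2 the reverse).
SHORTEST_RING = {
    ("A", "B"): "ring1",   # A -> B is direct on ring1
    ("B", "C"): "ring1",   # B -> C is direct on ring1
    ("A", "C"): "ring2",   # A -> C is direct on the counter-rotating ring2
}

OTHER = {"ring1": "ring2", "ring2": "ring1"}

def pick_ring(origin, dest):
    """Return the ring giving the shortest route between two nodes."""
    if (origin, dest) in SHORTEST_RING:
        return SHORTEST_RING[(origin, dest)]
    # The reverse of a stored pair is shortest on the opposite ring.
    return OTHER[SHORTEST_RING[(dest, origin)]]
```

When a node has data to send, whether a control message or data packets, it consults this table and transmits over the ring it returns.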
Most preferably, the management of the data transmission from one node to another node is the result of a cooperative relationship between the nodes. This is accomplished by providing each node with the ability to send to an upstream node a control message that identifies the fraction of the total data carrying capacity of the path that the downstream node (originator of the message) will need. The upstream node (receiver of the control message) can then throttle the insertion of the data packets so as to maintain at least some data carrying capacity for the downstream node. From an implementation point of view, data is transmitted on each path by multiplexing. Each data packet sent from one node to another node occupies a time slot. The inter-node management of the usage of the path, which is a common resource, is effected on the basis of a batch of time slots. More specifically, each batch of time slots is divided in a certain fashion among the nodes to avoid data congestion at the nodes. In a specific form of construction, each node is designed to send to the upstream node, from where empty time slots will come, a control message indicating the number of time slots the node will need to meet commitments, such as a certain bandwidth or other priority-based requirements. The upstream node (message receiver) determines if the empty slots it sees from the next level upstream node can satisfy its requirements and the requirements of the downstream node (originator of the message). In the affirmative, nothing is done and each node inserts data packets in the empty slots. If, however, the slot demand exceeds the available free slots, the upstream node (receiver of the message) will build and send a control message to the upstream node of the next level requesting additional time slots. Additional time slots can be generated by a node when the node constrains its bandwidth usage.
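The decision made at the receiving node can be sketched as a single calculation. This is an illustrative reduction of the batch-of-slots negotiation described above; the function and parameter names are hypothetical.

```python
def slots_to_request_upstream(free_slots_seen, own_need, downstream_request):
    """Return the number of extra time slots to request from the next
    upstream node.

    Per the negotiation described in the text: if the free slots arriving
    from upstream cover both this node's own commitments and the downstream
    node's request, nothing is forwarded (return 0); otherwise the node
    requests the shortfall so that additional slots can be freed upstream.
    """
    demand = own_need + downstream_request
    return max(0, demand - free_slots_seen)
```

A return value of zero corresponds to the "in the affirmative, nothing is done" case; a positive value is the size of the request propagated one level further upstream.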
In other words a node can accommodate the needs of a downstream node by limiting the amount of data packets it inserts in the path. This ensures that enough empty slots are left for the downstream node to avoid blocking the transmission at that node entirely or limiting to the point where commitments can no longer be met.
Data insertion throttling can be applied particularly when traffic of a minimum guaranteed bandwidth with possibility of overflow is being sent from one node to another node. This class of traffic, commonly referred to as C2, is guaranteed a minimum bandwidth and, if excess bandwidth is required, more bandwidth can be made available if it is available. In this case the throttling to free more time slots can be effected by inserting data packets in a path at a rate that does not exceed the minimum guaranteed bandwidth allocated to the traffic class. If, on the other hand, the downstream nodes do not use all the free time slots available, then the node can increase the insertion rate so that excess C2 class traffic can be passed.
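A minimal sketch of this throttling rule, assuming a simple boolean signal indicating whether downstream nodes currently need the free slots (the signal and the names below are illustrative, not taken from the specification):

```python
def c2_insertion_rate(guaranteed_bps, spare_capacity_bps, downstream_needs_slots):
    """Rate at which a node inserts C2 class packets into the ring.

    When downstream nodes need free slots, the node throttles back to its
    guaranteed minimum; otherwise excess C2 traffic may also use the spare
    ring capacity.
    """
    if downstream_needs_slots:
        return guaranteed_bps
    return guaranteed_bps + spare_capacity_bps
```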
In the case where all three nodes can receive C2 class traffic, the total capacity of the two ring-shaped paths should be at least equal to the total of the minimum guaranteed bandwidths of the three nodes. This should avoid congestion. Any excess capacity of the ring-shaped paths is competed for by the three nodes.
As embodied and broadly described herein, the invention also provides a method for data transmission, said method comprising the steps of:
providing a first node and a second node;
providing a data transmission link interconnecting said nodes, said data transmission link defining first and second ring-shaped paths, each path permitting the transport of data from one node to another node;
each of said first and second nodes being capable of introducing data in one of said paths for the transport of the data to the other one of said nodes;
each of said first and second nodes being capable of releasing data received on at least one of said paths;
generating at one node a control message;
transporting said control message over either one of said first and second ring-shaped paths to the other node;
regulating the introduction of data units in one of said paths at said other node in dependence on the contents of said control message.
As embodied and broadly described herein, the invention further provides a multi-node data transmission device for transporting data, said device including:
a first node and a second node;
a data transmission link interconnecting said nodes, said data transmission link defining first and second ring-shaped paths, each path permitting the transport of data from one node to another node;
each of said first and second nodes being capable of either one of introducing data in one of said paths for the transport of the data to the other one of said nodes and releasing data received on at least one of said paths;
data transported on said first path having a direction of propagation opposite the direction of propagation of data transported on said second path.