1. Field of Invention
This invention relates to a method for efficiently storing and retrieving data. More particularly, this invention relates to dynamic or static data associated with a set of service tags and a method for arbitrating between the data using these service tags. The data structure and the method are particularly advantageous for implementation in digital computer hardware. The primary application of current interest is to circuits used for traffic management in high-speed switches and routers. However, this invention may be useful in a variety of applications involving input data that is stored and then retrieved based on the priority of service tags. Consequently, the selection of the data to be retrieved is an arbitration decision among the stored data. The stored data may be considered as separate parties, each contending for the retrieval resource.
A very useful application of this invention is in the construction of devices which participate in Internet Protocol (IP) networks or Asynchronous Transfer Mode (ATM) networks. The following discussion reviews how the features of an arbitration scheme can lead to “rich networking services” or traffic management, and how this invention's arbitration provides better traffic management.
At every point of contention in a network, packets or cells must be prioritized to ensure that Service Level Agreements (SLAs) are met. The underlying identified flows often serve as the prioritization criterion. This re-ordering of packets modifies the traffic shape of these flows and requires a buffer.
The scheduling discipline chosen for this prioritization, or Traffic Management (TM), can affect the traffic shape of flows and micro-flows through: Delay (buffering); Bursting of traffic (buffering and bursting); Smoothing of traffic (buffering and rate-limiting flows); Dropping traffic (choosing data to discard so as to avoid exhausting the buffer); Delay jitter (temporally shifting cells of a flow by different amounts); Not admitting a connection (cannot simultaneously guarantee existing SLAs with an additional flow's SLA).
In current non-centralized router architectures, the TM provided by the switch fabric consists of a small number of depth-limited queues of strict priority. Therefore, the TM on the ingress and egress of each line card must carry the burden of enforcing SLAs.
2. Prior Art
The standard method of implementing traffic management is Weighted Fair Queuing (WFQ), which closely approximates the “ideal” Max-Min Weighted Fair Share scheduling. However, providing WFQ to every flow or connection in a box is unwieldy to implement. As a consequence, discard methods such as Weighted Random Early Detection (WRED) and policing methods such as Dual Leaky Bucket (DLB) were created, along with approximations to WFQ, so that flows could be aggregated into a much smaller number of queues. By making each of these queues correspond to a service level, or class, one could employ Class Based WFQ (CBWFQ) easily. This scheme is logically diagrammed in FIG. 1A.
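The WFQ decision described above can be illustrated with a minimal sketch. It is not the circuit of FIG. 1A or of the invention; it assumes the self-clocked simplification of WFQ, in which the virtual time is taken to be the finish tag of the packet most recently served. The class name `WfqScheduler`, the flow identifiers, and the weights are hypothetical.

```python
import heapq


class WfqScheduler:
    """Toy weighted fair queuing: serve the smallest virtual finish tag first.

    Assumption: virtual time is approximated by the finish tag of the last
    packet served (the self-clocked variant), not by the exact fluid model.
    """

    def __init__(self, weights):
        self.weights = dict(weights)          # flow id -> weight (must be > 0)
        self.last_finish = {f: 0.0 for f in self.weights}
        self.virtual_time = 0.0
        self.heap = []                        # (finish_tag, seq, flow, length)
        self.seq = 0                          # tie-breaker, keeps arrival order

    def enqueue(self, flow, length):
        # A packet starts after the flow's previous packet, or "now" if idle.
        start = max(self.virtual_time, self.last_finish[flow])
        finish = start + length / self.weights[flow]
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, self.seq, flow, length))
        self.seq += 1

    def dequeue(self):
        if not self.heap:
            return None
        finish, _, flow, length = heapq.heappop(self.heap)
        self.virtual_time = finish
        return flow, length


if __name__ == "__main__":
    sched = WfqScheduler({"a": 2, "b": 1})
    for _ in range(4):
        sched.enqueue("a", 1)
    for _ in range(4):
        sched.enqueue("b", 1)
    print([sched.dequeue()[0] for _ in range(6)])
```

With both flows backlogged, flow "a" (weight 2) is served about twice as often as flow "b" (weight 1). The heap holds one entry per queued packet, which hints at why a per-flow WFQ becomes unwieldy when the number of flows is very large.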
Before the processing shown in FIG. 1A, cells have already been stored in DRAM as linked lists. This buffer management may be done per virtual circuit (VC) and accomplished using a free-list memory together with head and tail storage for each VC.
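The buffer management just described can be sketched in software as follows. This is an illustrative model only, assuming a single shared cell memory whose slots are chained by a next-pointer array; the names `CellBuffer`, `next_ptr`, and `free_head` are hypothetical and do not come from the figures.

```python
class CellBuffer:
    """Shared cell memory with a free list and per-VC head/tail pointers."""

    NIL = -1

    def __init__(self, num_cells):
        if num_cells < 1:
            raise ValueError("num_cells must be at least 1")
        self.payload = [None] * num_cells
        # Initially every slot is on the free list: 0 -> 1 -> ... -> NIL.
        self.next_ptr = list(range(1, num_cells)) + [self.NIL]
        self.free_head = 0
        self.head = {}                       # VC -> first slot of its list
        self.tail = {}                       # VC -> last slot of its list

    def enqueue(self, vc, cell):
        """Store a cell for a VC; returns False when the buffer is full."""
        if self.free_head == self.NIL:
            return False
        slot = self.free_head
        self.free_head = self.next_ptr[slot]
        self.payload[slot] = cell
        self.next_ptr[slot] = self.NIL
        if vc in self.head:
            self.next_ptr[self.tail[vc]] = slot   # append to existing list
        else:
            self.head[vc] = slot                  # first cell of this VC
        self.tail[vc] = slot
        return True

    def dequeue(self, vc):
        """Remove and return the oldest cell of a VC, or None if empty."""
        if vc not in self.head:
            return None
        slot = self.head[vc]
        cell = self.payload[slot]
        nxt = self.next_ptr[slot]
        if nxt == self.NIL:
            del self.head[vc]
            del self.tail[vc]
        else:
            self.head[vc] = nxt
        # Return the slot to the front of the free list.
        self.payload[slot] = None
        self.next_ptr[slot] = self.free_head
        self.free_head = slot
        return cell
```

Enqueue and dequeue each touch a constant number of pointers, which is one reason this arrangement suits a hardware implementation.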
As shown in FIG. 1A, traffic is policed and shaped at one of 16 “port-level” schedulers depending on the number of channels on the line-card. According to the bandwidth of each port, a Round Robin or Weighted Round Robin arbiter transports each flow number, effectively a private virtual circuit (PVC) number (not data), to a class-based queue. Here PVCs of the same class, irrespective of port, are aggregated to be scheduled by the WFQ scheduler. A given class-based queue may contain thousands of PVCs. As WFQ decisions are made, cells or packets are further queued by destination-interface to prevent Head of Line (HoL) blocking. This may be a destination interface on a switch fabric or a destination port on an egress path.
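The Weighted Round Robin step that moves flow numbers (not data) from port-level schedulers into class-based queues might be modeled as below. This is a simplified sketch under stated assumptions: the weights stand in for port bandwidths, and the function name, port numbers, flow numbers, and class labels are all hypothetical.

```python
from collections import deque


def weighted_round_robin(port_queues, weights, class_of, rounds):
    """Move flow numbers from per-port queues into per-class queues.

    port_queues: port id -> deque of flow numbers awaiting transfer
    weights:     port id -> flow numbers the port may move per round
    class_of:    flow number -> service class of that flow
    """
    class_queues = {}
    for _ in range(rounds):
        for port, queue in port_queues.items():
            for _ in range(weights[port]):
                if not queue:
                    break                     # port has nothing left this round
                flow = queue.popleft()
                class_queues.setdefault(class_of[flow], deque()).append(flow)
    return class_queues


if __name__ == "__main__":
    ports = {0: deque([10, 11, 12]), 1: deque([20, 21])}
    classes = {10: "gold", 11: "silver", 12: "gold", 20: "gold", 21: "silver"}
    result = weighted_round_robin(ports, {0: 2, 1: 1}, classes, rounds=2)
    for name, flows in result.items():
        print(name, list(flows))
```

Flows of the same class end up interleaved in one queue regardless of their port of origin, which is the aggregation the text attributes to FIG. 1A.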
The invention can be used to make an approximate WFQ decision amongst a very large number of queues. This eliminates many of the complexities previously mentioned. The ability to make such large scheduling decisions simplifies much of the surrounding logic as shown in FIG. 1B.
In addition, the invention serves to perform traffic shaping on every queue. The resulting quality of service (QoS) capability will now be described in greater detail using ATM traffic classifications familiar to those of ordinary skill in the art.
Using the common Traffic Management design in FIG. 1A, the delay of a given packet/cell is determined by the behavior of other connections in the same class. All connections in a class-based queue have been policed, but often with a large allowed Maximum Burst Size (MBS). Consequently, the Maximum Cell Transfer Delay (CTD) is equal to the aggregate of the burst sizes of all connections present in the class-based queue. If statistically less than 1 packet in a million (to achieve five 9's) is delayed at a certain connection admission level, then the added delay can be ignored by current networking equipment standards. However, the Gaussian curve formed by uncorrelated bursting sources does extend far enough to make a significant impact; guarantees can only be hedged by approximately 30%. This means the percentage of VBR-rt connections that admission control must deny is on the order of the Peak Cell Rate (PCR) to Sustained Cell Rate (SCR) ratio. When this same scheme is used with IP packets (DSCP enabled) or Label-Switched Paths (LSPs), real-time applications become infeasible.
Using the invention, every connection and connection type (a VBR-rt PVC, an MPLS pipe, a Martini-Frame-Relay tunnel) gets the theoretical minimum delay through the scheduler. With cells, this delay is guaranteed. With packets, the only unfairness stems from very large packets, which preclude a scheduling decision during the middle of their transfer. This allows Admission Control to accept an enormous number of real-time connections with minimum delay budgeting.
Cell Delay Variation (CDV) refers to the jitter in the CTD over all cells transferred. There are many ways to quantify CDV: standard deviation, average of the differences, jitter limit of 99 out of 100, etc. The ATM standard quantifies it with: CTD(max)−CTD(min).
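As a worked illustration of the peak-to-peak measure CTD(max)−CTD(min), the short sketch below computes it from per-cell arrival and departure timestamps. The function names and the sample timestamps (taken here to be in microseconds) are hypothetical and are not measured data.

```python
def cell_transfer_delays(arrivals, departures):
    """Per-cell transfer delay: departure time minus arrival time."""
    return [d - a for a, d in zip(arrivals, departures)]


def peak_to_peak_cdv(delays):
    """Peak-to-peak cell delay variation: CTD(max) - CTD(min)."""
    return max(delays) - min(delays)


if __name__ == "__main__":
    arrivals = [0, 10, 20, 30]       # hypothetical arrival times
    departures = [5, 18, 22, 41]     # hypothetical departure times
    delays = cell_transfer_delays(arrivals, departures)
    print(delays, peak_to_peak_cdv(delays))
```

For these sample values the delays are 5, 8, 2, and 11, so the peak-to-peak CDV is 11 − 2 = 9.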
In the common TM approach, a cell will receive more or less delay as the thousands of other connections, mapped to the same class-based queue, burst up and down. The range of jitter increases linearly with the number of connections. Admission control cannot do anything to reduce this jitter, except to reduce the number of PVCs admitted to a queue and honor any CDV guarantees only statistically. IP only exacerbates this problem, with variable packet sizes adding more jitter.
Because no aggregation takes place using the invention, there is no jitter other than that caused by large packets. This deep insulation of one flow from another bodes well for Provider Provisioned Virtual Private Networks (PPVPNs).
The pervasive Dual Leaky Bucket (DLB) polices a mean bandwidth (SCR) and a peak bandwidth (PCR) for every flow. Policing the SCR shares the bandwidth of a class-based queue amongst its users. Policing the PCR limits the delay through, and the buffer use of, a queue. Unfortunately, the latter policer must enforce a stricter policy than is necessary. The resource that is being protected is usually a DRAM buffer. The DLB guesses at how much buffer space has been (or will be) taken up by this flow and marks traffic as out-of-profile when it deems necessary.
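The behavior of such a policer can be sketched with two token buckets, one refilled at the SCR with a depth equal to the MBS and one refilled at the PCR with a depth of one cell. This token-bucket formulation is an assumption made for illustration and does not reproduce any standardized conformance algorithm exactly; the class names and the numeric rates are hypothetical.

```python
class TokenBucket:
    """Bucket refilled at `rate` tokens per second, holding at most `depth`."""

    def __init__(self, rate, depth):
        self.rate = rate
        self.depth = depth
        self.tokens = depth          # start full
        self.last = 0.0              # time of the previous refill

    def refill(self, now):
        elapsed = now - self.last
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        self.last = now


class DualLeakyBucket:
    """A cell is in-profile only if both the SCR and PCR buckets allow it."""

    def __init__(self, scr, mbs, pcr):
        self.sustained = TokenBucket(scr, mbs)   # mean rate, burst of MBS cells
        self.peak = TokenBucket(pcr, 1)          # peak rate, one cell at a time

    def conforms(self, now):
        self.sustained.refill(now)
        self.peak.refill(now)
        if self.sustained.tokens >= 1 and self.peak.tokens >= 1:
            self.sustained.tokens -= 1
            self.peak.tokens -= 1
            return True
        return False                 # out-of-profile; no tokens are consumed


if __name__ == "__main__":
    policer = DualLeakyBucket(scr=1, mbs=3, pcr=4)
    for t in [0.0, 0.125, 0.25, 0.5, 0.75, 1.0]:
        print(t, policer.conforms(t))
```

Note that the policer only counts tokens. It never observes how many cells the flow actually holds in the buffer, which is the guesswork the paragraph above refers to.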
Instead of simulating an individual queue with a DLB policer, the invention, when used in this application, can use its own queues directly for these purposes. Policing is done by directly reading the buffer usage of a particular queue. This allows large traffic peaks and relaxes the neighbor's shaping requirements. Furthermore, since use of this invention reserves shaping until the final scheduling, the shaping is more effective.
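Policing by directly reading a queue's buffer usage might look like the following sketch. It assumes a simple per-queue occupancy counter and a per-queue threshold; both the class `QueueOccupancyPolicer` and the threshold values are hypothetical and are offered only to contrast with a rate-based estimate.

```python
class QueueOccupancyPolicer:
    """Mark arrivals out-of-profile based on actual per-queue buffer usage."""

    def __init__(self, limits):
        self.limits = dict(limits)                   # queue id -> max cells held
        self.occupancy = {q: 0 for q in self.limits}

    def on_arrival(self, queue):
        """Return True if the cell is in-profile and has been counted."""
        if self.occupancy[queue] >= self.limits[queue]:
            return False                             # queue already at its limit
        self.occupancy[queue] += 1
        return True

    def on_departure(self, queue):
        """Called when the scheduler removes a cell from the queue."""
        if self.occupancy[queue] > 0:
            self.occupancy[queue] -= 1


if __name__ == "__main__":
    policer = QueueOccupancyPolicer({7: 2})
    print(policer.on_arrival(7), policer.on_arrival(7), policer.on_arrival(7))
    policer.on_departure(7)
    print(policer.on_arrival(7))
```

Because the decision depends on measured occupancy rather than on an arrival-rate estimate, a burst is accepted for as long as the queue actually has room.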