In many telecommunications applications, a scheduler is used to resolve contention among multiple tasks competing for a limited resource. For example, such a scheduler is commonly used in a network processor to schedule multiple traffic flows for transmission over a specific transmission bandwidth.
A network processor generally controls the flow of data between a physical transmission medium, such as a physical layer portion of a network, and a switch fabric in a router or other type of switch. An important function of a network processor involves the scheduling of cells, packets or other data blocks, associated with the multiple traffic flows, for transmission to the switch fabric from the physical transmission medium of the network and vice versa. The network processor scheduler performs this function.
An efficient and flexible scheduler architecture capable of supporting multiple scheduling algorithms is disclosed in U.S. patent application Ser. No. 10/722,933, filed Nov. 26, 2003, and entitled “Processor with Scheduler Architecture Supporting Multiple Distinct Scheduling Algorithms,” which is commonly assigned herewith and incorporated by reference herein.
It is often desirable for a given scheduling algorithm implemented in a network processor or other processing device to be both simple and fair. Simplicity is important because the processing device hardware typically does not have a large amount of time to make a given scheduling decision, particularly in a high data rate environment. A good scheduler should also be fair. For example, it may allocate the bandwidth according to the weights of the users, with the higher-priority users getting more bandwidth than lower-priority users.
An example of a simple and fair scheduling algorithm is the Weighted Round-Robin (WRR) scheduling algorithm. Assume that in a given telecommunications application there are a number of users competing for one resource, which can process one data block in each timeslot. The scheduler must decide which user can send one data block to the server in each timeslot. Each user has a weight that indicates its priority, and a user with a larger weight has higher priority. Under ideal conditions, the service received by each user should be proportional to its weight. A WRR scheduler serves the users in proportion to their weights in a round-robin fashion. Assume there are N users. The i-th user, Ui, has a weight of Wi, which is an integer. Let F be the sum of the weights Wi for the N users. Define F timeslots as one frame, such that F is the frame size in timeslots. WRR serves Ui for exactly Wi timeslots in each frame, so each user gets its fair share of the frame. For example, assume there are four users U1, U2, U3 and U4 that have weights of 4, 3, 2 and 1, respectively. Then the scheduler can serve these four users by repeating the following sequence in each frame: U1, U2, U3, U4, U1, U2, U3, U1, U2, U1. There are ten timeslots in one frame, of which U1 receives four, U2 three, U3 two and U4 one.
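The WRR behavior described above can be illustrated with a minimal sketch. The function name wrr_frame and the use of 1-based user numbers are assumptions made for illustration only. The sketch visits the users in round-robin order and skips any user whose Wi timeslots for the current frame are already used up, which reproduces the example sequence given above.

```python
def wrr_frame(weights):
    """Return one WRR frame as a list of 1-based user numbers.

    weights[i-1] is the integer weight Wi of user Ui (assumed non-negative).
    """
    remaining = list(weights)  # timeslots each user may still take this frame
    frame = []
    # Keep making round-robin passes until every user has used up its weight.
    while any(left > 0 for left in remaining):
        for user, left in enumerate(remaining, start=1):
            if left > 0:
                frame.append(user)
                remaining[user - 1] -= 1
    return frame


# Four users with weights 4, 3, 2 and 1, as in the example above.
print(wrr_frame([4, 3, 2, 1]))  # [1, 2, 3, 4, 1, 2, 3, 1, 2, 1]
```

The frame length equals F, the sum of the weights, and user Ui appears exactly Wi times.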
A problem with WRR is that it may produce long bursts of service to a single user. For example, consider a case in which there are 11 users, where U1's weight is 10 and all other users' weights are 1. In this case, the sum of the weights is 20, so there are 20 timeslots per frame. WRR would serve the users as follows: U1, U2, U3, U4, U5, U6, U7, U8, U9, U10, U11, U1, U1, U1, U1, U1, U1, U1, U1, U1. The service received by U1 is very bursty. This is clearly not desirable in telecommunication systems, because long bursts could overflow the buffers of user communication devices. Such burstiness becomes increasingly problematic in those practical applications in which the total number of users may be several hundred or more.
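The burstiness in this example can be quantified with a short sketch that measures the longest run of consecutive timeslots given to one user. The helper name longest_run is hypothetical, and the frame is simply the 20-timeslot sequence listed above.

```python
from itertools import groupby


def longest_run(sequence, user):
    """Length of the longest run of consecutive timeslots served to `user`."""
    return max(
        (len(list(group)) for key, group in groupby(sequence) if key == user),
        default=0,
    )


# The 20-timeslot WRR frame from the example: U1..U11 once, then U1 nine times.
frame = [f"U{i}" for i in range(1, 12)] + ["U1"] * 9

print(longest_run(frame, "U1"))      # 9 within a single frame
print(longest_run(frame * 2, "U1"))  # 10 when frames repeat back to back
```

Because the next frame also starts with U1, the burst seen by U1 spans a frame boundary and reaches ten consecutive timeslots.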
Alternative scheduling algorithms are known which overcome the burstiness problem of WRR. These include, by way of example, Weighted Fair Queueing (WFQ) and Worst-case Fair Weighted Fair Queueing (WF²Q). Unfortunately, these alternative algorithms are typically considerably more complex than WRR, and therefore may be difficult to implement in network processors and other processing devices operating in high data rate environments.
The above-cited U.S. patent application Ser. No. 10/903,954 discloses a frame mapping scheduler that provides simplicity and fairness comparable to that of WRR, but without the burstiness problem commonly associated with WRR. More specifically, a frame mapping scheduler in an illustrative embodiment described therein comprises scheduling circuitry which utilizes a weight table and a mapping table. The weight table comprises a plurality of entries, with each of the entries identifying a particular one of the transmission elements. The mapping table comprises at least one entry specifying a mapping between a particular timeslot of a frame and an entry of the weight table. The scheduling circuitry determines a particular transmission element to be scheduled in a given timeslot by accessing a corresponding mapping table entry and utilizing a resultant value to access the weight table.
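The two-table lookup described above may be sketched as follows. This is an illustration under stated assumptions, not the circuitry of the cited application: it assumes that user Ui occupies Wi entries of the weight table, that the mapping table is a permutation of the weight table indices, and that the particular permutation shown is arbitrary. The names weight_table, mapping_table and scheduled_user are hypothetical.

```python
# Assumed weights for four users, chosen only for illustration.
weights = {"U1": 4, "U2": 3, "U3": 2, "U4": 1}

# Assumption: user Ui fills Wi consecutive weight table entries, so the
# table has F = 10 entries in total.
weight_table = [user for user, w in weights.items() for _ in range(w)]

# Hypothetical mapping table: entry t gives the weight table index used in
# timeslot t of the frame. Any permutation of 0..F-1 preserves the weights.
mapping_table = [0, 5, 2, 7, 4, 9, 1, 6, 3, 8]


def scheduled_user(timeslot):
    """Look up the mapping table entry, then use it to index the weight table."""
    index = mapping_table[timeslot % len(mapping_table)]
    return weight_table[index]


frame = [scheduled_user(t) for t in range(len(mapping_table))]
print(frame)
# ['U1', 'U2', 'U1', 'U3', 'U2', 'U4', 'U1', 'U2', 'U1', 'U3']
```

Each user still receives a number of timeslots equal to its weight, but the timeslots are spread across the frame instead of being clustered.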
The mapping table entries may be predetermined in accordance with a golden ratio policy. As one more particular example, the entries of the mapping table may be determined by utilizing a golden ratio φ to compute G_i = (i*φ^−1), for a given range of index values i, such as i = 0, 1, ..., F−1, where F denotes the number of mapping table entries. The computed values are then sorted in ascending order or other specified order, and the entries are taken as a sequence of the resulting subscript indices.
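A minimal sketch of this computation follows. It rests on an assumption that is not stated in the text above: each product i*φ^−1 is reduced to its fractional part (modulo 1) before sorting, since sorting the unreduced products would simply return the indices in their original order. The function name golden_ratio_mapping is hypothetical.

```python
import math


def golden_ratio_mapping(num_entries):
    """Return mapping table entries for a frame of `num_entries` timeslots.

    Assumption: G_i is taken as the fractional part of i / phi.
    """
    inverse_phi = 2.0 / (1.0 + math.sqrt(5.0))  # 1/phi, about 0.618
    g = [(i * inverse_phi) % 1.0 for i in range(num_entries)]
    # Sort the indices i by their G_i value, in ascending order.
    return sorted(range(num_entries), key=lambda i: g[i])


print(golden_ratio_mapping(10))  # [0, 5, 2, 7, 4, 9, 1, 6, 3, 8]
```

Under this assumption, consecutive indices are placed far apart in the resulting sequence.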
It is also possible to determine the mapping table entries using other policies. For example, the scheduling circuitry may generate the entries of the mapping table as needed in accordance with a bit-reverse policy. In this case, a given entry of the mapping table is determined by computing a bit-reverse value of a corresponding timeslot number. This arrangement has the advantage of avoiding the need to store the mapping table.
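The bit-reverse policy can be sketched as shown below. The sketch assumes that the frame size is a power of two, so that each timeslot number fits in a fixed number of bits. The function name bit_reverse and the width parameter are hypothetical.

```python
def bit_reverse(timeslot, width):
    """Reverse the lowest `width` bits of `timeslot`."""
    result = 0
    for _ in range(width):
        # Shift the lowest remaining bit of timeslot into the result.
        result = (result << 1) | (timeslot & 1)
        timeslot >>= 1
    return result


# Assumed frame of 8 timeslots, so each timeslot number is 3 bits wide.
entries = [bit_reverse(t, 3) for t in range(8)]
print(entries)  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Because each entry is computed from the timeslot number when it is needed, no table has to be held in memory.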
However, in schedulers which utilize a golden ratio policy, or more generally any policy that requires a stored mapping table, the mapping table may be large and therefore require substantial amounts of memory. It is usually preferred that such mapping table memory be arranged “on-chip,” that is, on the same integrated circuit as the scheduler, so as to reduce access times. For example, such an arrangement is beneficial in network processing applications in which data blocks may need to be processed substantially in real time.
Accordingly, techniques are needed for compressing the mapping table in order to reduce the amount of memory required to store the table, thereby facilitating its implementation in a network processor integrated circuit or other device comprising a frame mapping scheduler.