Every computer and communication network that transfers data packets must implement some form of scheduling to ensure that data traffic progresses through the network at a particular rate. At any given moment, a network may have hundreds of thousands or even millions of connections, each with data queues waiting for transport through the network. Some form of scheduling is required to enable network elements to process these data queues fairly and efficiently.
In a perfect implementation of a scheduler system, all scheduler states would be perfectly synchronized for accuracy throughout a multi-level scheduling hierarchy. The scheduler would instantaneously absorb bursty, high-bandwidth enqueues of data and changes of state injected by intervening scheduling processes. In this perfect implementation, every decision the scheduler makes would be fair, i.e., correct. In more typical implementations, however, some scheduler states can be out of date because of other intervening processes, occasionally leading to academically incorrect, i.e., unfair, decisions.
A hierarchical scheduler with a computational complexity of O(1), in the commonly known “big-O” notation, allows a relatively small number of data structure updates per scheduling level per scheduling decision, typically enough to satisfy a scheduling event and a modest average enqueue rate per scheduling level. However, the scheduler may be processing hundreds of thousands (even millions) of scheduler nodes in the hierarchy. The processing order for scheduling should flow down the scheduling hierarchy, and the processing order for enqueues should ideally flow up through the same hierarchy; however, other processes in the system can simultaneously interact or interfere with the scheduler state. For example, an enqueue process can generate multiple enqueue events per scheduler decision, and a multicast enqueue implementation can generate multiple enqueue events per scheduler decision with a potentially bursty distribution. Additionally, an orthogonal scheduler, such as a rate-based scheduler, can generate bursts of state changes to many scheduler nodes and queues per scheduler decision.
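To make the update-budget argument concrete, the following Python sketch models an O(1) scheduler that can apply at most a fixed number of state updates per decision (the hypothetical `update_budget`) while enqueue events arrive in a bursty pattern. The function name and the event trace are illustrative assumptions, not part of any described implementation.

```python
def simulate_backlog(events_per_decision, update_budget):
    """Return the number of pending state updates after each scheduler decision.

    events_per_decision: hypothetical trace of enqueue/state-change events
        arriving during each decision interval.
    update_budget: hypothetical fixed number of updates the O(1) scheduler
        can apply per decision.
    """
    pending = 0
    history = []
    for arrivals in events_per_decision:
        pending += arrivals
        # The scheduler drains at most update_budget updates per decision.
        pending -= min(pending, update_budget)
        history.append(pending)
    return history
```

With a budget of 2 updates per decision, the trace `[1, 1, 12, 0, 0, 0, 0, 0]` averages only 1.75 events per decision, yet the single burst leaves updates pending for five consecutive decisions: `simulate_backlog([1, 1, 12, 0, 0, 0, 0, 0], 2)` returns `[0, 0, 10, 8, 6, 4, 2, 0]`.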
Non-O(1) algorithms are sometimes preferred because they are easier to implement, but to maintain accuracy it could be necessary to run them at a relatively high update rate, which increases processing requirements. The findings of the non-O(1) algorithms may then need to be merged into the primary O(1) scheduler state, a merge that could act either as an assist or as an interference. In this environment, it is very difficult for the O(1) scheduler to maintain dominance over these other intervening or interfering tasks when they may actually be capable of changing state more rapidly than the O(1) scheduler itself.
For example, consider a simple round-robin scheduler implemented with a scheduling control queue (“SCQ”) containing the children that are currently scheduling. The scheduler transmits from the child indicated by the head element of the SCQ and then moves that head element to the tail of the SCQ. If there is more than one child in the SCQ, the next scheduler decision will pick a different child based on the new head element. In a stable system with data always available, all children of the scheduler are in the SCQ, and each takes its turn in a round-robin transmission pattern. However, if many children of the scheduler are not eligible to transmit or have no data to transmit, they will not be present in the SCQ.
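The SCQ behavior described above can be sketched in a few lines of Python. This is a minimal illustration that assumes a `collections.deque` as the SCQ; the class and method names are hypothetical. `deque.rotate(-1)` moves the head element to the tail in one step.

```python
from collections import deque


class RoundRobinScheduler:
    """Round-robin scheduler whose SCQ holds only the children currently scheduling."""

    def __init__(self, children=()):
        self.scq = deque(children)

    def add_child(self, child):
        # Called when a child becomes eligible and has data to transmit.
        # The membership test is O(n); a real implementation would more
        # likely track SCQ membership with a per-child flag.
        if child not in self.scq:
            self.scq.append(child)

    def remove_child(self, child):
        # Called when a child becomes ineligible or runs out of data.
        self.scq.remove(child)

    def schedule(self):
        """Return the child to transmit from, then move it from head to tail."""
        if not self.scq:
            return None  # no child is eligible with data
        child = self.scq[0]
        self.scq.rotate(-1)  # head element moves to the tail
        return child
```

For example, with children `A`, `B`, and `C` in the SCQ, four calls to `schedule()` return `A`, `B`, `C`, `A`; if `B` is then removed, subsequent calls alternate between `C` and `A`.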
At some point, a burst of children not in the SCQ could become eligible for scheduling. Traditional solutions include either designing a scheduler system that can accommodate the maximum burst rate of state changes and absorb the children into the scheduler as their state changes, or queuing the burst of children with new state outside the scheduler (where the queue is invisible to the scheduler) and absorbing the new children into the scheduler state as quickly as possible.
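The second traditional approach can be sketched as follows. In this hypothetical Python model, newly eligible children wait in an external FIFO that the scheduler cannot see, and at most `absorb_per_decision` of them (an assumed per-decision budget) are moved into the SCQ before each round-robin decision. The function and parameter names are illustrative only.

```python
from collections import deque


def run_with_absorption_queue(scq_children, burst, absorb_per_decision, decisions):
    """Simulate absorbing a burst of newly eligible children via an external FIFO.

    scq_children: children already in the SCQ.
    burst: children that just became eligible; they wait in a queue that is
        invisible to the scheduler.
    absorb_per_decision: hypothetical number of children that can be moved
        into the SCQ per scheduler decision.
    Returns the child transmitted at each decision (None if the SCQ is empty).
    """
    scq = deque(scq_children)
    backlog = deque(burst)
    transmitted = []
    for _ in range(decisions):
        # Absorb as many waiting children as the per-decision budget allows.
        absorbed = 0
        while backlog and absorbed < absorb_per_decision:
            scq.append(backlog.popleft())
            absorbed += 1
        # Ordinary round-robin decision on the SCQ.
        if scq:
            transmitted.append(scq[0])
            scq.rotate(-1)
        else:
            transmitted.append(None)
    return transmitted
```

With one child `A` already in the SCQ, a burst of `X`, `Y`, and `Z`, and a budget of one absorption per decision, five decisions transmit `A`, `X`, `A`, `Y`, `X`. Child `Z` has not yet transmitted, both because it waited in the external queue until the third decision and because it then joined the tail of the SCQ.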
The first traditional approach, designing a scheduler system that can accommodate these bursts of child state changes and absorb children into the scheduler, is very difficult to implement in very complex, multi-level scheduling systems because each scheduler instance tends to undergo a large number of state changes. Consider a scheduler with 4 levels of scheduling, such that the Level 1 (“L1”) scheduler chooses one of 100 Level 2 (“L2”) schedulers, the selected Level 2 scheduler chooses one of its 1000 Level 3 (“L3”) schedulers, and the selected Level 3 scheduler chooses one of 8 queues from which to transmit data. Because there are 100 Level 2 and 100,000 Level 3 schedulers in this system, the system is usually designed with a single circuit per scheduling level, into which two of the 100,100 contexts (one L2 and one L3) are loaded to make a decision. If the physical scheduler circuits are complex, absorbing child state can be very difficult because it may require that an L3 context be loaded into the scheduler, the corresponding L2 context be loaded, and then the L3, L2, and L1 state be updated. All of this must happen in the gaps between the context loads the scheduler performs to make transmission decisions.
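The arithmetic of this example, and the context operations needed to absorb a single child state change, can be sketched in Python. The function names are hypothetical, and the sketch assumes that queues and schedulers are numbered contiguously, so that a queue's L3 and L2 parents follow from integer division.

```python
def hierarchy_sizes(l2_count=100, l3_per_l2=1000, queues_per_l3=8):
    """Count schedulers, swappable contexts, and queues in the 4-level example."""
    l3_count = l2_count * l3_per_l2
    return {
        "l2_schedulers": l2_count,
        "l3_schedulers": l3_count,
        # Assumption: the single L1 context stays resident; L2 and L3 contexts are swapped in.
        "l2_l3_contexts": l2_count + l3_count,
        "queues": l3_count * queues_per_l3,
    }


def absorption_path(queue_id, l3_per_l2=1000, queues_per_l3=8):
    """List the context operations needed to absorb one queue's state change.

    Assumes contiguous numbering: queues 0..7 belong to L3 scheduler 0,
    L3 schedulers 0..999 belong to L2 scheduler 0, and so on.
    """
    l3_id = queue_id // queues_per_l3
    l2_id = l3_id // l3_per_l2
    return [
        ("load", "L3", l3_id),
        ("load", "L2", l2_id),
        ("update", "L3", l3_id),
        ("update", "L2", l2_id),
        ("update", "L1", 0),  # single root scheduler
    ]
```

Under these assumptions, `hierarchy_sizes()` reports 100 L2 schedulers, 100,000 L3 schedulers, 100,100 L2 plus L3 contexts, and 800,000 queues. `absorption_path(123457)` loads L3 context 15432 and L2 context 15 before updating L3, L2, and L1 state: five operations that must fit between the context loads used for transmission decisions.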
Therefore, what is needed is a method and system for allowing a primary scheduler to control the order of importance of updates arriving from intervening processes when making scheduling decisions.