On a node in a packet switching communication network, a network scheduler, also called a packet scheduler, is an arbiter that manages the sequence of network packets in the transmit and receive queues of the network interface controller (NIC). The network scheduler logic decides which network packet to forward next from the buffer. The buffer works as a queuing system, storing the network packets temporarily until they are transmitted. The buffer space may be divided into different queues, each holding the packets of one flow according to configured packet classification rules. For example, packets can be divided into flows by their source and destination Internet Protocol (IP) addresses. Network scheduling algorithms and their associated settings determine how the network scheduler manages the buffer.
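The division of buffer space into per-flow queues can be sketched as follows. This is a minimal illustration only: the class name FlowQueues, the packet dictionary fields (src_ip, dst_ip, payload), and the choice of round-robin service between flows are assumptions made for the example and are not taken from any particular scheduler.

```python
from collections import OrderedDict, deque


class FlowQueues:
    """Hypothetical buffer split into one FIFO queue per (source IP, destination IP) flow."""

    def __init__(self):
        # Maps a flow key to the FIFO queue holding that flow's packets.
        self.queues = OrderedDict()

    def enqueue(self, packet):
        # Classification rule (assumed): the flow is identified by the address pair.
        key = (packet["src_ip"], packet["dst_ip"])
        self.queues.setdefault(key, deque()).append(packet)
        return key

    def dequeue(self):
        # Scheduling rule (assumed): round-robin. Serve the first non-empty
        # flow, then move that flow to the back of the service order.
        for key in list(self.queues):
            queue = self.queues[key]
            if queue:
                packet = queue.popleft()
                self.queues.move_to_end(key)
                return packet
        return None  # every queue is empty


buffer = FlowQueues()
buffer.enqueue({"src_ip": "10.0.0.1", "dst_ip": "10.0.0.9", "payload": "A1"})
buffer.enqueue({"src_ip": "10.0.0.1", "dst_ip": "10.0.0.9", "payload": "A2"})
buffer.enqueue({"src_ip": "10.0.0.2", "dst_ip": "10.0.0.9", "payload": "B1"})
order = [buffer.dequeue()["payload"] for _ in range(3)]
```

In this sketch the two flows are served alternately, so the second packet of the first flow is transmitted after the first packet of the second flow even though it arrived earlier.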
Network scheduling algorithms may provide specific reordering or dropping of network packets inside various transmit or receive buffers. Such reordering and dropping are commonly used to compensate for various networking conditions, for example to reduce the latency of certain classes of network packets, and are generally part of quality of service (QoS) measures. For example, network scheduling algorithms may enable active queue management (AQM) and network traffic shaping. An AQM algorithm selects network packets inside the buffer to drop when that buffer becomes full or gets close to becoming full, often with the larger goal of reducing network congestion. Traffic shaping is a technique that delays some or all packets to bring them into compliance with a desired traffic profile. Traffic shaping is used to optimize or guarantee performance, improve latency, and/or increase usable bandwidth for some kinds of packets by delaying other kinds. Traffic shaping provides a means to control the volume of traffic being sent into a network in a specified period (e.g., bandwidth throttling/shaping), or the maximum rate at which the traffic is sent (e.g., rate limiting/shaping), or based on other criteria.
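One common way to delay packets so that they conform to a rate profile is a token bucket. The following is a minimal sketch under stated assumptions: the class name TokenBucketShaper, its parameters, and the use of an explicit caller-supplied clock value are all hypothetical, and the token bucket is only one of several possible shaping techniques.

```python
class TokenBucketShaper:
    """Hypothetical token-bucket rate shaper; times are in seconds, sizes in bytes."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0        # sustained rate in bytes per second
        self.burst = float(burst_bytes)   # bucket depth (largest permitted burst)
        self.tokens = float(burst_bytes)  # the bucket starts full
        self.last = 0.0                   # time of the most recent refill

    def _refill(self, now):
        # Tokens accumulate at the sustained rate, capped at the bucket depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def earliest_send_time(self, size, now):
        # Time at which a packet of `size` bytes would conform to the profile.
        # Assumes size <= burst; a larger packet would never conform.
        self._refill(now)
        if self.tokens >= size:
            return now
        return now + (size - self.tokens) / self.rate

    def try_send(self, size, now):
        # Consume tokens and return True if the packet conforms; otherwise
        # return False, meaning the packet must be delayed.
        self._refill(now)
        if self.tokens < size:
            return False
        self.tokens -= size
        return True


shaper = TokenBucketShaper(rate_bps=8000, burst_bytes=1500)  # 1000 bytes per second
first = shaper.try_send(1500, now=0.0)    # conforms: the bucket starts full
second = shaper.try_send(1500, now=0.0)   # does not conform: the bucket is empty
retry_at = shaper.earliest_send_time(1500, now=0.0)
third = shaper.try_send(1500, now=retry_at)
```

Here the second 1500-byte packet has to wait 1.5 seconds, which is the time the bucket needs to refill at 1000 bytes per second.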
A conventional hierarchical queueing model is illustrated in FIG. 22. In packet network applications such as a broadband network gateway or other broadband network edge platform, the queueing model may need to accommodate a large number of discrete queues on the output side of a packet forwarding path. In particular, each destination subscriber device, of which there may be tens of thousands on a single network port 2260, has its own rate shaper instance (2230a to 2230d), and each rate shaper instance has as input a small number (typically 4 or 8) of Class of Service (CoS) queues (2210a to 2210p). CoS is a parameter used in network data and voice protocols to differentiate the types of payloads contained in the packets being transmitted. The objective of such differentiation is generally to assign a priority to each data payload.
Each set of per-device queues in FIG. 22 has a Weighted Fair Queue (WFQ) scheduler (2220a to 2220d) that pulls packets from the associated queues. Weighted fair queueing is a type of data packet scheduling scheme used by network schedulers to specify, for each device's packet flow, what fraction of the capacity will be given to each CoS queue 2210. Each WFQ scheduler 2220 is attached to a per-device Rate Shaper (2230a to 2230d). The per-device Rate Shapers (2230a to 2230d) provide downstream or upstream traffic shaping and are in turn attached to one or more WFQ schedulers (e.g., 2240a, 2240b), each of which is attached either directly to a physical output port 2260 or to a Virtual Port or Aggregate Rate Shaper (e.g., 2250a, 2250b). The Virtual Port or Aggregate Rate Shaper(s) 2250 provide downstream traffic shaping. Where Virtual Ports or Aggregate Rate Shapers 2250 are used, these are attached to the physical port 2260.
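The way a weighted scheduler divides capacity among CoS queues can be sketched as follows. This is not an exact WFQ implementation: it uses a simplified self-clocked approximation in which each packet receives a virtual finish tag equal to its start tag plus its size divided by the queue weight, and the packet with the smallest tag is sent first. The class name WfqScheduler, the queue names, and the weights are hypothetical.

```python
import heapq


class WfqScheduler:
    """Hypothetical weighted scheduler using virtual finish tags (self-clocked approximation)."""

    def __init__(self, weights):
        self.weights = dict(weights)                   # queue name -> relative weight
        self.last_finish = {q: 0.0 for q in weights}   # last finish tag issued per queue
        self.virtual_time = 0.0                        # finish tag of the last packet sent
        self.heap = []                                 # (finish tag, arrival seq, queue, size)
        self.seq = 0                                   # tie-breaker: earlier arrival wins

    def enqueue(self, queue, size):
        # A packet starts no earlier than the current virtual time and no
        # earlier than the previous packet of the same queue finishes.
        start = max(self.virtual_time, self.last_finish[queue])
        finish = start + size / self.weights[queue]
        self.last_finish[queue] = finish
        heapq.heappush(self.heap, (finish, self.seq, queue, size))
        self.seq += 1

    def dequeue(self):
        # Send the packet with the smallest virtual finish tag.
        if not self.heap:
            return None
        finish, _, queue, size = heapq.heappop(self.heap)
        self.virtual_time = finish
        return queue, size


sched = WfqScheduler({"voice": 3, "data": 1})
for _ in range(4):
    sched.enqueue("voice", 300)
for _ in range(4):
    sched.enqueue("data", 300)
first_four = [sched.dequeue()[0] for _ in range(4)]
```

With weights of 3 and 1 and equal packet sizes, three of the first four packets sent in this sketch come from the higher-weight queue, matching the 3:1 capacity split.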
A standard implementation model has the input-side processing enqueue packets onto the appropriate CoS queue 2210 for transmission to a destination device (e.g., Device 1, Device 2, etc.). Downstream rate shaping is not a consideration when selecting the CoS queue on which to enqueue a packet, as downstream rate shaping is handled by the output side of the model itself. The output side is illustrated in FIG. 22. Packets are picked from the device-specific CoS queues 2210 and forwarded, taking into account the rate shaper delay, weighted queueing, and port/sub-port bit rates at all levels of the hierarchy. The output processing must find the next packet to transmit from all the per-device packet queues, taking into account each of the Rate Shapers 2230/2250 and WFQ schedulers 2220/2240 in accordance with the hierarchy.
In a hardware implementation, it is possible to implement parallel algorithms that pick packets to send without inducing unnecessary delay (i.e., dead time on the network port 2260 due to inability to find a packet to transmit because of algorithm delay). However, in a software-based implementation, it is difficult to create an algorithm that will avoid dead time on the port 2260 because most central processing units (CPUs) have little ability to perform a high degree of parallel processing. Moreover, in a typical broadband gateway network scheduler software implementation of the hierarchical queuing model illustrated in FIG. 22, the network scheduler has to deal with tens of thousands of queues as well as tens of thousands of Rate Shaper and WFQ instances, creating efficiency and performance problems for a software implementation.
For example, a software implementation may have difficulty optimizing usage of the port 2260 because at any time, each of the tens of thousands of queues 2210 may or may not have any packets queued. Determining packet availability may require scanning this large number of queues, which requires excessive CPU processing as well as costly memory accesses even if queue occupancy is abstracted to a bit vector consisting of a single bit per queue. Even if multiple processor cores are used in parallel, the number of queues 2210 to be scheduled will typically far exceed the number of cores, and scheduling them will still require a substantial amount of the processing power of each core.
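The cost of scanning an occupancy bit vector can be illustrated with the following sketch. The 64-bit word size, the class name OccupancyBitmap, and the use of the number of words examined as a stand-in for memory accesses are all assumptions made for the example.

```python
WORD_BITS = 64  # assumed machine word size


class OccupancyBitmap:
    """Hypothetical flat bit vector with one bit per queue (1 = queue is non-empty)."""

    def __init__(self, num_queues):
        self.words = [0] * ((num_queues + WORD_BITS - 1) // WORD_BITS)

    def set_nonempty(self, queue):
        self.words[queue // WORD_BITS] |= 1 << (queue % WORD_BITS)

    def set_empty(self, queue):
        self.words[queue // WORD_BITS] &= ~(1 << (queue % WORD_BITS))

    def find_first_nonempty(self):
        # Linear scan over the words. Returns (queue index, words examined),
        # or (None, words examined) when no queue is occupied.
        for i, word in enumerate(self.words):
            if word:
                lowest_bit = (word & -word).bit_length() - 1
                return i * WORD_BITS + lowest_bit, i + 1
        return None, len(self.words)


bitmap = OccupancyBitmap(64000)
bitmap.set_nonempty(63999)                 # only the last queue holds a packet
found = bitmap.find_first_nonempty()
bitmap.set_empty(63999)
none_found = bitmap.find_first_nonempty()
```

In this sketch, 64,000 queues occupy 1,000 words, and in the worst case every one of those words is read before the single occupied queue is found or the scan concludes that there is nothing to send.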
As another example of a problem faced by a software implementation, the queue processing can waste CPU cycles when determining whether a destination device rate shaper is actually permitted to send a packet. That is, the network scheduler may try multiple device queues 2210 that are non-empty and still not find a packet that can be sent because a Rate Shaper's rate-limiting maximum rate requires delaying the next packet. To optimize the throughput of a software-based forwarding implementation, it is desirable to use CPU cycles efficiently and, in particular, to avoid wasting CPU cycles in this way.
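The wasted work described above can be made concrete with a small sketch of a naive scan. The function name naive_pick, the per-device fields backlog and next_eligible, and the count of probes as a proxy for CPU cost are hypothetical and serve only to illustrate the problem.

```python
def naive_pick(devices, now):
    """Scan devices in order and return (index, probes) for the first device that
    has a queued packet and whose rate shaper permits sending at time `now`.

    Each device is a dict with 'backlog' (number of queued packets) and
    'next_eligible' (earliest time, in seconds, its shaper allows a send).
    Returns (None, probes) if no device can send.
    """
    probes = 0
    for index, device in enumerate(devices):
        if device["backlog"] == 0:
            continue                      # empty queue: nothing to check
        probes += 1                       # shaper state had to be examined
        if device["next_eligible"] <= now:
            return index, probes
    return None, probes


# 1000 backlogged devices; all but the last are held back by their shapers.
devices = [{"backlog": 1, "next_eligible": 5.0} for _ in range(999)]
devices.append({"backlog": 1, "next_eligible": 0.0})
picked = naive_pick(devices, now=1.0)

# If the last device is also held back, the whole scan finds nothing to send.
devices[-1]["next_eligible"] = 5.0
nothing = naive_pick(devices, now=1.0)
```

In the first call, 999 shaper checks are spent on devices that cannot send before an eligible packet is found; in the second call, all 1000 checks are spent and the port still has nothing to transmit.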