1. Field
The disclosure relates generally to systems for processing data from multiple sources by multiple processors, such as network processing devices, and more specifically to systems and methods for assigning work in the form of data packets from multiple data queue sources to multiple processing thread sinks given constraints on which sinks may process work from which sources.
2. Description of the Related Art
Network processing devices, such as routers, switches and intelligent network adapters, comprise a network component, which receives incoming data traffic, and a finite set of processing elements that are employed to process the incoming data. Network processing devices routinely partition incoming traffic into different segments for the purpose of providing network segment specific quality of service (QoS). Examples of quality of service parameters are bandwidth limitation enforcement on one particular segment, or bandwidth weighting and/or prioritization across all segments. It is commonplace to associate a queue with each segment into which incoming data is divided. Incoming data packets are placed into the queue of their associated segment as they are received.
A queue scheduler is used to determine an order in which the queues are to be served by the device processing elements. For example, the queue scheduler may determine the next queue that is to be served. The next in line data packet, or other work item, from the selected queue is then placed into a single service queue. The processing elements retrieve data packets from the single service queue and provide the required processing for each retrieved data packet. It is commonplace for the processing elements to poll the single service queue, or to be notified by interrupt, when data packets are available for retrieval from the single service queue for processing.
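The conventional arrangement described above can be illustrated with a minimal sketch. It assumes a simple round-robin service order, which the text does not specify, and the names `schedule_round_robin`, `segment_queues` and `service_queue` are hypothetical, chosen only for illustration.

```python
from collections import deque


def schedule_round_robin(segment_queues, service_queue, start=0):
    """Move the head packet of the next non-empty segment queue into the
    single service queue. Returns the index of the queue that was served,
    or None if every segment queue is empty.

    Round-robin order is an assumption; a real scheduler may apply
    weights or priorities instead.
    """
    n = len(segment_queues)
    for offset in range(n):
        idx = (start + offset) % n
        if segment_queues[idx]:
            service_queue.append(segment_queues[idx].popleft())
            return idx
    return None


# Three per-segment queues; the second one is empty.
segment_queues = [deque(["a1", "a2"]), deque(), deque(["c1"])]
service_queue = deque()

next_start = 0
while True:
    served = schedule_round_robin(segment_queues, service_queue, next_start)
    if served is None:
        break
    next_start = (served + 1) % len(segment_queues)

print(list(service_queue))
```

Note that every scheduling decision in this sketch may scan all segment queues, including empty ones, and that all processing elements draw from the one service queue.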
Increasingly, the processing elements are comprised of multiple compute cores or processing units. Each core may be comprised of multiple hardware threads sharing the resources of the core. Each thread may be independently capable of processing incoming data packets. Using a conventional queue scheduler, only one thread at a time can get data from the single service queue.
Network processing system software increasingly needs to constrain which threads can service which queues in order to create locality of work. A conventional queue scheduler polls the status of all queues to determine the next best-suited queue to process without reference to such constraints.
As the number of data queues increases, the time required to make a scheduling decision, also known as the scheduling period, also increases. For example, a device that is to support 100 Gbps network traffic comprised of small 64-byte packets needs to support a throughput of roughly 200 million packets per second (100 Gbps divided by 512 bits per packet, ignoring framing overhead). On a 2 GHz system, this implies that a scheduling decision needs to be accomplished in roughly 10 clock cycles or fewer (2 GHz divided by 200 million decisions per second). In conventional queue schedulers, queues are attached to a queue inspection set, often referred to as a ring, when queue status is changed from empty to not-empty. Similarly, queues are detached from the queue inspection set when queue status is changed from not-empty to empty. Use of a queue inspection set limits the number of queues that need to be examined by the queue scheduler during a scheduling period, since the queue scheduler need only examine queues having data to be processed, and these are the not-empty queues attached to the queue inspection set.
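The cycle budget and the attach/detach behavior of a queue inspection set can be sketched as follows. This is an illustrative model only, not the implementation of any particular device: the names `cycles_per_decision` and `InspectionRing` are hypothetical, the budget calculation ignores framing overhead, and rotating a served queue to the back of the ring is an assumed policy.

```python
from collections import deque


def cycles_per_decision(link_bps, packet_bytes, clock_hz):
    """Clock cycles available per scheduling decision at line rate,
    ignoring framing overhead (an assumption of this sketch)."""
    packets_per_second = link_bps / (packet_bytes * 8)
    return clock_hz / packets_per_second


class InspectionRing:
    """Tracks only the not-empty queues so the scheduler never has to
    scan empty ones."""

    def __init__(self, num_queues):
        self.queues = [deque() for _ in range(num_queues)]
        self.ring = deque()  # ids of queues currently attached

    def enqueue(self, qid, packet):
        q = self.queues[qid]
        if not q:
            # Status changes from empty to not-empty: attach to the ring.
            self.ring.append(qid)
        q.append(packet)

    def next_packet(self):
        """Serve the queue at the front of the ring, or return None."""
        if not self.ring:
            return None
        qid = self.ring.popleft()
        q = self.queues[qid]
        packet = q.popleft()
        if q:
            # Still not-empty: keep it attached, rotated to the back.
            self.ring.append(qid)
        # Otherwise the queue is now empty and stays detached.
        return qid, packet


budget = cycles_per_decision(100e9, 64, 2e9)

ring = InspectionRing(num_queues=1000)
ring.enqueue(7, "p1")
ring.enqueue(7, "p2")
ring.enqueue(42, "q1")
attached_before = list(ring.ring)
served = [ring.next_packet() for _ in range(3)]
```

In this sketch the scheduler touches only the two attached queues even though 1000 queues exist, which is the saving the inspection set is meant to provide.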