In computer networks operating at high bit rates, a single processor may be unable to process data as fast as it arrives, so a bottleneck forms at the receiver. A number of processes have been developed to improve a computer's ability to handle data received at such rates.
One such process is Receive Side Scaling (“RSS”). RSS distributes received packets across multiple processor cores, typically by hashing fields of each packet's headers so that all packets of the same flow are delivered to the same core. RSS is usually implemented in hardware as a function of a particular Network Interface Card (“NIC”) and is therefore limited to the capabilities that the NIC provides. RSS functions implemented by a NIC may be deficient in several ways, for example: limited support for packet parsing, and failure when presented with tunneled packets, fragmented packets, or unknown packet or protocol types.
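To make the RSS mechanism and its parsing limits concrete, the following is a minimal Python sketch, not any particular NIC's implementation. It hashes a packet's IPv4 4-tuple with a Toeplitz hash (the hash function described in Microsoft's RSS documentation) and uses the low-order bits of the result to index an indirection table of core numbers. The `Packet` class, the `select_queue` function, the 128-entry table, and the fall-back-to-queue-0 behavior are hypothetical: they model a NIC whose parser only understands unfragmented IPv4 TCP/UDP. A tunneled packet (for example GRE, IP protocol 47) exposes only its outer header, so this parser treats it as an unknown protocol. Real NICs differ; many, for instance, hash fragments on the IP addresses alone.

```python
from dataclasses import dataclass
from typing import List

# Sample 40-byte RSS key widely used in Toeplitz hash verification examples.
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0d0ca2bcb"
    "ae7b30b477cb2da38030f20c6a42b73bbeac01fa"
)

TCP, UDP, GRE = 6, 17, 47  # IP protocol numbers


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: XOR a sliding 32-bit key window for each set input bit."""
    key_bits = len(key) * 8
    if len(data) * 8 + 32 > key_bits:
        raise ValueError("key too short for input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):  # walk input bits MSB first
            if byte & (0x80 >> b):
                pos = i * 8 + b
                # Key bits pos..pos+31, taken as a 32-bit big-endian value.
                result ^= (key_int >> (key_bits - 32 - pos)) & 0xFFFFFFFF
    return result


@dataclass
class Packet:
    src_ip: bytes      # 4-byte IPv4 source address
    dst_ip: bytes      # 4-byte IPv4 destination address
    protocol: int      # outer IP protocol number
    src_port: int = 0
    dst_port: int = 0
    is_fragment: bool = False


def select_queue(pkt: Packet, table: List[int], default_queue: int = 0) -> int:
    """Hypothetical limited NIC: hash only unfragmented IPv4 TCP/UDP packets."""
    if pkt.is_fragment or pkt.protocol not in (TCP, UDP):
        # Ports are missing or unparsed (fragments, GRE tunnels, unknown
        # protocols), so every such packet lands on the same queue.
        return default_queue
    data = (pkt.src_ip + pkt.dst_ip
            + pkt.src_port.to_bytes(2, "big") + pkt.dst_port.to_bytes(2, "big"))
    h = toeplitz_hash(RSS_KEY, data)
    # Low-order hash bits index the indirection table (size must be a power of two).
    return table[h & (len(table) - 1)]


if __name__ == "__main__":
    table = [i % 4 for i in range(128)]  # 128 entries spread over 4 cores
    src, dst = bytes([66, 9, 149, 187]), bytes([161, 142, 100, 80])
    print(select_queue(Packet(src, dst, TCP, 2794, 1766), table))
    print(select_queue(Packet(src, dst, TCP, is_fragment=True), table))  # 0
    print(select_queue(Packet(src, dst, GRE), table))                    # 0
```

Because every packet of a flow produces the same hash, a TCP or UDP flow stays on one core, while flows that differ only in a port usually spread across cores. Fragments and tunneled traffic, by contrast, all collapse onto queue 0 in this sketch, which illustrates the kind of failure described for NIC-based RSS.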
Several technologies have been developed to compensate for a NIC's deficiencies in RSS processing. These include Network Processing Units (NPUs) with specialized packet scheduling hardware, dedicated hardware- or software-based load balancers, and switching/routing devices embedded in the NIC.
NPUs with specialized packet scheduling hardware include a dedicated hardware function that schedules packets onto each of the packet processing cores. This function is also typically used to schedule and order egress frames so that packets are not transmitted out of order. This solution scales well but suffers from the same limitations as a NIC RSS function, although it usually fails gracefully by falling back to scheduling all packets in total order. NPUs are, however, typically much more expensive than general-purpose processors and add significant cost compared with other methods.
Dedicated hardware- or software-based load balancers offer a localized solution. They offload the distribution of received data to a centralized system, or to a set of blades, that performs the packet parsing and hashing functions. This approach is scalable and customizable, but it is feasible only where traffic can be routed through the external load-balancing device. It also introduces a choke point in the system for both scalability and reliability, and it is expensive because it requires dedicated resources. Further, it does not fully eliminate the need for RSS functionality in multicore processors, since packets marked for delivery to a particular core must still be parsed and distributed to that target core.
Embedded switching/routing devices were developed as a hybrid of NPUs and dedicated load balancers. These devices are embedded directly in a multicore processor and perform parsing, hashing, marking, and delivery functions. However, they add cost and complexity to each packet-handling system and can conflict with the goal of using a lower-cost, general-purpose multicore processor.
None of the approaches described above provides a cost-effective, easily scalable method of processing data received at high bit rates.