Traditional computer architectures use multiprocessor computer systems to perform receive-side processing of requests received across a network from remote clients. The requests take the form of I/O tasks that are partitioned across multiple processors working in concert to execute them. Allowing multiple processors to perform incoming I/O tasks simultaneously improves the overall performance of the computer system. One of the more challenging aspects of utilizing multiple processors is “scalability,” that is, partitioning the I/O tasks for connections across processors in a way that optimizes each processor individually and collectively.
A well-known computer hardware system for achieving scalability is a “symmetric multiprocessor” (SMP) system. An SMP system uses two or more identical processors that appear to the executing software to be a single processing unit. In an exemplary SMP system, multiple processors in one system share a global memory and an I/O subsystem that includes a network interface card, commonly referred to as a “NIC.” As is known in the art, the NIC enables communication between a host computer and remote computers located on a network such as the Internet. NICs communicate with remote computers through the use of a network communications protocol, for example, TCP (“Transmission Control Protocol”). TCP, like other protocols, allows two computers to establish a connection and exchange streams of data. In particular, TCP provides reliable delivery of the data packets sent by the remote computer to the host computer (and vice-versa), retransmitting any packets that are lost in transit.
After a network connection is established between a host computer and a remote computer, the remote computer sends a data stream to the host computer. The data stream itself may comprise multiple data packets and ultimately entail sending more than one data packet from the remote computer to the host computer. When the NIC on the host computer receives a first data packet, the first data packet is stored in memory along with a packet descriptor that includes pointer information identifying the location of the data in memory. Thereafter, an interrupt is issued to one of the processors in the SMP system. As the interrupt service routine (ISR) runs, all further interrupts from the NIC are disabled and a deferred procedure call (DPC) is requested to run on the selected processor. Meanwhile, as more data packets are received by the NIC, the data packets are also stored in memory along with packet descriptors. No interrupts are generated, however, until the DPC for the first interrupt runs to completion.
As the DPC runs, the data packet descriptors and associated data packets are pulled from memory to build an array of received packets. Next, protocol receive-processing is invoked indirectly via calls to a device driver interface within the DPC routine. An exemplary interface is the Network Driver Interface Specification (NDIS), a Microsoft Windows device driver interface that enables a single NIC to support multiple network protocols. After the DPC runs to completion, interrupts are re-enabled and the NIC generates an interrupt to one of the processors in the multiprocessor system. Because only one DPC runs for any given NIC at any given time, when the scheduling processor is running a receive DPC, the other processors in the system are not conducting receive processing. This serialization problem limits scalability in the SMP system and degrades performance of the multiprocessor system. An alternate method may combine the ISR and DPC into a single routine.
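The interrupt and DPC sequence described above can be illustrated with a minimal single-threaded sketch. The class `SimulatedNic` and its members are hypothetical names introduced only for illustration; they are not part of NDIS or of any real driver interface, and the model ignores protocol processing entirely.

```python
from collections import deque


class SimulatedNic:
    """Toy model of the interrupt/DPC receive path (all names hypothetical)."""

    def __init__(self):
        self.descriptors = deque()      # packet descriptors held in memory
        self.interrupts_enabled = True
        self.dpc_pending = False
        self.interrupt_count = 0

    def receive(self, packet):
        # The packet and its descriptor are stored in memory first.
        self.descriptors.append({"data": packet})
        # An interrupt is raised only while NIC interrupts are enabled.
        if self.interrupts_enabled:
            self.isr()

    def isr(self):
        # The ISR disables further NIC interrupts and requests a DPC.
        self.interrupt_count += 1
        self.interrupts_enabled = False
        self.dpc_pending = True

    def run_dpc(self):
        # The DPC drains every stored descriptor into an array of received
        # packets, then re-enables interrupts once it has run to completion.
        batch = [d["data"] for d in self.descriptors]
        self.descriptors.clear()
        self.dpc_pending = False
        self.interrupts_enabled = True
        return batch


nic = SimulatedNic()
for p in ("pkt-1", "pkt-2", "pkt-3"):
    nic.receive(p)          # only the first packet raises an interrupt
batch = nic.run_dpc()       # a single DPC handles all three packets
```

In this sketch, three packets produce one interrupt and one DPC, which mirrors the serialization described above: while that single DPC runs, no other processor is given receive work for the NIC.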
In addition, because data packets relating to a particular network connection are often received by the NIC at different intervals, receive-side processing of those data packets may occur on different processors under the above-described scheme. When a processor processes data packets belonging to a particular network connection, the state for that network connection is modified. If data packets associated with this network connection were previously processed by a first processor, the network connection state resides in the first processor's cache. In order for a second processor to process packets related to a request previously processed by the first processor, the state is pulled from the first processor's cache to main memory, and the first processor's cache is invalidated. This process of copying the state and invalidating the cache results in performance degradation of the multiprocessor system. Likewise, under the above scheme, send and receive processing for the same network connection can occur simultaneously on different processors, leading to contention and spinning that also cause performance degradation.
U.S. Pat. No. 7,219,121 provides a method and framework for implementing symmetrical multiprocessing in a multiprocessor system and increasing performance of the multiprocessor system. That patent describes a receive-side scheduling framework including a network interface card, memory and two or more processors, communicably coupled to each other to handle network connections and I/O tasks associated with the network connections. An example of such an I/O task is a data stream associated with the Transmission Control Protocol (also referred to as “TCP”). The data packets received by a NIC in the multiprocessor system are stored, along with a data packet descriptor, in memory. A scheduling processor in the multiprocessor system, selected by a load-balancing algorithm, reads each data packet and applies a mapping algorithm to portions of the data packet, yielding a map value. The map value, in conjunction with a processor selection policy, determines which “selected processor” in the multiprocessor system is scheduled to manage the data stream.
The mapping algorithm is any acceptable algorithm, such as a hashing function, adopted by the system that ensures data packets received from the same network connection are routinely scheduled for processing by the same selected processor in the multiprocessor system. The scheduling processor then processes the data requests assigned to the scheduling processor itself. Thereafter, each of the other selected processors is requested to execute the data requests scheduled to that selected processor.
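As a minimal sketch of such a mapping algorithm, the following example hashes the fields that identify a connection and reduces the result modulo the number of processors. The use of CRC-32, the choice of address and port fields as hash input, the processor count, and the names `map_value` and `select_processor` are all assumptions made for illustration; the source does not specify them.

```python
import zlib

NUM_PROCESSORS = 4  # assumed processor count, for illustration only


def map_value(src_ip, src_port, dst_ip, dst_port):
    """Hash the connection-identifying fields into a map value.

    CRC-32 is a stand-in for "any acceptable algorithm"; it is not the
    function used by any particular system.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key)


def select_processor(value, num_processors=NUM_PROCESSORS):
    """A simple processor selection policy: map value modulo processor count."""
    return value % num_processors


# Two packets from the same connection always yield the same processor.
first = select_processor(map_value("198.51.100.5", 40000, "192.0.2.1", 80))
second = select_processor(map_value("198.51.100.5", 40000, "192.0.2.1", 80))
```

Because the map value depends only on fields that are constant for the life of a connection, every packet of that connection is scheduled to the same selected processor, which keeps the connection state in one processor's cache.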
Moreover, data packets received by the NIC from a network connection are individually hashed by the NIC using a hashing function. The hashing function yields a hash value that identifies which processor is selected to process the data packet. The hashing function is chosen such that the load is distributed optimally across the processors. The hash value is then stored in memory along with the data packet descriptor and the data packet. A scheduling processor, selected by a load-balancing algorithm, then reads each data packet descriptor to ascertain the hash value. With the use of a processor selection policy, each data packet is queued for processing by the selected processor.
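The division of work described above, in which the NIC computes the hash and the scheduling processor only reads it from the descriptor, might be sketched as follows. The descriptor layout, the use of CRC-32 over a connection identifier, the modulo selection policy, and the function names `nic_receive` and `schedule` are hypothetical and chosen only to make the example self-contained.

```python
import zlib
from collections import deque


def nic_receive(memory, connection_id, payload):
    """NIC side: hash the packet and store the hash with its descriptor."""
    hash_value = zlib.crc32(connection_id.encode())  # stand-in hash function
    memory.append({"hash": hash_value, "payload": payload})


def schedule(memory, queues):
    """Scheduling processor: read each descriptor and queue the packet."""
    while memory:
        descriptor = memory.popleft()
        # Assumed selection policy: stored hash modulo the number of queues.
        cpu = descriptor["hash"] % len(queues)
        queues[cpu].append(descriptor["payload"])


memory = deque()                        # descriptors stored by the NIC
queues = [deque() for _ in range(4)]    # one receive queue per processor

# Packets from two connections arrive interleaved.
nic_receive(memory, "10.0.0.1:5000", "A1")
nic_receive(memory, "10.0.0.2:6000", "B1")
nic_receive(memory, "10.0.0.1:5000", "A2")
nic_receive(memory, "10.0.0.2:6000", "B2")
schedule(memory, queues)
```

The scheduling processor never rehashes the packet in this sketch; it relies entirely on the value the NIC stored, so the per-packet cost on the host is a single descriptor read and a queue insertion.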
While this hash function successfully distributes packets across multiple processors, it does not prevent malicious users from purposefully causing packets to be directed to the same processor. That is, if the hash function is known to a malicious user, that user can design packets that will repeatedly produce the same hash. By doing this, the malicious user can overflow the queue on a specific processor. The hash bucket for each processor has a corresponding linked list of received packets. More computational resources are consumed as the length of that list grows. Accordingly, there is a need for secure receive-side scaling for symmetrical multiprocessing in a multiprocessor system.
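The weakness described above can be demonstrated with a short sketch in which the attacker knows the hash function and searches for inputs that all map to one processor. The hash (CRC-32 over a source address and port), the modulo selection policy, the example address, and the names `known_hash` and `craft_colliding_ports` are assumptions for illustration only and do not describe any deployed system.

```python
import zlib

NUM_PROCESSORS = 4  # assumed processor count


def known_hash(src_ip, src_port):
    """A publicly known, unkeyed hash (CRC-32 used as a stand-in)."""
    return zlib.crc32(f"{src_ip}:{src_port}".encode())


def craft_colliding_ports(src_ip, target_cpu, count):
    """Attacker side: find source ports whose hash selects target_cpu."""
    ports = []
    for port in range(1024, 65536):
        if known_hash(src_ip, port) % NUM_PROCESSORS == target_cpu:
            ports.append(port)
            if len(ports) == count:
                break
    return ports


# The attacker precomputes 100 ports that all select processor 0.
ports = craft_colliding_ports("203.0.113.7", target_cpu=0, count=100)

# The host applies the same hash and queues every crafted packet identically.
queue_lengths = [0] * NUM_PROCESSORS
for port in ports:
    queue_lengths[known_hash("203.0.113.7", port) % NUM_PROCESSORS] += 1
```

Every crafted packet lands in the queue for processor 0 while the remaining queues stay empty, so the load distribution the hash was chosen to provide is defeated by an attacker who can evaluate the function offline.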