1. Field of the Invention
This invention relates to input processing for computer networks and, more particularly, to technology for improving throughput in such systems having multiprocessor implementations.
2. Background and Related Art
In computer network systems, notably those which are UNIX (Trademark of the X/Open Systems Corporation) operating system based, network traffic arriving from a plurality of clients and servers on the net must be processed. Network input processing for a given network I/O device has in the past always been single threaded processing at the interrupt level, whereby all inbound packets from a network device are processed sequentially. In the early stages of computer networks this was not necessarily a significant problem.
However, with the maturation of computer technology, it is becoming more commonplace to encounter computer network systems involving a number of CPUs present on the system (referred to in the art as multiprocessor or “MP” systems). The weakness of single threaded processing has now become apparent in MP systems due to the bottlenecks it causes. In the past, with single CPU systems, this bottleneck was less visible. However, with the advent of MP systems, Internet servers have grown bigger and faster, with 8-way and 12-way CPU systems (i.e., including 8, 12, or more CPUs) becoming more and more commonplace. The inherent weakness of this single thread processing mode is that the aforementioned network input, in accordance with prior art technology, is processed by only a single CPU at any given time regardless of the number of CPUs on the system available for such processing.
A system and method was thus highly desired, given this maturation of computer network technology into MP systems, whereby such network input processing could take better advantage of the MP scalability so as to improve network throughput on the larger network servers.
In an effort to address this problem, various systems have been developed employing differing techniques for queuing inbound packets. However, such systems nevertheless suffered from very serious drawbacks. First, they were not tied to MP scalability. Still further, they did not address the problem of out-of-order packets caused by distributing the incoming packets to multiple parallel processing nodes.
Queuing has long been known as a method for parallelizing processing in order to increase throughput and distribute workloads. However, a serious problem with conventional queuing in the context of the instant invention is that, although any one of multiple CPUs could obtain a packet from the queue for processing, there was no assurance that packet order would be maintained when multiple processors obtained packets in this manner. It is extremely important that this order be maintained when the packets arrive at sockets. Once systems expanded to more than one CPU to process packets for throughput and concurrency, previous systems lost control over the order in which packets were scheduled. While this in and of itself was not fatal to operation of multiprocessor systems employing queues, once packets are out of order and flowing up to an endpoint of the system, additional resources must be expended in order to process and correctly resequence these packets in the protocol stack. This additional processing to ensure correct packet sequencing is itself time consuming, resulting in little of the net gain otherwise afforded by employing queues, multiple CPUs and parallelization in the first place.
One practical problem resulting from the inability to provide for network input distributed processing is that the throughput of an individual CPU, on the order of 100 megabits per second, is less than that of network adapter cards, which may nominally have throughputs of one gigabit per second, i.e., operating at a factor of 10× faster than the CPU. In accordance with conventional prior practice, wherein no more than one CPU at a time could be processing packets associated with one of the network I/O devices, the net result was that network throughput was CPU bound, even in MP systems; i.e., throughput could not exceed the capacity of a single CPU running interrupts and processing incoming packets from a single given physical interface. Thus it became increasingly difficult to justify to potential customers of MP systems why they should invest significant amounts of money for these systems without enjoying a concomitant gain in performance. Similarly, it became increasingly difficult to make the case that a customer should invest in faster and more expensive network adapter cards (which, as noted, may in some instances have a capability 10× faster than the CPUs themselves) when, upon installing such an adapter, the customer still does not see a 10× performance increase (due to the aforementioned bottleneck caused because only a single CPU is servicing an input or interrupt notwithstanding the presence of other processors with concurrent processing capability). Thus there was a need to demonstrate to the customer improvements in system performance to justify the associated cost of adding additional CPUs in MP systems and more expensive adapter cards.
Thus, although queuing inbound packets was known, these prior efforts were not tied to MP scalability and such efforts did not address the aforementioned problem of out-of-order packets. It will be appreciated that this is a key shortcoming of prior art attempts to solve the problem of distributing random input to multiple engines while nevertheless maintaining the important input sequence for the upper layer protocol (mainly TCP/IP) to work properly. As previously noted, these out-of-order packets cause severe performance problems for such protocols as TCP or UDP due in part to the overhead associated with sorting out the proper packet sequences.
Network input processing is distributed to multiple CPUs on multiprocessor systems to improve network throughput and take advantage of MP scalability. Packets received on the network are distributed to N high priority threads, wherein N is the number of CPUs on the system. N queues are provided to which the incoming packets are distributed. When one of the queues is started, one of the threads is scheduled to process packets on this queue. When all of the packets on the queue are processed, the thread becomes dormant. Packets are distributed to one of the N queues by using a hashing function based on the source MAC address, source IP address, or the packet's source and destination TCP port numbers, or all or a combination of the foregoing. The hashing mechanism ensures that the sequence of packets within a given communication session will be preserved. Distribution is effected by the device drivers of the system. Parallelism is thereby increased on network I/O processing, eliminating the CPU bottleneck for high speed network I/Os, thereby improving network performance.
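By way of illustration only, the hash-based distribution described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the names `Packet`, `queue_index` and `distribute` are hypothetical, CRC-32 is chosen arbitrarily as the hashing function (the text does not specify one), and the per-queue threads and device driver context are omitted. The sketch shows only that packets sharing the same source MAC address, source IP address and port numbers always map to the same one of the N queues, so their relative order is preserved.

```python
import zlib
from collections import deque, namedtuple

# Hypothetical packet record; 'seq' stands in for arrival order within a session.
Packet = namedtuple("Packet", "src_mac src_ip src_port dst_port seq")


def queue_index(pkt, n_queues):
    """Map a packet to one of n_queues queues.

    Assumption: the key combines source MAC, source IP and both port
    numbers, and CRC-32 is used as the hash. Any deterministic hash
    over session-identifying fields would serve the same purpose.
    """
    key = f"{pkt.src_mac}|{pkt.src_ip}|{pkt.src_port}|{pkt.dst_port}".encode()
    return zlib.crc32(key) % n_queues


def distribute(packets, n_queues):
    """Append each packet, in arrival order, to the queue chosen by its hash."""
    queues = [deque() for _ in range(n_queues)]
    for pkt in packets:
        queues[queue_index(pkt, n_queues)].append(pkt)
    return queues
```

In such a sketch, `n_queues` would be set to the number of CPUs on the system, and each queue would be drained by its own thread. Because every packet of a given session hashes to the same queue and each queue is first-in, first-out, no resequencing is needed within a session, although two distinct sessions may hash to the same queue.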