1. Field of the Invention
The present invention relates to networking and more particularly to parallelizing network workload.
2. Description of the Related Art
In known networked computer systems, the network interface functionality is treated and supported as an undifferentiated instance of a general purpose Input Output (I/O) interface. This treatment is because computer systems are optimized for computational functions, and thus networking specific optimizations might not apply to generic I/O scenarios. A generic I/O treatment results in no special provisions being made to favor network workload idiosyncrasies. Known networked computer systems include platform servers, server based appliances and desktop computer systems.
Known specialized networking systems, such as switches, routers, remote access network interface units and perimeter security network interface units include internal architectures to support their respective fixed function metrics. In the known architectures, low level packet processing is segregated to separate hardware entities residing outside the general purpose processing system components.
The system design tradeoffs associated with networked computer systems, just like many other disciplines, include balancing functional efficiency against generality and modularity. Generality refers to the ability of a system to perform a large number of functional variants, possibly through deployment of different software components into the system or by exposing the system to different external workloads. Modularity refers to the ability to use the system as a subsystem within a wide array of configurations by selectively replacing the type and number of subsystems interfaced.
It is desirable to develop networked systems that can provide high functional efficiencies while retaining the attributes of generality and modularity. Networked systems are generally judged by a number of efficiencies relating to network throughput (i.e., the aggregate network data movement ability for a given traffic profile), network latency (i.e., the system contribution to network message latency), packet rate (i.e., the system's upper limit on the number of packets processed per time unit), session rate (i.e., the system's upper limit on creation and removal of network connections or sessions), and networking processing overhead (i.e., the processing cost associated with a given network workload). Different uses of networked systems are more or less sensitive to each of these efficiency aspects. For example, bulk data movement workloads such as disk backup, media streaming and file transfers tend to be sensitive to network throughput, transactional uses, such as web servers, tend to also be sensitive to session rates, and distributed application workloads, such as clustering, tend to be sensitive to latency.
Scalability is the ability of a system to increase its performance in proportion to the amount of resources provided to the system, within a certain range. Scalability is another important attribute of networked systems. Scalability underlies many of the limitations of known I/O architectures. On one hand, there is the desirability of being able to augment the capabilities of an existing system over time by adding additional computational resources so that systems always have reasonable room to grow. In this context, it is desirable to architect a system whose network efficiencies improve as processors are added to the system. On the other hand, scalability is also important to improve system performance over time, as subsequent generations of systems deliver more processing resources per unit of cost or unit of size.
The networking function, like other I/O functions, resides outside the memory coherency domain of multiprocessor systems. Networking data and control structures are memory based and access memory through host bridges using direct memory access (DMA) semantics. The basic unit of network protocol processing in known networks is a packet. Packets have well defined representations when traversing a wire or network interface, but can have arbitrary representations when they are stored in system memory. Network interfaces, in their simplest forms, are essentially queuing mechanisms between the memory representation and the wire representation of packets.
There are a plurality of limitations that affect network efficiencies. For example, the number of queues between a network interface and its system is constrained by a need to preserve packet arrival ordering. Also for example, the number of processors servicing a network interface is constrained by the processors having to coordinate service of shared queues, when using multiple processors; it is difficult to achieve a desired affinity between stateful sessions and processors over time. Also for example, a packet arrival notification is asynchronous (e.g., interrupt driven) and is associated with one processor per network interface. Also for example, the I/O path includes at least one host bridge and generally one or more fanout switches or bridges, thus degrading DMA to longer latency and lower bandwidth than processor memory accesses. Also for example, multiple packet memory representations are simultaneously used at different levels of a packet processing sequence with consequent overhead of transforming representations. Also for example, asynchronous interrupt notifications incur a processing penalty of taking an interrupt. The processing penalty can be disproportionately large considering a worst case interrupt rate.
One challenge in network systems relates to scaling network workload, i.e., to parallelizing network workload. Parallelization via packet load balancing is typically performed outside of the computing resources and is based on information embedded inside the packet. Thus the decision may be stateful (i.e., the prior history of similar packets affects the parallelization decision of which computing node to use for a particular packet), or the decision may be stateless (i.e., the destination of the packet is chosen based on the information in the packet but unaffected by prior packets).
An issue relating to parallelization is that loose coupling of load balancing elements limits the degree of collaboration between computer systems and the parallelizing entity.
There are a plurality of technical issues that are not present in a traditional load balancing system (i.e., a single threaded load balancing system). For example, in a large simultaneous multiprocessing (SMP) system with multiple partitions, it is not sufficient to identify the partition to process a packet, since the processing can be performed by one of many threads within a partition. Also, the intra partition communication overhead between threads is significantly lower than inter partition communication, which is still lower than node to node communication overhead. Also, resource management can be more direct and simpler than with traditional load balancing systems. Also, a SMP system may have more than one networking interface.