1. Field of the Invention
The present invention relates generally to the field of networking and, more particularly, to a method and apparatus for efficient interrupt event notification in a scalable input/output device.
2. Description of the Related Art
In known networked computer systems, which include platform servers, server-based appliances and desktop computer systems, the network interface functionality is treated and supported as an undifferentiated instance of a general purpose input/output (I/O) interface. This is because such computer systems are optimized for computational functions, and networking-specific optimizations might not apply to generic I/O scenarios. Under a generic I/O treatment, no special provisions are made to accommodate the idiosyncrasies of network workloads.
Known specialized networking systems, such as switches, routers, remote access network interface units and perimeter security network interface units, include internal architectures that support their respective fixed-function metrics. In these known architectures, low-level packet processing is segregated into separate hardware entities that reside outside the general purpose processing system components.
As in many other disciplines, the system design tradeoffs associated with networked computer systems include balancing functional efficiency against generality and modularity. Generality refers to the ability of a system to perform a large number of functional variants, possibly through the deployment of different software components into the system or by exposing the system to different external workloads. Modularity refers to the ability to use the system as a subsystem within a wide array of configurations by selectively replacing the type and number of subsystems interfaced.
It is desirable to develop networked systems that provide high functional efficiency while retaining the attributes of generality and modularity. Networked systems are generally judged by several efficiency measures: network throughput (i.e., the aggregate network data movement ability for a given traffic profile), network latency (i.e., the system's contribution to network message latency), packet rate (i.e., the system's upper limit on the number of packets processed per unit time), session rate (i.e., the system's upper limit on the creation and removal of network connections or sessions), and network processing overhead (i.e., the processing cost associated with a given network workload). Different uses of networked systems are more or less sensitive to each of these measures. For example, bulk data movement workloads such as disk backup, media streaming and file transfers tend to be sensitive to network throughput; transactional uses, such as web servers, tend to be sensitive to session rate as well; and distributed application workloads, such as clustering, tend to be sensitive to latency.
Scalability, another important attribute of networked systems, is the ability of a system to increase its performance in proportion to the amount of resources provided to it, within a certain range. Scalability concerns underlie many of the limitations of known I/O architectures. On the one hand, it is desirable to be able to augment the capabilities of an existing system over time by adding computational resources, so that the system always has reasonable room to grow. In this context, it is desirable to architect a system whose network efficiencies improve as processors are added to it. On the other hand, scalability is also important for improving system performance over time, as subsequent generations of systems deliver more processing resources per unit of cost or unit of size.
The networking function, like other I/O functions, resides outside the memory coherency domain of multiprocessor systems. Networking data and control structures reside in memory, and the network interface accesses them through host bridges using direct memory access (DMA) semantics. The basic unit of network protocol processing in known networks is the packet. Packets have well-defined representations when traversing a wire or network interface, but can have arbitrary representations when stored in system memory. In their simplest form, network interfaces are essentially queuing mechanisms between the memory representation and the wire representation of packets.
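The queuing role of a network interface can be illustrated with a single-producer, single-consumer descriptor ring, in which each descriptor records where the memory representation of a received packet lives. This is a minimal sketch under stated assumptions: `struct rx_desc`, `struct rx_ring`, `RING_SIZE` and the `ring_*` functions are hypothetical names, the device (wire) side is simulated by an ordinary function call, and the DMA transfers, memory barriers and doorbell registers that a real interface would need are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring size; a power of two so indices can be masked. */
#define RING_SIZE 8u

/* Hypothetical receive descriptor: where a packet sits in host memory. */
struct rx_desc {
    uint64_t buf_addr; /* address of the packet buffer in system memory */
    uint16_t len;      /* number of bytes the device wrote */
};

struct rx_ring {
    struct rx_desc desc[RING_SIZE];
    uint32_t head; /* next slot the device (producer) fills */
    uint32_t tail; /* next slot the host (consumer) reaps */
};

static void ring_init(struct rx_ring *r)
{
    assert(r != NULL);
    r->head = 0;
    r->tail = 0;
}

/* Unsigned subtraction stays correct when the counters wrap around. */
static size_t ring_count(const struct rx_ring *r)
{
    return (size_t)(r->head - r->tail);
}

/* Device side: record a packet that arrived off the wire into buf_addr.
 * Returns false when the ring is full (the packet would be dropped). */
static bool ring_produce(struct rx_ring *r, uint64_t buf_addr, uint16_t len)
{
    if (ring_count(r) == RING_SIZE)
        return false;
    struct rx_desc *d = &r->desc[r->head & (RING_SIZE - 1u)];
    d->buf_addr = buf_addr;
    d->len = len;
    r->head++;
    return true;
}

/* Host side: take the oldest descriptor, so arrival order is preserved.
 * Returns false when the ring is empty. */
static bool ring_consume(struct rx_ring *r, struct rx_desc *out)
{
    if (ring_count(r) == 0)
        return false;
    *out = r->desc[r->tail & (RING_SIZE - 1u)];
    r->tail++;
    return true;
}
```

Because a single ring is strictly first-in, first-out, it preserves packet arrival order by construction; spreading traffic over several such rings is what brings the ordering constraint into play.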
Several limitations affect network efficiency, including the following. First, the number of queues between a network interface and its system is constrained by the need to preserve packet arrival ordering. Second, when multiple processors service a network interface, they must coordinate their service of shared queues, which constrains the number of processors that can be used; it is also difficult to maintain a desired affinity between stateful sessions and processors over time. Third, packet arrival notification is asynchronous (e.g., interrupt driven) and is associated with one processor per network interface. Fourth, the I/O path includes at least one host bridge and generally one or more fanout switches or bridges, so DMA has longer latency and lower bandwidth than processor memory accesses. Fifth, multiple memory representations of a packet are used simultaneously at different levels of the packet processing sequence, with the consequent overhead of transforming between representations. Finally, each asynchronous interrupt notification incurs the processing penalty of taking an interrupt, and this penalty can be disproportionately large at the worst-case interrupt rate.
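To make the worst-case interrupt rate concrete, the following C sketch estimates the processor load when every packet raises an interrupt. It is a back-of-the-envelope model, not part of the described system: it assumes an Ethernet link (20 bytes of per-frame wire overhead from the 7-byte preamble, 1-byte start-of-frame delimiter and 12-byte inter-frame gap), and the per-interrupt cost passed in is a hypothetical parameter. The function names `worst_case_packet_rate` and `interrupt_load` are illustrative only.

```c
#include <assert.h>

/* Assumed Ethernet per-frame wire overhead: 7-byte preamble,
 * 1-byte start-of-frame delimiter and 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20.0

/* Worst-case packet rate (packets/s) when a link of link_bps bits/s
 * is saturated with back-to-back frames of frame_bytes bytes each. */
static double worst_case_packet_rate(double link_bps, double frame_bytes)
{
    assert(link_bps > 0.0 && frame_bytes > 0.0);
    return link_bps / (8.0 * (frame_bytes + ETH_WIRE_OVERHEAD_BYTES));
}

/* Processor time consumed per second of wall-clock time (1.0 means one
 * fully busy processor) if every packet raises an interrupt that costs
 * cost_ns nanoseconds to take and service (a hypothetical figure). */
static double interrupt_load(double packets_per_s, double cost_ns)
{
    assert(packets_per_s >= 0.0 && cost_ns >= 0.0);
    return packets_per_s * cost_ns * 1e-9;
}
```

For a 10 Gb/s link and 64-byte frames, `worst_case_packet_rate(10e9, 64.0)` gives roughly 14.88 million packets per second. With a hypothetical cost of 1000 ns per interrupt, `interrupt_load` then returns about 14.9, that is, more work than a single processor can absorb, which is why one interrupt per packet scales poorly.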
Network functions in prior art systems are generally layered. Computing resources are shared symmetrically by layers that are multiprocessor-ready, underutilized by layers that are not multiprocessor-ready, or not shared at all by layers that have coarse bindings to hardware resources. In some cases the layers have different degrees of multiprocessor readiness, but generally they cannot be adapted to scale in multiprocessor systems. Layered systems often have bottlenecks that prevent linear scaling. Another problem with prior art network systems is that time slicing occurs across all of the layers, applications and operating systems. Furthermore, in prior art systems, low-level networking functions are interleaved over time with all of these elements.
In view of the foregoing, there is a need for a method and apparatus that dedicates network processing resources rather than sharing those resources on a time-sliced basis. Moreover, significant improvement in network processing power can be achieved by allocating processing and memory resources asymmetrically, an approach that has not been implemented in prior art systems.
Additional performance improvements can be achieved by implementing other network data processing features heretofore unavailable in the prior art. For example, processing efficiencies can be achieved by arbitrarily assigning and mapping well-defined subfunctions or sessions to preassigned processing entities. Additional efficiencies can be obtained by separating and isolating control of the network interface. In addition, it would be advantageous to provide a method for overlaying an interfunction interface on top of a memory region shared between two or more functions. Performance can also be increased by providing an apparatus that dispatches processing resources for both time-sliced and run-to-completion processing. Further performance increases can be achieved through an efficient interrupt event notification apparatus for the scalable input/output devices used in the network system. Each of these improvements is discussed hereinbelow.