For packetized data communications in a multi-computer network, virtual circuits (also known as "virtual connections") are established between processes running on various nodes within the network. Each virtual circuit forms a communications path between two such processes, which are often running on separate nodes. In those nodes having multitasking operating systems, multiple processes can be executing simultaneously. Thus, at any given time, a particular node in a multi-computer network may interface with multiple virtual circuits, each of which is associated with a particular process running on that node and ties that process to a corresponding process on another node.
Conventionally, the interface between a particular node (hereinafter sometimes referred to as the "host node") and the network has been a single channel interface. That is, even when multiple virtual circuits (and thus multiple data streams) simultaneously interface with a particular node, all incoming and outgoing data packets for the host node are channeled through a single hardware connection at the interface between the network and the node. The multiplexing and demultiplexing of the data streams to and from the multiple processes running on the host node has conventionally been handled by the host node's operating system (i.e., kernel level software). Thus, communication ordinarily has been handled by hardware interrupts to the operating system whenever a data packet, associated with a particular process on the host node, is received, or whenever a process on the host node desires to transmit a data packet via a virtual circuit to another node.
Consequently, during periods of heavy communication traffic in a conventional node, a large amount of operating system bandwidth is spent handling I/O interrupts. Furthermore, such systems are inefficient in that they require the bottlenecking of multiple data streams through a single interface.
The recently developed multi-virtual circuit network interface controller (NIC) greatly reduces the strain on operating system bandwidth and, during times of high data traffic, eliminates the bottlenecking of data through a single interface, by providing the illusion of a dedicated network interface to multiple processes running on a given node simultaneously. These multiple "virtual" interfaces allow data communications to occur directly between the multiple processes running on the host node and the NIC without the intervention of the operating system kernel on the host node. Each active process on the host node may have one or more associated virtual interfaces. However, each virtual interface can be owned by only a single process. These processes may be running in user or kernel level space within the node.
Each virtual interface of the NIC includes two work queues in host memory which provide a two-way communication link between the NIC and a process on the host node. One work queue is for inbound data (the receive queue) and the other for outbound data (the send queue).
Data packet flow via a particular virtual interface is governed by the use of a stream of control information passing from the process to the NIC. The control information is contained in packet descriptors. Each incoming and outgoing data packet for a particular process is associated with a packet descriptor. Each packet descriptor is a small data structure which describes a packet and provides control information for the handling of that packet.
A descriptor is generated by a process and placed into one of the process's two virtual interface work queues. The send and receive queues of a virtual interface thus consist of linked packet descriptors, with each descriptor describing either a packet to be transmitted from the process via the NIC onto the network or a packet to be received by the NIC from the network and written into a buffer of the receiving process.
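The relationship between a virtual interface, its two work queues, and the descriptors they hold can be illustrated with a minimal sketch. The field names (`buffer_addr`, `length`, `done`, `owner_pid`) are hypothetical and chosen only for illustration, and a Python `deque` stands in for the linked descriptor list; the actual descriptor layout is not specified here.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PacketDescriptor:
    """Hypothetical descriptor layout; field names are illustrative only."""
    buffer_addr: int      # where the packet data lives (or should be written)
    length: int           # size of the packet or buffer in bytes
    done: bool = False    # completion flag, assumed to be written by the NIC


@dataclass
class VirtualInterface:
    """One virtual interface: owned by one process, with two work queues."""
    owner_pid: int
    send_queue: deque = field(default_factory=deque)     # outbound descriptors
    receive_queue: deque = field(default_factory=deque)  # inbound descriptors


# A process builds descriptors and places each into the matching queue.
vi = VirtualInterface(owner_pid=1234)
vi.send_queue.append(PacketDescriptor(buffer_addr=0x1000, length=512))
vi.receive_queue.append(PacketDescriptor(buffer_addr=0x2000, length=2048))
```

A process holding several virtual interfaces would simply hold several such objects, while each object records exactly one owner.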
For transmission of a data packet, the process puts a packet descriptor into the send queue and then informs the NIC, via a memory-mapped "send doorbell," that a data packet is ready and awaiting transmission. The NIC then reads the descriptor out of the queue, determines from the descriptor the amount, location and destination of the data packet to be transmitted, and processes the packet accordingly.
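The send sequence can be sketched as a toy software model. Everything named here (`ToyNic`, `ring_send_doorbell`, the dictionary keys, the `wire` list) is an assumption made for illustration; in hardware the doorbell would be a memory-mapped register write rather than a method call.

```python
from collections import deque


class ToyNic:
    """Stand-in for the NIC; a real doorbell is a memory-mapped register."""

    def __init__(self):
        self.send_queue = deque()   # descriptors posted by the process
        self.wire = []              # packets "transmitted" onto the network
        self.memory = {}            # fake host memory: address -> bytes

    def ring_send_doorbell(self):
        # The NIC reads the next descriptor and learns the packet's
        # location, amount, and destination from it.
        desc = self.send_queue.popleft()
        payload = self.memory[desc["addr"]][: desc["length"]]
        self.wire.append((desc["dest"], payload))


nic = ToyNic()
nic.memory[0x1000] = b"hello, node B"

# Process side: build a descriptor, queue it, then ring the doorbell.
descriptor = {"addr": 0x1000, "length": 5, "dest": "node-B"}
nic.send_queue.append(descriptor)
nic.ring_send_doorbell()
```

Note that the operating system kernel does not appear anywhere in this path: the process talks to the queue and the doorbell directly.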
When a process is ready to receive a data packet from its associated virtual circuit, the process places a buffer descriptor into the receive queue and rings a memory-mapped "receive doorbell" to inform the NIC that the buffer is available for receiving a packet. The control information in the descriptor describes the location of a buffer in user or kernel level memory for storing the incoming packet. The process may queue up a number of descriptors on the receive queue at a time, with each descriptor containing control information for a single incoming data packet.
The NIC processes the descriptors out of a process's receive queue sequentially. Upon the reception of a data packet for the process, the NIC uses the information contained in the descriptor to control the writing of the packet into memory. The NIC then writes a completion notification into the descriptor, and repeats this process of reading descriptors and storing packets in accordance therewith as long as the process continues to generate receive queue descriptors.
The process, in turn, utilizes the completion notifications in the descriptors to determine whether there are received data packets awaiting processing by the process. The process polls the descriptors in the receive queue to see which have been "completed" (i.e., have become associated with received packets placed in memory by the NIC). Through the polling of descriptors, then, the processes on the host node are able to bypass the operating system kernel and communicate directly with the NIC, and through the NIC with their virtual circuits.
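The receive path described above (posting buffer descriptors, sequential consumption by the NIC, completion notification, and polling by the process) can be sketched as follows. The function names, the dictionary-based descriptor, and the choice to truncate a packet to the buffer size are all assumptions of this toy model, not details taken from the NIC itself.

```python
from collections import deque

host_memory = {}          # fake host memory: buffer address -> bytes
receive_queue = deque()   # descriptors posted by the process, in order


def post_receive(addr, size):
    """Process side: queue a buffer descriptor for one incoming packet."""
    desc = {"addr": addr, "size": size, "done": False}
    receive_queue.append(desc)
    return desc


def nic_deliver(packet):
    """NIC side: consume the oldest uncompleted descriptor, in sequence."""
    for desc in receive_queue:
        if not desc["done"]:
            host_memory[desc["addr"]] = packet[: desc["size"]]
            desc["done"] = True      # completion notification
            return True
    return False                     # no buffer posted; toy model drops it


def poll_completed():
    """Process side: collect completed descriptors from the queue head."""
    packets = []
    while receive_queue and receive_queue[0]["done"]:
        desc = receive_queue.popleft()
        packets.append(host_memory[desc["addr"]])
    return packets


post_receive(0x2000, 64)
post_receive(0x3000, 64)
nic_deliver(b"first packet")
got = poll_completed()   # one completed descriptor; one still pending
```

Calling `poll_completed()` again before another packet arrives returns an empty list, which is exactly the wasted work that becomes a concern when traffic is slow.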
Descriptor polling is a highly efficient means of handling data communication during periods of high data traffic. However, when the data traffic for a particular process is slow, it is an inefficient use of processing capability to have a process repeatedly polling for naught. Under these circumstances, then, it is desired to have an alternative means of handling data communication, one which reduces processor bandwidth waste. In addition, it is desired to provide application processes and the host with notification of significant events, such as the occurrence of various error conditions.
One means of reducing processor overhead during times of slow or sporadic data traffic is through the use of hardware interrupts. A process which has, e.g., polled the receive queue a number of times and come up empty-handed may wish to go to sleep and be awakened upon the happening of an event, such as the completion by the NIC of one or more data packet descriptors in the receive queue. In addition, a process and/or the host may need to be notified about an error or other significant event that has occurred. Event notification in these sorts of circumstances can be accomplished through the use of a hardware interrupt to the operating system, which triggers the operating system to, e.g., awaken a process or notify an application process of an error.
However, given the potentially thousands of virtual interfaces which can be active simultaneously, the generation of an interrupt to the operating system upon the happening of every event could inundate the operating system with context switching due to interrupts. Conversely, during times of low communications traffic, requiring the CPU to poll for events can waste an excessive number of CPU cycles and an excessive amount of system memory bandwidth. Furthermore, in a multitasking operating system, the polling process can get swapped out and not get swapped back in for a relatively long period of time. Consequently, a significant event may have to wait an unacceptable period of time to get serviced. Thus, it is desired to provide a capability for event notification which neither inundates the operating system with interrupts nor wastes excessive CPU time on event polling.
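The trade-off between polling and interrupt-driven wake-up can be illustrated by a process that polls a bounded number of times and then sleeps until notified. This is only a sketch of the general idea, not the notification mechanism itself: a `threading.Event` stands in for the interrupt-driven wake-up, and `max_polls`, `timeout`, `nic_complete`, and `receive` are hypothetical names.

```python
import threading

completed = []                      # completions written by the "NIC"
wakeup = threading.Event()          # stand-in for an interrupt-driven wake-up


def nic_complete(packet):
    """NIC side: record a completion and signal the sleeping process."""
    completed.append(packet)
    wakeup.set()


def receive(max_polls=3, timeout=1.0):
    """Poll a few times; if nothing arrives, sleep until notified."""
    for _ in range(max_polls):
        if completed:
            return completed.pop(0), "polled"
    wakeup.clear()
    if completed:                   # re-check to avoid a lost wake-up
        return completed.pop(0), "polled"
    if wakeup.wait(timeout):
        return completed.pop(0), "woken"
    return None, "timeout"


# Busy case: data is already there, so polling finds it immediately.
nic_complete(b"fast")
busy_result = receive()

# Quiet case: the packet arrives later, so the process sleeps until woken.
threading.Timer(0.05, nic_complete, args=(b"slow",)).start()
quiet_result = receive()
```

In the busy case no wake-up is needed at all, while in the quiet case the process consumes no CPU time between its last poll and the notification.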
In a node architecture employing a multi-virtual circuit NIC, there are a number of event types which can arise and about which a particular process may wish to be notified. For example, a process may wish to be notified of events relating to specific packet descriptors or to its virtual circuit. In addition, there are possible events which are not specific to a particular virtual circuit, i.e., events relating to the NIC in general, about which the host needs to be made aware. Thus, it is also desired to provide a flexible event notification capability which allows the enabling and generation of event notification from various levels within the node hierarchy (i.e., from the descriptor, virtual circuit and/or NIC levels).