The present invention relates to information processing systems, and more particularly to a structure and method for queuing events associated with an information processing system.
InfiniBand architecture is followed in many system area networks for connecting servers to other servers and to remote storage subsystems. The architecture is based on a serial switch fabric whose switches and routers resolve the scalability, expandability, and fault-tolerance limitations of a shared bus architecture.
InfiniBand architecture is implemented where appropriate to provide better performance and scalability at lower cost, with lower latency and improved usability and reliability. The architecture addresses reliability by creating multiple redundant paths between nodes. It represents a shift from the load-and-store-based communication methods that used the shared local I/O buses of the past to a more fault-tolerant, message-passing approach.
FIG. 1 is a prior art diagram illustrating a system area network utilizing InfiniBand network architecture. The method shown in FIG. 1 utilizes system clustering, in which two or more servers are connected together as a logical server at each processor node 110 for better performance and scalability at lower cost. As further shown in FIG. 1, the network has a switch fabric 125 including three switches 120, which connect processor nodes 110 and input/output (I/O) subsystem nodes 130 together. As shown in further detail in FIG. 2, two processor nodes 200 and 201 function as processor nodes 110 (FIG. 1) of a system area network. The processor nodes support consumers 150, and each includes a network adapter, e.g., a host channel adapter (“HCA”) 140, one or more queue pairs (“QPs”) 210 or 211, and one or more ports 260 or 261 for communication. A consumer can be defined as a user of verbs, where verbs are abstract descriptions of channel adapter behavior. I/O subsystem nodes 130 contain target channel adapters (“TCAs”) 150. Like processor nodes, each I/O subsystem node also includes one or more QPs, one or more ports, and I/O controllers (not shown).
Each processor node 110 and each I/O subsystem node 130 connects to the switch fabric 125 through its respective HCA or TCA. Host and target channel adapters provide network interface services to overlying layers to allow such layers to generate and consume messages which include packets, as well as other types of communications. When an application running on a processor node writes a file to a storage device, the HCA 140 of that processor node generates the packets that are then consumed by a storage device at one of the I/O subsystem nodes 130. Between the processor node and the I/O subsystem nodes, switches 120 route packets through the switch fabric 125. Switches 120 operate by forwarding packets between two of the switch's ports according to an established routing table and based on addressing information in the packets.
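The forwarding step performed by switches 120 amounts to a table lookup keyed by the packet's destination address. The sketch below models only that lookup, under stated assumptions: `Packet`, `Switch`, and `dest_lid` are hypothetical names (in InfiniBand the destination address is a local identifier, or LID), the routing table is assumed to have been established beforehand, and real switches perform this lookup in hardware rather than software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Hypothetical minimal packet: only the destination address is modeled.
    dest_lid: int
    payload: bytes


class Switch:
    """Forwards each packet to the output port its routing table names."""

    def __init__(self, num_ports, routing_table):
        # routing_table maps destination address -> output port number.
        for port in routing_table.values():
            if not 0 <= port < num_ports:
                raise ValueError(f"routing table names nonexistent port {port}")
        self.num_ports = num_ports
        self.routing_table = dict(routing_table)

    def forward(self, packet):
        """Return the output port for `packet`, based only on its destination address."""
        try:
            return self.routing_table[packet.dest_lid]
        except KeyError:
            raise KeyError(f"no route to destination {packet.dest_lid:#x}") from None


sw = Switch(num_ports=4, routing_table={0x10: 1, 0x20: 3})
print(sw.forward(Packet(dest_lid=0x20, payload=b"data")))  # 3
```

Keeping the table static in this sketch mirrors the text's "established routing table": the switch itself makes no routing decisions beyond the lookup.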
FIG. 2 is a prior art diagram further illustrating principles of communications in accordance with InfiniBand architecture. An application active on a first processor node 200 may require communication with another application that is active on a second processor node 201 remote from the first processor node 200. To communicate with the remote application, the applications on both processor nodes 200, 201 use work queues. The work queues are implemented in pairs, each pair being a “queue pair” (“QP”) 210 or 211 that includes a send work queue and a receive work queue.
An application drives a communication operation by placing a work queue element (WQE) in the work queue. From the work queue, the communication operation is handled by the HCA. Thus, the work queue provides a communications medium between applications and the HCA, relieving the operating system from having to deal with this responsibility. Each application may create one or more work queues for the purpose of communicating with other applications or other elements of the system area network.
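A minimal sketch of the queue-pair arrangement, assuming a software model only: the application appends a WQE to a queue, and a separate adapter object drains it without any operating-system call in between. `WorkQueueElement`, `QueuePair`, `ChannelAdapter`, and their methods are hypothetical names chosen for illustration, and the opcode strings are placeholders rather than the verbs interface defined by the InfiniBand specification.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkQueueElement:
    # Hypothetical WQE: an operation name plus the data buffer it refers to.
    opcode: str
    buffer: bytes = b""


@dataclass
class QueuePair:
    """A send work queue and a receive work queue, identified by a QP number."""
    qp_num: int
    send_queue: deque = field(default_factory=deque)
    recv_queue: deque = field(default_factory=deque)

    def post_send(self, wqe):
        # Application side: place a WQE on the send queue; no OS call is modeled.
        self.send_queue.append(wqe)

    def post_recv(self, wqe):
        # Application side: supply a buffer for an incoming message.
        self.recv_queue.append(wqe)


class ChannelAdapter:
    """Stand-in for the HCA: consumes send WQEs in FIFO order per queue pair."""

    def __init__(self):
        self.qps = {}

    def register(self, qp):
        self.qps[qp.qp_num] = qp

    def process_sends(self):
        processed = []
        for qp in self.qps.values():
            while qp.send_queue:
                wqe = qp.send_queue.popleft()
                processed.append((qp.qp_num, wqe.opcode))
        return processed


hca = ChannelAdapter()
qp = QueuePair(qp_num=7)
hca.register(qp)
qp.post_send(WorkQueueElement("SEND", b"hello"))
qp.post_send(WorkQueueElement("RDMA_WRITE", b"world"))
print(hca.process_sends())  # [(7, 'SEND'), (7, 'RDMA_WRITE')]
```

The queue is the only point of contact between the two sides, which is the property the text relies on when it says the work queue relieves the operating system of this responsibility.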
Event handling is a major part of controlling communications between nodes. The manner in which event handling is performed ultimately affects both the scalability and the performance of a system area network. The InfiniBand architecture specification describes concepts of events and event records for reporting certain types of events and errors. In networks having few resources, the problem of determining which system resource caused a call to an event handler, such as a completion event handler or an error event handler, is trivial. Both event handlers could be called whenever any system resource has an event, and each handler would then scan all of the system resources to determine which of them generated the event. While suitable for networks having few resources, this approach does not scale well to networks having large numbers of system resources.
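The scan-everything approach can be sketched as follows, assuming each resource exposes one pending flag per event type. `Resource`, `scan_for`, and `on_any_event` are hypothetical names. The point of the sketch is the second value each scan returns: the number of resources inspected equals the total number of resources, regardless of how many actually had events.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    # Hypothetical per-resource status flags, set when the resource has an event.
    rid: int
    completion_pending: bool = False
    error_pending: bool = False


def scan_for(resources, flag):
    """Visit every resource, collect and clear those with `flag` set.

    Returns (ids found, number of resources inspected). The cost grows with
    the number of resources, not with the number of events.
    """
    found, inspected = [], 0
    for r in resources:
        inspected += 1
        if getattr(r, flag):
            setattr(r, flag, False)
            found.append(r.rid)
    return found, inspected


def on_any_event(resources):
    # Any event invokes both handlers, and each scans every resource.
    completions = scan_for(resources, "completion_pending")
    errors = scan_for(resources, "error_pending")
    return completions, errors


resources = [Resource(i) for i in range(1000)]
resources[42].completion_pending = True
completions, errors = on_any_event(resources)
print(completions)  # ([42], 1000)
print(errors)       # ([], 1000)
```

A single completion on one resource costs two full passes over all 1000 resources here, which is harmless for a handful of resources and prohibitive for thousands.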
On the other hand, in networks involving large numbers of system resources, merely maintaining a centralized event table in main memory for each of the thousands of resources of the system does not lead to satisfactory results. Such a table would have to be very large in order to allow events to be recorded therein for all of the resources of the system. Typically, entries can be no smaller than a byte, since this is the smallest atomic memory operation in most large servers; memory operations that set individual bits are not supported. When a resource has an event, the HCA would write the corresponding byte in the event table to indicate that the resource has pending work, and the handler would then be called. Because the handler must still examine the table to find the marked entries, scan times would remain long. In addition, if a particular resource has more than one active entry in the table at one time for the same type of event, the same event could be handled twice unnecessarily, unless the event handler is required to scan the entire table first to find all instances of that type of event involving that resource before proceeding. Applying this approach to a system having thousands of resources would require a large amount of time to search the table for events and would be an ineffective and undesirable way of handling the problem. Event queues used in particular implementations of InfiniBand architecture allow scaling to large numbers of resources by eliminating the need to scan individual resources of a system. However, present event queue designs fail to adequately curtail the amount of time that event handlers require to search them.
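The difference between a byte-per-resource event table and a basic event queue can be sketched in a few lines. Both classes are hypothetical illustrations, not the designs of any particular implementation: `EventTable` assumes one byte per resource indexed by resource number, and `EventQueue` is a plain FIFO of resource numbers. Each `handle` method returns the resources found together with the number of entries it had to examine.

```python
from collections import deque


class EventTable:
    """One byte per resource; the adapter sets a byte, the handler scans them all."""

    def __init__(self, num_resources):
        # Byte granularity, since bit-level atomic writes are assumed unavailable.
        self.entries = bytearray(num_resources)

    def post(self, rid):
        self.entries[rid] = 1  # adapter marks the resource as having pending work

    def handle(self):
        """Return (resource ids with events, entries examined)."""
        hits = []
        for rid, flag in enumerate(self.entries):
            if flag:
                self.entries[rid] = 0
                hits.append(rid)
        return hits, len(self.entries)


class EventQueue:
    """The adapter appends the id of each resource with an event; the handler reads only those."""

    def __init__(self):
        self.entries = deque()

    def post(self, rid):
        self.entries.append(rid)

    def handle(self):
        """Return (resource ids with events, entries examined)."""
        hits, examined = [], 0
        while self.entries:
            hits.append(self.entries.popleft())
            examined += 1
        return hits, examined


table, queue = EventTable(10_000), EventQueue()
for rid in (17, 9_001):
    table.post(rid)
    queue.post(rid)
print(table.handle())  # ([17, 9001], 10000)
print(queue.handle())  # ([17, 9001], 2)
```

With two events among 10,000 resources, the table handler examines every entry while the queue handler examines only the two posted ones. As sketched, however, posting the same resource twice leaves two entries in `EventQueue`, so a plain FIFO does not by itself prevent a resource's events from being handled more than once.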
Consequently, a new way of handling event records is desirable, one that improves scalability without increasing processing time.