In communication networks, it is often advantageous for a client process to form queues of instructions that cause data messages to be exchanged with other processors or “nodes.” These lists of instructions may then be executed asynchronously by hardware resources. The hardware may subsequently notify the client that the operation has completed. Problems may arise when an instruction queue is destroyed before the client process has finished processing the corresponding operation completion notifications.
One such communication network is implemented according to the Infiniband™ Architecture Specification developed by the Infiniband℠ Trade Association, the specification for which is incorporated herein by reference (Infiniband™ Architecture Specification, version 1.1). The Infiniband™ Architecture defines a system area network for connecting multiple independent processor platforms (i.e., host processor nodes), input/output (“IO”) platforms, and IO devices, as is shown in FIG. 1. The system 100 is a communications and management infrastructure supporting both IO and interprocessor communications for one or more computer systems. The system 100 can range from a small server with one processor and a few IO devices to a massively parallel supercomputer installation with hundreds of processors and thousands of IO devices. Communication among nodes is accomplished according to an Infiniband™ protocol. In addition, the IP (Internet protocol) friendly nature of the architecture allows bridging to the Internet or an intranet, or connection to remote computer systems 111.
The Infiniband™ architecture defines a switched communications fabric 101 allowing many devices to concurrently communicate with high bandwidth and low latency in a protected, remotely managed environment. The system 100 consists of processor nodes 102, 103, and 104 and IO units 105, 106, 107, and 108 connected through the fabric 101. The fabric is made up of cascaded switches 109 and routers 110. IO units can range in complexity from a single attached device, such as a SCSI or LAN adapter, to large, memory-rich RAID subsystems 107.
The foundation of Infiniband™ operation is the ability of a client process to queue up a set of instructions that hardware devices or nodes, such as a host channel adapter 112 (“HCA”), switch 109, or router 110, execute. This facility is referred to as a work queue. Work queues are always created in pairs consisting of a send work queue and a receive work queue. The send work queue holds instructions that cause data to be transferred between the client's memory and another process's memory. The receive work queue holds instructions about where to place data that is received from another process. Each node may provide a plurality of queue pairs, each of which provides an independent virtual communication port.
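The queue pair structure described above can be sketched as follows. This is a minimal illustrative model, not the actual Infiniband™ verbs interface; the class and method names (`QueuePair`, `post_send`, `post_receive`) and the instruction fields are hypothetical.

```python
from collections import deque

class QueuePair:
    """Hypothetical model of a queue pair: a send work queue holding
    instructions that move data between the client's memory and another
    process's memory, and a receive work queue holding instructions that
    say where inbound data should be placed."""

    def __init__(self):
        self.send_queue = deque()
        self.receive_queue = deque()

    def post_send(self, instruction):
        # Client queues an outbound-transfer instruction.
        self.send_queue.append(instruction)

    def post_receive(self, instruction):
        # Client queues a buffer descriptor for inbound data.
        self.receive_queue.append(instruction)

qp = QueuePair()
qp.post_send({"op": "SEND", "local_addr": 0x1000, "length": 64})
qp.post_receive({"buffer_addr": 0x2000, "length": 128})
```

A node exposing several such queue pairs would instantiate one `QueuePair` per virtual communication port, each with its own independent pair of queues.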
In the HCA queuing model, the client submits a work request, which causes an instruction called a work queue element (“WQE”) to be placed on the appropriate work queue. The channel adapter executes WQEs on a particular work queue in the order in which they were placed on that work queue. When the channel adapter completes a WQE, a completion queue element (“CQE”) may be placed on a completion queue. A client may access the completion queue to determine whether a work request has been completed. Each CQE specifies all the information necessary for a work completion, and either contains that information directly or points to other structures, such as the associated WQE, that contain the information. Further, one completion queue may receive CQEs associated with a plurality of work queues.
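The flow above — WQEs executed in FIFO order per work queue, with CQEs from several work queues landing on one shared completion queue — can be sketched like this. All names here (`WorkQueue`, `CompletionQueue`, `execute_next`) are hypothetical illustrations, not the real channel adapter interface.

```python
from collections import deque

class CompletionQueue:
    """One completion queue may serve several work queues."""
    def __init__(self):
        self.cqes = deque()

class WorkQueue:
    def __init__(self, cq):
        self.wqes = deque()
        self.cq = cq

    def post(self, wqe):
        # Client's work request becomes a WQE on this work queue.
        self.wqes.append(wqe)

    def execute_next(self):
        # The channel adapter executes WQEs in the order they were
        # posted; on completion, a CQE referencing the WQE is placed
        # on the associated completion queue.
        wqe = self.wqes.popleft()
        self.cq.cqes.append({"status": "success", "wqe": wqe})

cq = CompletionQueue()      # shared by both work queues below
wq_a = WorkQueue(cq)
wq_b = WorkQueue(cq)
wq_a.post({"id": 1, "op": "SEND"})
wq_b.post({"id": 2, "op": "RECEIVE"})
wq_a.execute_next()
wq_b.execute_next()
# The client later drains cq.cqes to learn which requests completed.
```

Note that in this sketch each CQE points back at its WQE rather than copying the completion information, which is precisely the arrangement that creates the hazard discussed next.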
If a CQE points to an associated WQE for needed completion information, a problem may arise when a work queue is destroyed or reset. Since client processes submit work requests and retrieve completion information from a completion queue asynchronously, a work queue may be destroyed or reset before all of the completed CQEs have been processed. If a client process needs a WQE in a destroyed work queue for completion information to process a CQE, it may not be possible to perform orderly completion processing. This can lead to problems such as resource leakage.
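A minimal self-contained sketch of this hazard follows. The structures and field names are hypothetical; in a real channel adapter implementation the WQE memory would simply be freed when the work queue is destroyed, leaving the CQE with a dangling pointer. Python's garbage collector hides that, so the sketch models the invalidation by clearing the WQE explicitly.

```python
from collections import deque

work_queue = deque()        # holds WQEs
completion_queue = deque()  # holds CQEs that point back at WQEs

wqe = {"id": 7, "op": "SEND", "buffer_addr": 0x3000, "length": 512}
work_queue.append(wqe)

# Channel adapter completes the WQE; the CQE only *references* it.
work_queue.popleft()
completion_queue.append({"status": "success", "wqe_ref": wqe})

# The work queue is now destroyed or reset before the client has
# drained the completion queue. In a C implementation this would free
# the WQE's memory; here we clear it to model the lost information.
wqe.clear()
work_queue.clear()

# When the client finally processes the CQE, the completion
# information it needs (e.g. which buffer to release) is gone, so
# orderly completion processing is no longer possible.
cqe = completion_queue.popleft()
```

Because the buffer described by the vanished WQE can no longer be identified and released, the client leaks that resource — the failure mode the passage above describes.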