1. Field of the Invention
In general, the present invention relates to computer architecture, processor architecture, operating systems, input/output (I/O), concurrent processes, and queues. In particular, the present invention relates to moving, resizing, and memory management for producer-consumer queues.
2. Description of Related Art
The InfiniBand™ Architecture Specification is a standard that is available from the InfiniBand® Trade Association that defines a single unified I/O fabric. An overview of the InfiniBand™ architecture (IBA) is shown in FIG. 1.
As shown in FIG. 1, IBA defines a System Area Network (SAN) 100 for connecting multiple independent processor platforms, i.e., host processor nodes 102, I/O platforms, and I/O devices. The IBA SAN 100 is a communications and management infrastructure supporting both I/O and interprocessor communications (IPC) for one or more computer systems. An IBA system can range from a small server with one processor and a few I/O devices to a massively parallel supercomputer installation with hundreds of processors and thousands of I/O devices. Furthermore, the internet protocol (IP) friendly nature of IBA allows bridging to an internet, intranet, or connection to remote computer systems.
IBA defines a switched communications fabric 104 allowing many devices to concurrently communicate with high bandwidth and low latency in a protected, remotely managed environment. An end node can communicate over multiple IBA ports and can utilize multiple paths through the IBA fabric 104. The multiplicity of IBA ports and paths through the network are exploited for both fault tolerance and increased data transfer bandwidth.
IBA hardware offloads much of the I/O communications operation from the CPU. This allows multiple concurrent communications without the traditional overhead associated with communication protocols. The IBA SAN 100 provides its I/O and IPC clients zero processor-copy data transfers, with no kernel involvement, and uses hardware to provide highly reliable, fault-tolerant communications. An IBA SAN 100 consists of processor nodes 102 and I/O units connected through an IBA fabric 104 made up of cascaded switches 106 and routers 108. I/O units can range in complexity from single application-specific integrated circuit (ASIC) IBA-attached devices, such as a small computer system interface (SCSI) 110 or LAN adapter, to large memory-rich redundant array of independent disks (RAID) subsystems 112 that rival a processor node 102 in complexity.
In the example in FIG. 1, each processor node 102 includes central processor units (CPUs) 114, memory 116, and Host Channel Adapters (HCAs) 118. The RAID subsystem 112 includes storage 120, SCSIs 110, a processor 122, memory 116, and a Target Channel Adapter (TCA) 124. The fabric 104 is in communication with other IB subnets 126, wide area networks (WANs) 128, local area networks (LANs) 130, and processor nodes 132 via a router 108. The fabric provides access to a console 134, a storage subsystem 136 having a controller 138 and storage 120, and I/O chassis 140. One I/O chassis 140 includes SCSI 110, Ethernet 142, fibre channel (FC) hub and FC devices 144, graphics 146, and video 148. Another I/O chassis 140 includes a number of I/O modules 150 having TCAs 124.
FIG. 2 shows a consumer queuing model in the standard. The foundation of IBA operation is the ability of a consumer 200 to queue up a set of instructions executed by hardware 202, such as an HCA. This facility is referred to as a work queue 204. Work queues 204 are always created in pairs, called a Queue Pair (QP), one for send operations and one for receive operations. In general, the send work queue holds instructions that cause data to be transferred between the consumer's memory and another consumer's memory, and the receive work queue holds instructions about where to place data that is received from another consumer. The other consumer is referred to as a remote consumer even though it might be located on the same node.
The consumer 200 submits a work request (WR) 206, which causes an instruction called a Work Queue Element (WQE) 208 to be placed on the appropriate work queue 204. The hardware 202 executes WQEs 208 in the order that they were placed on the work queue 204. When the hardware 202 completes a WQE 208, a Completion Queue Element (CQE) 210 is placed on a completion queue 212. Each CQE 210 specifies all the information necessary for a work completion 214, and either contains that information directly or points to other structures, for example, the associated WQE 208, that contain the information. Each consumer 200 may have its own set of work queues 204, and each pair of work queues 204 is independent of the others. Each consumer 200 creates one or more Completion Queues (CQs) 212 and associates each send and receive queue with a particular completion queue 212. It is not necessary that both the send and receive queue of a work queue pair use the same completion queue 212. Because some work queues 204 require an acknowledgment from the remote node and some WQEs 208 use multiple packets to transfer the data, the hardware 202 can have multiple WQEs 208 in progress at the same time, even from the same work queue 204. CQs 212 inform a consumer 200 when a work request 206 has completed.
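The queuing model described above can be illustrated with a minimal sketch. The sketch is a software model only, not the InfiniBand verbs interface; the class and function names (`WQE`, `CQE`, `WorkQueue`, `CompletionQueue`, `QueuePair`, `hardware_process`) are hypothetical and chosen for illustration. It assumes that each WQE completes successfully and that exactly one CQE is generated per WQE.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class WQE:
    """Work Queue Element: one instruction placed on a work queue."""
    wr_id: int    # identifier supplied with the work request (assumed field)
    opcode: str   # "send" or "recv" (assumed encoding)


@dataclass
class CQE:
    """Completion Queue Element: reports that a WQE has completed."""
    wr_id: int
    status: str


class CompletionQueue:
    def __init__(self) -> None:
        self.entries: deque = deque()

    def poll(self) -> Optional[CQE]:
        """Return the oldest CQE, or None if the queue is empty."""
        return self.entries.popleft() if self.entries else None


class WorkQueue:
    def __init__(self, cq: CompletionQueue) -> None:
        self.cq = cq                  # completion queue associated with this work queue
        self.wqes: deque = deque()

    def post(self, wqe: WQE) -> None:
        """Model a work request: place a WQE at the tail of the queue."""
        self.wqes.append(wqe)


class QueuePair:
    """A send work queue and a receive work queue, created together.

    The two queues may be associated with different completion queues.
    """
    def __init__(self, send_cq: CompletionQueue, recv_cq: CompletionQueue) -> None:
        self.send = WorkQueue(send_cq)
        self.recv = WorkQueue(recv_cq)


def hardware_process(wq: WorkQueue) -> None:
    """Model the hardware: execute WQEs in posting order, one CQE per WQE."""
    while wq.wqes:
        wqe = wq.wqes.popleft()
        wq.cq.entries.append(CQE(wqe.wr_id, "success"))
```

In this model, a consumer would create a `QueuePair`, call `post` on the send or receive queue for each work request, and later call `poll` on the associated completion queue to learn which work requests have completed. The sketch processes one WQE at a time and therefore does not model multiple WQEs in progress concurrently.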
Event Queues (EQs) can be used to inform the consumer 200 of numerous conditions that are defined in the standard, such as posting completion events, state change events, and error events. There is a need for EQs that handle these events. There is also a need to resize the CQs 212 and EQs, making them larger or smaller than their original size, while WQEs 208 are outstanding and while the number of resources (e.g., queue pairs (QPs), memory regions, CQs 212, physical ports) referencing the queues is changing. Additionally, there is a need to support logical partition (LPAR) memory reconfiguration and node evacuation for concurrent node replacement and repair. An LPAR is the division of a computer's processors, memory, and storage into multiple sets of resources so that each set of resources can be operated independently with its own operating system instance and applications.
There are many applications for producer-consumer queues, including video streaming and banking applications, such as automatic teller machines (ATMs). In video streaming applications, the producer puts portions of video on the queue for the consumer to read and display. In ATM applications, the producer puts transactions on the queue for the consumer (the bank) to process and apply to an account. In an on-demand environment, it may be desirable to move and/or resize the queue without stopping all activity when, for example, workload balancing or workload management requires it. It is important that no data be lost or accidentally repeated during a move and/or resize operation, because that would result in, for example, repeated or skipped portions of video or lost or double-counted ATM transactions. Traditionally, all activity is stopped for producer-consumer queues during move and/or resize operations. However, for an on-demand business that is working 24/7 and “always on”, this may not be acceptable.
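The requirement that no entry be lost or repeated across a resize can be illustrated with a minimal sketch of a circular producer-consumer queue. This sketch is not the method of the present invention: it uses a single lock, so the producer and consumer are briefly held off while entries are copied, which resembles the traditional approach described above. The class name `ResizableRing` and its methods are hypothetical. The sketch shows only the invariant that a resize must preserve, namely that every outstanding entry appears exactly once and in its original order in the new storage.

```python
import threading


class ResizableRing:
    """Bounded circular queue whose capacity can be changed while entries are outstanding."""

    def __init__(self, capacity: int) -> None:
        self._buf = [None] * capacity
        self._head = 0     # index of the next entry to consume
        self._count = 0    # number of outstanding (unconsumed) entries
        self._lock = threading.Lock()

    def put(self, item) -> bool:
        """Producer side: append an entry; return False if the queue is full."""
        with self._lock:
            if self._count == len(self._buf):
                return False
            tail = (self._head + self._count) % len(self._buf)
            self._buf[tail] = item
            self._count += 1
            return True

    def get(self):
        """Consumer side: remove and return the oldest entry, or None if empty."""
        with self._lock:
            if self._count == 0:
                return None
            item = self._buf[self._head]
            self._buf[self._head] = None
            self._head = (self._head + 1) % len(self._buf)
            self._count -= 1
            return item

    def resize(self, new_capacity: int) -> None:
        """Move outstanding entries to new storage, preserving order exactly once each."""
        with self._lock:
            if new_capacity < self._count:
                raise ValueError("new capacity is smaller than the number of outstanding entries")
            old = self._buf
            new = [None] * new_capacity
            # Copy from head onward, following the wrap-around of the old buffer.
            for i in range(self._count):
                new[i] = old[(self._head + i) % len(old)]
            self._buf = new
            self._head = 0
```

Because `get` returns `None` for an empty queue, this sketch assumes that `None` is never stored as an entry. Shrinking is refused when the outstanding entries would not fit, since discarding any of them would violate the no-loss requirement.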