The invention relates generally to the field of computing and specifically to communications between queuing and de-queuing processes in high-throughput systems
Efficient queuing mechanisms are vital for the operation of today""s high-speed, high-throughput computing systems. This is especially true for such systems that are used in networking environments where arriving queue elements can number in the thousands and tens of thousands per second. In a simple single processor, multi-tasking system, a typical queuing mechanism consists of one or more queuing processes that can add elements to a queue, and one de-queuing process that removes elements from the queue for processing. Even in such a system, because elements are typically queued while enabled for interruptions, it is necessary to serialize queuing and de-queuing operations to avoid interference between different queuing processes and between these processes and the de-queuing process. In multi-processor systems involving shared queues, the serialization problem becomes even more acute, because different processors may be attempting to access the same shared memory at the same time. The serialization problem and the need for high-speed, high-throughput systems have resulted in the development of specialized computer instructions that perform queue and de-queue operations in a serialized manner, all within the operation of a single computer instruction. For example, the IBM System 390 architecture provides a Perform Locked Operation (PLO) instruction. This instruction has several functions, including Compare and Load, Compare and Swap, Compare, Swap and Store. Within these functions are additional variations, such as single or double memory word operations. Briefly, the PLO instruction operates as follows. A program lock token (PLT), which identifies a shared memory location, is placed in general register 1. A PLO function code that specifies the specific PLO function is placed in general register 0. The PLO instruction is then executed, which locks the shared memory location identified by the PLT and then performs the specified PLO operations on the shared memory locations. The PLT allows locking and lock recognition between different processes in a single CPU and between different CPUs. FIG. 1 shows the configuration of a typical System/390 mainframe system 100, containing multiple CPUs 102, 104, 106 and 108, all of which have access to a shared memory 110. The shared memory 110 may contain a queue structure, such as illustrated in FIG. 2. A typical queue structure consists of a queue 200 that contains the data elements in the queue, and a queue header 202. The queue header typically contains a head pointer 210 (QHEAD) that points to the top of the queue and a tail pointer 212 (QTAIL) that points to the bottom of the queue. Arriving data elements are typically placed at the shared memory location pointed to by the QTAIL pointer and data elements are serviced from the queue by removing them from the shared memory location pointed to by the QHEAD pointer. In the illustrative example of FIG. 2, the queue header also contains a sequence number (SEQ. #) field 204 and a COUNT field 208. The SEQ # field contains a numerical value that is incremented each time an operation is performed on the queue header. The COUNT field contains a numerical value of the number of data elements in the queue at any given time.
FIG. 3 shows a temporary memory structure 300 (called a parameter list) that is used by the PLO instruction. The parameter list 300 contains a OP1CV field 302 (operand 1 compare value), a OP1RV field 304 (operand 1 replacement value) and two other fields OP3306 and OP4308, which are used by the PLO. The parameter list contains other parameters not shown in FIG. 3. However, for the purpose of this disclosure, it is unnecessary to discuss these other parameter list words. Reference is made to the IBM document in the next paragraph for complete details.
A typical application of the PLO instruction is to administer queues. Because the PLO instruction is conventional, its detailed operation is not described here. A complete description of the instruction is contained in IBM Document Number SA22-720 1-xx, where xx represents a version number. The present version is xx=04. This document is incorporated by reference herein.
A simplified operation of the PLO instruction is now described with reference to a function of operation (function code 13) that assumes that a data element is to be inserted into an empty queue. Skilled art workers are capable of selecting other modes of operation of the PLO instruction for other situations of the queue. Reference is made to the above IBM document SA22-720 1 for this purpose. Single word operations of PLO are also ignored. For purposes here, it can be assumed that all of the operations mentioned are double word operations. PLO13 is referred to as the Compare and Swap and Store operation. To insert a new data element into an empty queue 200, OP1CV is set to the present value of Header 0 and both the left and right word portions of parameter list double word OP3 is set to the address in shared memory 110 where the new element to be queued is located. OP4 in the parameter list is set to the address of Header 1. The SEQ # and COUNT fields in the replacement value word OP1RV of the parameter list are set to their corresponding OP1CV values incremented by 1, and finally general register 1 is also set to the value of the PLT (the program lock token) corresponding to the portion of shared memory that contains the queue and queue header. The PLO instruction is executed. The queue and queue header portions of shared memory are locked as a result of the PLT in general register 1. Next, OP1CV is compared to HEADER 0. If they match, this means that the queue and header were not changed by another process during the setup process just prior to the execution of PLO13. This means that it is safe to proceed to modify the queue structure. In this event, Header 0 in the queue header is set equal to OP1RV and Header 1 is set equal to OP3. These operations update the queue structure so that the new element is now inserted. The PLT lock is released and execution of the PLO instruction is complete. If the comparison of OP1CV and HEADER 0 did not match, the above process would repeat until such time as these values match, at which time the new data element would be inserted as described.
The above described conventional procedure allows for the convenient administration of a queue. However, it does not provide any mechanism for communicating between the queuing process and a dequeuing process. For example, for efficient operation, it may be desirable to turn off a dequeuing process when a queue is empty and to turn it back on when necessary. This can be done separately with additional code or, in the IBM System 390 architecture, by activating an SRB (schedule resource block) process when a data element is inserted into a queue. Both methods are inefficient and undesirable. In the former case, additional code execution consumes precious CPU cycles in a high volume, high through-put system. In the latter case, each SRB that is activated initiates another dequeuing process, which consumes CPU resources and which may be unnecessary
The invention solves the problem of improving efficient communications between a queuing process and a dequeuing process. The queue structure includes a synchronization field that is used to pass information between the queuing process and the dequeuing process. An element is linked into the queue by storing into a parameter list information representing the present state of the queue, including the synchronization field, sufficient to determine if the queue changes and by then locking the queue to other processes. While the queue is locked, the state of the queue and the synchronization field are compared to the state of the queue and synchronization field contained in the parameter list. If the comparison matches, the element is linked into the queue and the queue is unlocked. If an element is successfully linked into the queue, then information from a dequeuing process contained in the synchronization field is obtained and acted on. An element is unlinked from the queue by a dequeuing process by storing in the parameter list the same queue information and synchronization field sufficient to determine if the queue changes and then locking the queue to other processes. The state of the queue and the synchronization field after locking are compared to the state of the queue and synchronization field contained in the parameter list. If the comparison matches, an element is unlinked from the queue. The queue is then unlocked. If an element is successfully unlinked from the queue, then the dequeuing process writes information into the synchronization field for the queuing process to act on.
In the preferred and disclosed embodiment, a special locking instruction is used to perform the linking and unlinking operations, each in a single locking instruction. This further improves the efficiency of communicating between the queuing and dequeuing processes. Also in the preferred embodiment, the information passed between the queuing and dequeuing processes allows the processes to control the initiation and termination of dequeuing processes. When the dequeuing process removes an element from the queue, the dequeuing process sets the synchronization field to a first or second state, depending on whether the queue is empty or not empty, respectively. If the queue is empty, the dequeuing process also terminates itself. When the queuing process next links an element into the queue, it reads the synchronization field as part of the serialized queuing operation. If the synchronization field is set to the first state, the queuing process initiates a new dequeuing process.