1. Field of the Invention
The present invention relates to computer systems and specifically to the releasing of locking mechanisms associated with computer systems.
2. Background Information
Computer architecture generally defines the functional operation, including the flow of information and control, among individual hardware units of a computer. One such hardware unit is the processor or processing engine, which contains arithmetic and logic processing circuits organized as a set of data paths. In some implementations, the data path circuits may be configured as a central processing unit (CPU) having operations that are defined by a set of instructions. The instructions are typically stored in an instruction memory and specify a set of hardware functions that are available on the CPU.
A high-performance computer may be realized by using a number of identical CPUs or processors to perform certain tasks in parallel. For a purely parallel multiprocessor architecture, each processor may have shared or private access to data, such as program instructions (e.g., algorithms), stored in a memory coupled to the processors. Access to an external memory is generally handled by a memory controller, which accepts memory requests from the various processors and processes them in an order that often is controlled by arbitration logic contained in the memory controller. Moreover, certain complex multiprocessor systems may employ many memory controllers where each controller is attached to a separate external memory subsystem.
One place where a parallel, multiprocessor architecture can be advantageously employed involves the area of data communications and, in particular, the processing engine for an intermediate network station or node. The intermediate node interconnects communication links and subnetworks of a computer network to enable the exchange of data between two or more software entities executing on hardware platforms, such as end nodes. The nodes typically communicate by exchanging discrete packets or frames of data according to predefined protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP) or the Internetwork Packet Exchange (IPX) protocol.
When two processors in a multiprocessor system vie for access to a single shared memory resource, a lock is employed that allows for orderly access to the shared resource. In this context, the lock is an abstraction representing permission to access the resource. For example, the lock may be configured to ensure that only one processor accesses a segment of the memory at any given time. Here, each segment of the memory may have a lock (e.g., a memory bit) associated with it, and whenever a processor requires access to the segment, it determines whether the lock is “locked” or “unlocked.” A locked status indicates that another processor is currently accessing that segment of the memory. Conversely, an unlocked status indicates that the segment is available for access. Thus, when a processor attempts to access a memory segment, it simply tests the lock associated with the segment to determine whether that segment is currently being accessed. If not, the testing processor acquires the lock to exclude other processors from accessing the segment.
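The per-segment lock test described above can be sketched in software as follows. This is a minimal illustration, not the hardware mechanism of the invention: the class name SegmentLocks, its methods, and the choice of four segments are hypothetical, and Python's threading.Lock stands in for the lock bit associated with each memory segment.

```python
import threading


class SegmentLocks:
    """Hypothetical model: one lock per memory segment."""

    def __init__(self, num_segments):
        self._locks = [threading.Lock() for _ in range(num_segments)]

    def try_acquire(self, segment):
        # Test and acquire in a single atomic step.
        # Returns True if the segment was unlocked and is now held,
        # False if another holder already has it.
        return self._locks[segment].acquire(blocking=False)

    def release(self, segment):
        # Mark the segment as unlocked again.
        self._locks[segment].release()


locks = SegmentLocks(num_segments=4)  # segment count is an assumption
first = locks.try_acquire(2)          # segment 2 is unlocked, so this succeeds
second = locks.try_acquire(2)         # segment 2 is now locked, so this fails
other = locks.try_acquire(3)          # a different segment is unaffected
locks.release(2)
after_release = locks.try_acquire(2)  # available again once released
```

The sketch folds the test and the acquisition into one atomic call. If the two were separate steps, two processors could both observe "unlocked" and both proceed to acquire the lock.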
A typical sequence of instructions involving a lock is illustrated in FIG. 1. At line 104 a processor acquires a lock associated with memory locations “A” and “B.” It then performs a series of operations involving memory locations A and B, as indicated at lines 106 through 112, and releases, i.e., unlocks, the lock, as indicated at line 114.
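For readers without FIG. 1 at hand, the acquire, operate, release pattern might look like the sketch below. The particular reads and writes on A and B are assumptions chosen for illustration; the figure's actual instructions are not reproduced here, and the names memory, lock_ab, and update_a_and_b are hypothetical.

```python
import threading

# Hypothetical shared memory with locations "A" and "B".
memory = {"A": 1, "B": 10}
lock_ab = threading.Lock()  # the lock associated with locations A and B


def update_a_and_b():
    lock_ab.acquire()            # acquire the lock (cf. line 104)
    try:
        a = memory["A"]          # read location A
        memory["A"] = a + 1      # write location A
        b = memory["B"]          # read location B
        memory["B"] = b + a      # write location B
    finally:
        lock_ab.release()        # release the lock (cf. line 114)


# Eight workers stand in for processors contending for the same lock.
threads = [threading.Thread(target=update_a_and_b) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the lock admits one worker at a time, each worker sees a distinct value of A, and the final contents of A and B do not depend on the order in which the workers run.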
Before a lock is released, a processor must ensure that all operations associated with the lock have completed. Thus, before the RELEASELOCK instruction at line 114 can release the lock, the instruction must ensure that all prior memory operations have completed. One previous technique that may be used to ensure such a result would be to serialize all the instructions, such that before an instruction can be executed, the previous instruction and all its associated memory operations must have completed. Thus, for example, before the “write” instruction at line 108 can be executed, the “read” instruction at line 106 and its associated memory operation, i.e., “read memory location A,” must be completed.
One drawback associated with this instruction serialization technique is that it is inefficient, since memory operations are not performed in parallel. For example, assume that the instructions at lines 108 and 110 access memory locations controlled by different memory controllers. By serializing instruction execution and memory operations, the total time spent performing both memory operations will be at least the time it takes to complete the memory operation on location A plus the time it takes to complete the memory operation on location B. This total time is greater than the time needed to perform the operations in parallel, e.g., the time it takes to complete the longer of the two operations.
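The comparison above reduces to a sum versus a maximum. The following sketch works through it with made-up latencies; the cycle counts of 100 and 140 and the function names are assumptions for illustration only.

```python
def serialized_time(latencies):
    # Each operation waits for the previous one, so the latencies add up.
    # This is a lower bound; instruction overhead is ignored.
    return sum(latencies)


def parallel_time(latencies):
    # Operations on different memory controllers overlap,
    # so the longest one dominates.
    return max(latencies)


latency_a = 100  # hypothetical cycles for the operation on location A
latency_b = 140  # hypothetical cycles for the operation on location B

serial = serialized_time([latency_a, latency_b])   # 240 cycles
parallel = parallel_time([latency_a, latency_b])   # 140 cycles
saved = serial - parallel                          # 100 cycles lost to serialization
```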
Another previous technique that could be used to ensure that all memory operations have completed before releasing a lock is to modify the RELEASELOCK instruction such that instruction execution stalls at the RELEASELOCK instruction until all the memory operations have completed, and only then proceeds to the instruction after the RELEASELOCK instruction, e.g., the instruction at line 116. Although certain memory operations may be performed in parallel, this previous technique forces operations following the RELEASELOCK instruction to wait until all prior memory operations have completed. As a result, a measure of the performance improvement due to parallelization is lost while waiting for these operations to complete.
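The cost of a stalling RELEASELOCK can be made concrete with a simple timeline model. In this sketch the issue cycles, latencies, and the function name stalling_release are all hypothetical; it assumes four memory operations issued on consecutive cycles and a RELEASELOCK reached on the cycle after the last one.

```python
def stalling_release(issue_cycles, latencies, release_cycle):
    """Model a RELEASELOCK that stalls until all prior memory operations finish.

    Returns (cycle at which execution resumes, cycles spent stalled).
    """
    # Operations run in parallel; each completes at issue time plus latency.
    completions = [i + l for i, l in zip(issue_cycles, latencies)]
    # Execution cannot pass RELEASELOCK until the slowest operation is done.
    resume = max([release_cycle] + completions)
    return resume, resume - release_cycle


resume_cycle, stall_cycles = stalling_release(
    issue_cycles=[0, 1, 2, 3],      # assumed: one operation issued per cycle
    latencies=[100, 20, 140, 20],   # assumed memory latencies in cycles
    release_cycle=4,                # assumed cycle at which RELEASELOCK is reached
)
# resume_cycle is 142 and stall_cycles is 138: the instruction after
# RELEASELOCK waits on the slowest operation even if it does not depend on it.
```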