1. Field of the Invention
The present invention relates to computer systems and, more specifically, to locking mechanisms associated with controlling access to resources in computer systems.
2. Background Information
Computer architecture generally defines the functional operation, including the flow of information and control, among individual hardware units of a computer. One such hardware unit is a processor or processing engine, which contains arithmetic and logic processing circuits organized as a set of data paths. In some implementations, the data path circuits may be configured as a central processing unit (CPU) having operations that are defined by a set of instructions. The instructions are typically stored in an instruction memory and specify a set of hardware functions that are available on the CPU.
A high-performance computer system may be realized by using a number of identical CPUs or processors to perform certain tasks in parallel. For a purely parallel multiprocessor architecture, each processor may have shared or private access to data, such as program instructions (e.g., algorithms), stored in a memory coupled to the processors. Access to an external memory is generally handled by a memory controller, which accepts requests from the various processors and processes them in an order that often is controlled by arbitration logic contained in the memory controller. Moreover, certain complex multiprocessor systems may employ many memory controllers where each controller is attached to a separate external memory subsystem.
One place where a parallel, multiprocessor architecture can be advantageously employed involves the area of data communications and, in particular, the processing engine for an intermediate network station or node. The intermediate network node interconnects communication links and subnetworks of a computer network to enable the exchange of data between two or more software entities executing on hardware platforms, such as end nodes. The nodes typically communicate by exchanging discrete packets or frames of data according to predefined protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP) or the Internetwork Packet Exchange (IPX) protocol. Here, the processing engine may be arranged as a systolic array comprising identical processing elements or processors, where each processor in the array performs a fixed amount of work on the packet data within a fixed amount of time, by executing a distinct set of instructions on the data, before passing the data to the next processing element in the array. To further maximize throughput, the processor arrays may be replicated such that multiple processors execute the same set of instructions in parallel on different packets or frames of data and access the same shared resources, such as memory.
When two processors in a multiprocessor system vie for access to a single shared resource, a lock is often employed to allow orderly access to the shared resource. In this context, the lock is an abstraction representing permission to access the resource. For example, the lock may be configured to ensure that only one processor accesses a segment of memory at any given time. Here, each segment of the memory may have a lock (e.g., a memory bit) associated with it and whenever a processor requires access to the segment, it determines whether the lock is “locked” or “unlocked.” A locked status indicates that another processor is currently accessing that segment of the memory. Conversely, an unlocked status indicates that the segment is available for access. Thus, when a processor attempts to access a memory segment, it simply tests the lock associated with the segment to determine whether that segment is currently being accessed. If not, the testing processor acquires the lock to exclude other processors from accessing the segment.
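The per-segment lock described above can be sketched in C as follows. This is only an illustrative sketch, not a mechanism taken from this description: it assumes C11 atomics, and the names `NUM_SEGMENTS`, `segment_lock`, `try_acquire`, and `release` are hypothetical. It also assumes that the test and the acquisition are combined into one atomic test-and-set operation, so that two processors cannot both observe an unlocked status and both proceed.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_SEGMENTS 4  /* hypothetical number of memory segments */

/* One lock flag per memory segment; a clear flag means "unlocked". */
static atomic_flag segment_lock[NUM_SEGMENTS] = {
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};

/* Test the lock for a segment and acquire it in a single atomic step.
 * Returns true if the caller now holds the lock, false if the segment
 * was already locked by another party. */
bool try_acquire(int segment)
{
    /* atomic_flag_test_and_set sets the flag and returns its previous
     * value, so a previous value of false means the lock was free. */
    return !atomic_flag_test_and_set(&segment_lock[segment]);
}

/* Mark the segment as unlocked so that another processor may acquire it. */
void release(int segment)
{
    atomic_flag_clear(&segment_lock[segment]);
}
```

In this sketch a second `try_acquire` on the same segment fails until `release` is called, while locks on other segments are unaffected.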
One previous technique often used by processors to access a lock involves a code loop where the loop repeats until the lock is acquired. FIG. 1 is a listing of a typical sequence of instructions illustrating this technique. In this sequence, the lock is a bit at a memory location referenced (pointed to) by the contents of register R2. The instructions at lines 104 and 110 are executed to access (e.g., acquire) the lock, lines 114 and 116 comprise a “critical-code section” that manipulates a critical data structure contained in a memory location associated with the lock, and the instruction at line 118 releases the lock. A critical-code section is a section of code that must be executed atomically, i.e., without interference from other sources, to preserve data integrity, hardware integrity, and access serialization.
Specifically, at line 104 the processor attempts to acquire the lock and sets the value in register R1 to indicate whether or not the lock was acquired. At line 110, a conditional-branch instruction tests the value in register R1 to determine if the lock was acquired. The delay-slot instruction at line 112 is then executed. (The example assumes a delay-slot architecture, in which the instruction located immediately after a branch instruction is executed independently of the branch test's result.) If the lock was not acquired, the branch at line 110 is taken and execution then resumes at the top of the loop at line 104. Otherwise, execution resumes at line 114 where the processor performs the critical-code section, as indicated at lines 114 through 116, and then releases the lock, as indicated at line 118.
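The acquire, critical-section, and release pattern described above may be approximated in C as shown below. This is a hedged sketch and not the listing of FIG. 1: it assumes C11 atomics in place of the processor instructions, it has no delay slot, and the names `spin_lock_flag`, `critical_data`, and `update_under_lock` are hypothetical.

```c
#include <stdatomic.h>

static atomic_flag spin_lock_flag = ATOMIC_FLAG_INIT; /* the lock bit */
static int critical_data = 0;  /* data structure protected by the lock */

/* Adds delta to the protected data and returns the new value. */
int update_under_lock(int delta)
{
    /* Attempt to acquire the lock; repeat until the attempt succeeds.
     * Every iteration performs an atomic access to the lock location. */
    while (atomic_flag_test_and_set(&spin_lock_flag)) {
        /* lock not acquired: branch back to the top of the loop */
    }

    /* Critical-code section: runs only while the lock is held. */
    critical_data += delta;
    int result = critical_data;

    /* Release the lock. */
    atomic_flag_clear(&spin_lock_flag);
    return result;
}
```

Because the loop body is empty, a caller that finds the lock held does nothing except retry the acquisition.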
One drawback with the above-described technique is that the order in which the processors attempt to acquire the lock is not preserved. Thus, it is possible to indefinitely prevent (“starve”) a processor from acquiring the lock. For example, assume processors A, B, and C execute the above-described code loop to acquire a lock. Further, assume processor A acquires the lock. Next, processor B attempts to acquire the lock but fails as the lock is being held by processor A. Processor A then releases the lock and shortly thereafter, processor C acquires it. Processor B, again, attempts to acquire the lock but fails. Next, processor C releases the lock and shortly thereafter, processor A acquires it. Again, when processor B attempts to acquire the lock it will fail because the lock is now held by processor A, and so on.
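The interleaving in this example can be replayed with a small single-threaded simulation. The sketch below is an assumption-laden illustration rather than a model of real hardware timing: the function `replay_starvation`, the helper `try_lock`, and the fixed schedule (A and C alternate as holders, and B makes one attempt during each hold) are all hypothetical.

```c
#include <stdbool.h>

enum { A, B, C, NUM_PROCS };  /* the three processors of the example */

/* Simulated test-and-set on a plain flag (single-threaded replay only). */
static bool try_lock(bool *lock)
{
    if (*lock)
        return false;  /* already held: the attempt fails */
    *lock = true;
    return true;
}

/* Replays `rounds` lock hand-offs. Fills acquired[] with the number of
 * successful acquisitions per processor and returns the number of failed
 * attempts made by processor B. */
int replay_starvation(int rounds, int acquired[NUM_PROCS])
{
    bool lock = false;
    int b_failures = 0;

    for (int p = 0; p < NUM_PROCS; p++)
        acquired[p] = 0;

    for (int i = 0; i < rounds; i++) {
        int holder = (i % 2 == 0) ? A : C;  /* A and C alternate */

        if (try_lock(&lock))       /* holder wins the free lock */
            acquired[holder]++;

        if (try_lock(&lock))       /* B tries while the lock is held */
            acquired[B]++;
        else
            b_failures++;

        lock = false;              /* the lock is released */
    }
    return b_failures;
}
```

Under this schedule processor B fails on every attempt while A and C share all acquisitions, because nothing in the loop records that B has been waiting longest.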
Another drawback with the above-described technique is that the processor continually executes the code loop until the lock is acquired. Thus, the processor is unable to perform other useful work, as it is busy attempting to acquire the lock. Moreover, valuable memory bandwidth is wasted because an instruction that accesses memory, i.e., the instruction at line 104, is executed on every pass through the loop in an attempt to acquire the lock even though the location may already be locked.