The present invention relates generally to computer architecture and, more particularly, to processor synchronization within a programmable arrayed processing engine.
Computer architecture generally defines the functional operation, including the flow of information and control, among individual hardware units of a computer. One such hardware unit is a processing engine that contains arithmetic and logic processing circuits organized as a set of data paths. In some implementations, the data path circuits may be configured as a processor having a register file of general-purpose registers for use with operations that are defined by a set of instructions. The instructions are typically stored in an instruction memory and specify a set of hardware functions that are available on the processor. When implementing these functions, the processor generally processes “transient” data residing in a memory in accordance with the instructions.
A high-performance processing engine configured for use in, e.g., an intermediate network device may be realized by using a number of identical processors to perform certain tasks in parallel. For a parallel multiprocessor architecture, each processor may have shared access to information stored in a common resource, such as a memory. Processor synchronization denotes serialization of access to the shared resource. In a multiprocessor environment, processors that share data stored in the memory resource typically access that resource serially, rather than in parallel, in accordance with a processor synchronization mechanism to ensure that the data does not change from its expected state.
There are multiple mechanisms used for processor synchronization, most of which are based on the use of semaphores. A semaphore is a variable, such as a hardware or software flag, with a value that indicates the status of a common resource. To keep processors from interfering with one another, the semaphore may be used to lock the resource. In this context, the lock is an abstraction that represents permission to access the resource and, to that end, may be further viewed as a memory bit associated with the resource. If the bit is not asserted (“0”), the lock is free and if the bit is asserted (“1”), the lock is busy.
Lock (and unlock) requests are typically atomic in that they are implemented such that neither an interrupt nor a multiprocessor access affects the outcome. All processors that access a shared resource must obtain a lock that corresponds to that resource before manipulating its contents. A processor requesting the resource checks the lock to determine the resource's status and then decides how to proceed. If the lock is already held by another processor, the requesting processor must defer its access until the lock becomes available.
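As a rough illustration of the lock bit and the atomic lock request described above, the following Python sketch models a one-bit lock in software. The class name LockBit, its method names, and the use of threading.Lock to stand in for the atomicity that hardware would provide are all assumptions made for illustration; they are not taken from any described implementation.

```python
import threading


class LockBit:
    """Software model of a one-bit lock: 0 means free, 1 means busy."""

    def __init__(self):
        self.bit = 0
        # Assumption: this mutex stands in for the hardware guarantee that
        # a lock request cannot be split by an interrupt or another processor.
        self._atomic = threading.Lock()

    def try_acquire(self):
        """Atomically check the bit and assert it; True if the lock was free."""
        with self._atomic:
            was_free = (self.bit == 0)
            self.bit = 1
            return was_free

    def release(self):
        """Atomically deassert the bit, making the lock free again."""
        with self._atomic:
            self.bit = 0


lock = LockBit()
first = lock.try_acquire()   # the lock was free, so this requester obtains it
second = lock.try_acquire()  # the lock is busy, so this requester must defer
lock.release()
third = lock.try_acquire()   # the lock is free again after the release
```

A requester that receives False would retry later or do other work, which corresponds to deferring access until the lock becomes available.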
For example, a hardware semaphore and a group of synchronization variables may be used to simultaneously request locking of multiple exclusive resources to avoid, e.g., a deadlock situation. A known implementation for achieving this function involves initially obtaining a lock that is used to guard the set of resource variables and then interrogating those variables. If the resources are available, the variables are marked as being in use and the lock is granted; otherwise, none of the variables are updated and the lock is released. A typical software approach to locking a shared resource involves disabling all interrupts and invoking an atomic sequence, such as setting a flag and reading the state of that flag (i.e., a “test-and-set” operation).
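The guarded, all-or-nothing group request described above might be sketched as follows. This is a minimal software model only: the class name GroupLock, the resource names "A", "B" and "C", and the use of threading.Lock as the guard are hypothetical choices made for illustration.

```python
import threading


class GroupLock:
    """All-or-nothing locking of several exclusive resources."""

    def __init__(self, resource_names):
        # One in-use variable per resource (names are hypothetical).
        self.in_use = {name: False for name in resource_names}
        # Guard lock protecting the whole set of resource variables.
        self._guard = threading.Lock()

    def try_lock(self, wanted):
        """Mark every wanted resource in use, or change nothing at all."""
        with self._guard:
            if any(self.in_use[name] for name in wanted):
                return False  # at least one is busy: no variable is updated
            for name in wanted:
                self.in_use[name] = True
            return True

    def unlock(self, held):
        """Mark the given resources as free again."""
        with self._guard:
            for name in held:
                self.in_use[name] = False


locks = GroupLock(["A", "B", "C"])
got_ab = locks.try_lock(["A", "B"])    # both free, so both are marked in use
got_bc = locks.try_lock(["B", "C"])    # B is busy, so the request is refused
c_untouched = not locks.in_use["C"]    # C was not marked by the refused request
locks.unlock(["A", "B"])
retry_bc = locks.try_lock(["B", "C"])  # succeeds once B has been released
```

Because a refused request never leaves a partial set of resources marked, two requesters cannot each hold one resource while waiting for the other, which is the deadlock the group request is meant to avoid.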
The present invention relates to a group and virtual locking mechanism (GVLM) that addresses two classes of synchronization present in a system having resources that are shared by a plurality of processors: (1) synchronization of the multi-access shared resources; and (2) simultaneous requests for the shared resources. In the illustrative embodiment, the system is a programmable processing engine comprising an array of processor complex elements, each having a microcontroller (TMC) processor. The processor complexes are preferably arrayed as rows and columns. Broadly stated, the novel GVLM comprises a lock controller function associated with each column of processor complexes in cooperating relation with lock instructions executed by the TMC processors to thereby create a tightly integrated arrangement for generating lock requests directed to the shared resources.
Specifically, the GVLM merges a lock request and group information into a single instruction to reduce the time needed to obtain a group of locks. The lock request is then communicated to the lock controller. Notably, lock state variables used by the present invention reside in the lock controller as opposed to an external memory. This feature of the invention extends usable memory bandwidth by not requiring memory access cycles to obtain a lock and, further, significantly reduces latency associated with acquiring the lock.
In an aspect of the invention, a virtual semaphore mechanism is provided that allows multiple processors to access a shared resource, such as memory. Multiple processors can access the shared resource as long as each processor is accessing a different region of the resource. Therefore, the shared resource is partitioned into variable-size regions and, according to the invention, each region is assigned a virtual semaphore identifier for use with the GVLM. The virtual identifier enables locking of a portion of the resource, rather than the entire resource. This, in turn, allows a plurality of processors to simultaneously access the shared resource, thereby increasing performance of the system.
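To make the partitioning concrete, the following Python sketch maps addresses in a shared memory to virtual identifiers and lets different processors lock different regions at the same time. The region boundaries, the address range, the processor names, and the function names virtual_id and lock_region are hypothetical; the text does not specify how identifiers are assigned to regions.

```python
from bisect import bisect_right

# Hypothetical partition of a shared memory into variable-size regions.
# Region i starts at REGION_STARTS[i] and ends just before the next start.
REGION_STARTS = [0x0000, 0x0100, 0x0400, 0x0800]
RESOURCE_END = 0x1000

# Virtual identifier -> processor that currently holds that region's lock.
owners = {}


def virtual_id(address):
    """Return the virtual identifier of the region containing the address."""
    if not 0 <= address < RESOURCE_END:
        raise ValueError("address outside the shared resource")
    return bisect_right(REGION_STARTS, address) - 1


def lock_region(processor, address):
    """Lock only the region containing the address, not the whole resource."""
    vid = virtual_id(address)
    if owners.get(vid, processor) != processor:
        return False  # another processor already holds this identifier
    owners[vid] = processor
    return True


p0_ok = lock_region("P0", 0x0010)       # region 0 is free
p1_ok = lock_region("P1", 0x0500)       # region 2 is free, so both proceed
p1_blocked = lock_region("P1", 0x0020)  # region 0 is held by P0
```

Here P0 and P1 hold locks on the same resource simultaneously because their identifiers differ; only a request for an identifier that is already owned is refused.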
According to the invention, the TMC instruction set provides a get virtual semaphore, xgvs, instruction that allows a processor to obtain a virtual lock. One virtual lock may be “owned” per processor, per group. However, the lock controller allows multiple locks to exist within one resource group as long as another processor does not own the virtual identifier and the entire group has not been locked via, e.g., a get binary semaphore, xgbs, instruction. All locks are cleared via an xcs instruction. In another aspect of the invention, a single processor may request locking of multiple shared resources (functions) at the same time by issuing either the xgbs or xgvs instruction and simultaneously specifying two functions to be locked. By allowing the instruction to specify a group of resources at the same time, deadlock situations can be avoided.
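The rules above can be modeled in software roughly as follows. This Python sketch is only a toy model of a lock controller: the instruction names xgvs, xgbs and xcs come from the text, but the operand formats, the class name LockController, and several behaviors are assumptions labeled in the comments (in particular, that xcs clears only the issuing processor's locks, that xgbs is refused while other processors hold virtual locks in the group, and that a second virtual lock granted to the same processor in a group replaces the first).

```python
class LockController:
    """Toy model of a lock controller that holds all lock state itself."""

    def __init__(self, num_groups):
        # Processor that locked the entire group, or None if not locked.
        self.binary_owner = [None] * num_groups
        # Per group: processor -> the one virtual identifier it owns there.
        self.virtual_owner = [{} for _ in range(num_groups)]

    def _blocked(self, proc, group, vid=None):
        """True if another processor's lock conflicts with this request."""
        owner = self.binary_owner[group]
        if owner is not None and owner != proc:
            return True  # entire group is locked by someone else
        for other, held in self.virtual_owner[group].items():
            # Assumption: a whole-group request (vid is None) conflicts
            # with any virtual lock held by another processor.
            if other != proc and (vid is None or held == vid):
                return True
        return False

    def xgvs(self, proc, requests):
        """requests: list of (group, virtual_id), granted all-or-nothing."""
        if any(self._blocked(proc, g, v) for g, v in requests):
            return False
        for g, v in requests:
            self.virtual_owner[g][proc] = v  # assumption: replaces older id
        return True

    def xgbs(self, proc, groups):
        """Lock each listed group as a whole, all-or-nothing."""
        if any(self._blocked(proc, g) for g in groups):
            return False
        for g in groups:
            self.binary_owner[g] = proc
        return True

    def xcs(self, proc):
        """Assumption: clears every lock held by the issuing processor."""
        for g in range(len(self.binary_owner)):
            if self.binary_owner[g] == proc:
                self.binary_owner[g] = None
            self.virtual_owner[g].pop(proc, None)


ctl = LockController(num_groups=4)
a = ctl.xgvs(0, [(1, 7)])  # processor 0 owns identifier 7 in group 1
b = ctl.xgvs(1, [(1, 9)])  # a different identifier in the same group is fine
c = ctl.xgvs(2, [(1, 7)])  # refused: identifier 7 is owned by processor 0
d = ctl.xgbs(2, [1, 2])    # refused: group 1 has virtual locks outstanding
ctl.xcs(0)
ctl.xcs(1)
e = ctl.xgbs(2, [1, 2])    # both groups are now free and are locked together
f = ctl.xgvs(0, [(2, 3)])  # refused: group 2 is locked as a whole
```

Requesting both groups in one call means processor 2 either obtains both or neither, so it never holds one group while waiting on the other.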
Advantageously, the GVLM provides an efficient means to obtain locks for multiple exclusive resources or shared multi-access resources. The invention also enhances interprocessor synchronization for tightly coupled processors. The GVLM invention is efficient in terms of lock acquisition and release times, and also requires less memory bandwidth as compared to prior implementations.