1. Field of the Invention
The present invention broadly relates to computer systems, and more particularly, to a messaging scheme to synchronize processes within a multiprocessing computing environment.
2. Description of the Related Art
Generally, personal computers (PCs) and other types of computer systems have been designed around a shared bus system for accessing a shared memory. One or more processors and one or more input/output (I/O) devices are coupled to the shared memory through the shared bus. The I/O devices may be coupled to the shared bus through an I/O bridge, which manages the transfer of information between the shared bus and the I/O devices. The processors are typically coupled directly to the shared bus or through a cache hierarchy.
FIG. 1A illustrates a shared bus multiprocessor computer system 10 of the prior art. Three processors, 14A through 14C, are shown directly connected to the shared system bus 18. More processors may also be connected in a similar fashion. The system memory 16 (i.e., the shared memory) is shown connected to the system bus 18. Each processor, 14A through 14C, may further have its own local cache, caches 12A through 12C respectively. As used herein, the term “task” refers to a sequence of instructions arranged to perform a particular operation. Application software being executed in the multiprocessing computer system 10 and the operating system for the computer system 10 may each comprise one or more tasks.
One of the major requirements of a shared-memory multiprocessor is being able to coordinate or synchronize processes that are working on a common task. Particularly, access to critical regions of memory 16 accessed by two or more processes must be controlled to provide consistent results in memory transactions. A critical region or a critical section of the memory 16 may contain global variables accessible by each processor in the system. Typically, the critical regions are protected by lock variables (or “semaphores”) to synchronize the processes using an atomic swap operation. In an atomic swap operation, a processor can both read a memory location and set it to the locked value in the same bus operation, preventing any other processor from reading or writing the shared system memory 16.
FIG. 1B illustrates a simplified flow diagram for locking one or more critical regions using an atomic swap instruction. In a shared bus system, e.g., the system in FIG. 1A, bus arbitration is relatively simple because the shared bus 18 is the only path to the system memory 16. Therefore, the processor that gets the bus may retain control of the bus, thereby locking out all other processors from the memory. When a processor wants to establish a lock, it first reads the lock variable to test its state. The processor keeps reading and testing until the value indicates that the lock is unlocked.
After detecting the state of the lock variable as unlocked, the processor that wishes to lock the variable attempts to do so by executing an appropriate instruction over the shared system bus 18. The instruction may be known as a “test and set” instruction in some instruction sets. A test-and-set instruction has the typical form of read-modify-write, in which the entire process is not interruptible by another processor reading or writing the affected memory location. That is, once it is initiated and the read access is completed, no other access can be made to the operand until the operand is rewritten during the second step (i.e., the “set” function) of the test-and-set.
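The test-then-set flow of FIG. 1B can be sketched as a toy Python model. The class `SharedMemory`, its `bus_lock` attribute, and the functions `acquire` and `release` are hypothetical names introduced for illustration; the `threading.Lock` merely stands in for the bus lock that makes the read and the write of a test-and-set one indivisible step, and is not a description of how the hardware implements it.

```python
import threading
import time


class SharedMemory:
    """Toy model of system memory 16 reached over one shared bus.

    bus_lock stands in for the bus lock: while a bus operation holds it,
    no other processor can read or write memory (modeling assumption).
    """

    def __init__(self, size):
        self.cells = [0] * size
        self.bus_lock = threading.Lock()

    def read(self, addr):
        with self.bus_lock:  # an ordinary single read
            return self.cells[addr]

    def write(self, addr, value):
        with self.bus_lock:  # an ordinary single write
            self.cells[addr] = value

    def test_and_set(self, addr):
        """Read-modify-write in one bus tenure: return old value, store 1."""
        with self.bus_lock:
            old = self.cells[addr]
            self.cells[addr] = 1
            return old


def acquire(mem, lock_addr):
    """FIG. 1B flow: test with plain reads, then attempt test-and-set."""
    while True:
        while mem.read(lock_addr) != 0:  # locked: keep reading and testing
            time.sleep(0)                # yield to other simulated processors
        if mem.test_and_set(lock_addr) == 0:
            return                       # we saw 0 and stored 1: lock is ours
        # another processor won the race; go back to testing


def release(mem, lock_addr):
    mem.write(lock_addr, 0)  # back to the unlocked value
```

Testing with plain reads before attempting the test-and-set limits the locked read-modify-write operations, which tie up the bus, to moments when the lock is likely to be free.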
In an x86 architecture, a processor may lock the shared system bus 18 using the LOCK prefix in front of an instruction. When an instruction with a LOCK prefix executes, the processor will assert its bus lock signal output. This signal may be connected to an external bus controller (not shown), which then prevents any other processor from taking over the system bus. Thus, a number of shared system resources, e.g., the system memory 16, a disk drive (not shown), etc. may be dedicated to a single processor during execution of the critical code section.
Generally, the skeleton for a program to update a critical region may be given as: LOCK (critical_region); Access (critical_region); UNLOCK (critical_region). A flag or a semaphore may be associated with the critical region. As mentioned earlier, the critical region may typically include memory locations containing shared data, data structures, or lock variables. The LOCK and UNLOCK statements operate on the semaphore of the critical region rather than on the content of the critical region. The semaphore permits no more than one process at a time to have access to the critical region. If process A executes the LOCK statement successfully, then all other processes within the computer system 10 that require access to shared system resources must be halted until process A executes the UNLOCK statement. The LOCK statement can be implemented in part with a test-and-set instruction.
The synchronization of accesses to shared system resources is accomplished by serializing concurrent executions of LOCK by more than one process. Because LOCK executes serially, no more than one process may observe a zero value (the reset condition) of the semaphore and thereby move past the LOCK to the update stage. Thus, as shown in FIG. 1B, the requesting processor may continue its attempts to lock the variable so long as the semaphore is set (by another processor). When one process passes the LOCK and reaches the UNLOCK, the semaphore can be returned to a 0 state (i.e., the reset condition), thereby permitting another process (which may be executed on another processor) to pass the LOCK statement and update the shared variable.
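The LOCK (critical_region); Access (critical_region); UNLOCK (critical_region) skeleton can be exercised with a small threaded sketch. `CriticalRegion` and `process` are hypothetical names, the semaphore follows the 0 (reset) / 1 (set) convention of the text, and a `threading.Lock` only models the indivisibility of the test-and-set. The `max_inside` field records the largest number of processes ever past LOCK at once, which the serialization argument says should never exceed one.

```python
import threading
import time


class CriticalRegion:
    """Shared data plus its semaphore (illustrative model only)."""

    def __init__(self):
        self.semaphore = 0            # 0 = reset (unlocked), 1 = set (locked)
        self.shared_total = 0         # the shared variable being updated
        self._rmw = threading.Lock()  # models an indivisible test-and-set
        self.inside = 0               # processes currently past LOCK
        self.max_inside = 0           # most processes ever past LOCK at once

    def _test_and_set(self):
        with self._rmw:
            old, self.semaphore = self.semaphore, 1
            return old

    def lock(self):
        # Only a process that observes the reset value (0) moves past LOCK
        while self._test_and_set() != 0:
            time.sleep(0)  # semaphore set by another process: retry
        self.inside += 1
        self.max_inside = max(self.max_inside, self.inside)

    def unlock(self):
        self.inside -= 1
        self.semaphore = 0  # return the semaphore to the reset condition


def process(region, updates):
    for _ in range(updates):
        region.lock()             # LOCK (critical_region)
        region.shared_total += 1  # Access (critical_region)
        region.unlock()           # UNLOCK (critical_region)
```

Because `shared_total += 1` is not atomic on its own, a final total equal to the number of updates indicates that the semaphore serialized every access.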
Once a process (through the corresponding processor) successfully establishes the lock, i.e., succeeds in locking the critical region, that process then operates on the critical region. Upon completion of operation on the critical region, the process unlocks the critical region, for example, by resetting the associated semaphores. This allows the next processor to establish lock ownership and similarly continue lock operations over the lock variables.
Unfortunately, shared bus systems suffer from several drawbacks. For example, since there are multiple devices attached to the shared bus, the bus is typically operated at a relatively low frequency. Further, a shared system bus may not scale to a large number of devices because of its fixed bandwidth. Once the bandwidth requirements of the devices attached to the shared bus (either directly or indirectly) exceed the available bandwidth of the shared bus, devices will frequently be stalled when attempting to access the bus. This results in an overall decrease in system performance.
One or more of the above problems may be addressed using a distributed memory system. In a distributed memory multiprocessing computer system, the shared physical system memory 16 (FIG. 1A) of the prior art may instead be distributed among the processing nodes. Further, the dedicated system bus 18 (FIG. 1A) of the prior art may be absent in such a multiprocessing environment, so a processor can no longer lock out all other processors simply by asserting a bus lock signal. Therefore, it is desirable to provide a mechanism to decide which processor gets the lock so as to synchronize processes within the system without restricting the scalability of the system.
The problems outlined above are in large part solved by a multiprocessing computer system as described herein. The computer system may employ a distributed system memory and may further include multiple processing nodes. Two or more of the processing nodes may be coupled to separate memories that may form a distributed memory system. The processing nodes may be interconnected using any suitable interconnect. The memory address space is assigned across the memories associated with each node.
In one embodiment, acquisition and release of a lock is arbitrated by a single processing node from the plurality of processing nodes in the multiprocessing system. A first processing node transmits a lock request to a second processing node, which arbitrates such lock requests from each processing node within the system. The second processing node, in turn, determines whether the received lock request is ready for service and, if so, issues a broadcast message to all the remaining processing nodes within the system. The broadcast message thus serves to inform each remaining processing node of the decision by the second processing node to place the lock request from the first processing node into service.
In response to the broadcast message, each remaining processing node sends a target done message to the second processing node (i.e., the arbitrating node) when ready to free all the shared system resources for access by the lock requesting node, i.e., the first processing node. The second processing node, in turn, informs the first processing node of availability of lock ownership by transmitting another target done message to the first processing node.
After completion of lock operations, the first processing node transmits a lock release request to the second processing node. The second processing node again sends a broadcast message to each remaining processing node in the system to inform them of the completion of current lock operations. Each remaining processing node responds to the broadcast message by transmitting a corresponding target done message to the second processing node as an acknowledgment of the receipt of the broadcast message. The protocol is completed when the second processing node sends another target done message to the first processing node after receiving all target done messages from other processing nodes in the system. The messaging scheme according to the present invention allows for contention-free and deadlock-free locking within the distributed memory multiprocessing computer system.
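The message sequence of this embodiment can be traced with a short Python sketch that only records who sends what to whom. The function name, message labels, and tuple layout are illustrative assumptions. The sketch further assumes that requests are served one at a time in the order given, that the broadcast reaches every node other than the arbitrating node and the requester, and that every node answers without delay.

```python
def lock_protocol_messages(num_nodes, arbiter, requests):
    """Message log for serving each requester in `requests`, in order.

    Each entry is (message, source node, destination node).
    """
    log = []
    for requester in requests:
        # Nodes that must free shared resources for the requester (assumption:
        # everyone except the arbitrating node and the requester itself).
        remaining = [n for n in range(num_nodes) if n not in (arbiter, requester)]

        # Acquisition: request, broadcast, per-node target done, grant.
        log.append(("lock_request", requester, arbiter))
        log.extend(("broadcast", arbiter, n) for n in remaining)
        log.extend(("target_done", n, arbiter) for n in remaining)
        log.append(("target_done", arbiter, requester))  # lock now owned

        # (the requester performs its lock operations here)

        # Release: release request, broadcast, acknowledgments, completion.
        log.append(("lock_release", requester, arbiter))
        log.extend(("broadcast", arbiter, n) for n in remaining)
        log.extend(("target_done", n, arbiter) for n in remaining)
        log.append(("target_done", arbiter, requester))  # protocol complete
    return log
```

In this model, with N nodes and a requester other than the arbitrating node, each acquire-and-release cycle takes 4(N-1) messages, for example 12 messages in a four-node system.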
In another embodiment, lock functionality is implemented through microcode routines stored within each processing node in the system. Each processing node in the system is capable (through its microcode) of monitoring the arbitrating processing node (here, the second processing node) for availability of lock. A lock register is provided in the second processing node to store the node identification data for a single processing node. A valid bit is also provided in the lock register to indicate whether the node identification data therein is valid or not. The valid bit, when set, may indicate to other processing nodes that the processing node represented by the identification data in the lock register is the current owner of the lock.
The first processing node, i.e., the node desiring lock ownership, sends its node identification data to the second processing node. A buffer may be provided in the second processing node to queue, in a chronological order, identification data received from each lock requesting node. In the event of no pending lock requests, the identification data for the first processing node may be directly placed in the lock register. The first processing node iteratively reads the content of the lock register to ascertain whether its node identification data is in the lock register and whether that data is valid (as indicated by the valid bit). Upon finding its node identification data in the lock register and upon ascertaining its validity, the first processing node transmits a broadcast message to all the other nodes in the system.
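A minimal sketch of the lock register, its valid bit, and the chronological buffer in the arbitrating node follows. The class and method names are hypothetical, and two behaviors are assumptions not stated in the text: the arbitrating node sets the valid bit whenever it loads an identifier into the register, and resetting the valid bit promotes the oldest queued identifier.

```python
from collections import deque


class ArbiterLockRegister:
    """Lock register, valid bit, and request buffer (hypothetical model)."""

    def __init__(self):
        self.node_id = None    # identification data of the current owner
        self.valid = False     # valid bit: set means node_id owns the lock
        self.buffer = deque()  # pending requesters in chronological order

    def request(self, node_id):
        """A requesting node sends its identification data."""
        if not self.valid and not self.buffer:
            # No pending requests: place the id directly in the lock register
            self.node_id, self.valid = node_id, True
        else:
            self.buffer.append(node_id)

    def read(self):
        """What a requesting node sees when it reads the lock register."""
        return self.node_id, self.valid

    def reset_valid(self):
        """Write command from the owner once its lock operations complete."""
        self.valid = False
        if self.buffer:  # assumption: the oldest queued id is promoted at once
            self.node_id, self.valid = self.buffer.popleft(), True


def owns_lock(reg, my_id):
    """True when my_id is in the lock register and the valid bit is set."""
    node_id, valid = reg.read()
    return valid and node_id == my_id
```

A requesting node would call `owns_lock` repeatedly, mirroring the iterative reads of the lock register, and would broadcast its claim only once the call returns true.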
Each processing node in the system may be provided with a release request bit. The broadcast message instructs each remaining node to set the corresponding release request bit, thereby informing that node of the claim of lock ownership by the first processing node. A release response bit may also be provided within each remaining processing node in the system. Each remaining processing node sets the release response bit when ready to release its portion of the shared system resources. The first processing node checks status of the release response bit in each remaining processing node, thereby ascertaining when all remaining processing nodes have released the shared system resources. The first processing node, then, proceeds with its lock operations.
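The release request and release response handshake can be modeled as below. `Node`, `busy_cycles`, `tick`, and `acquire_resources` are hypothetical names. The sketch assumes each node needs a known number of steps to finish its in-flight work before it can set its release response bit; that counter stands in for whatever condition the real microcode checks.

```python
class Node:
    """Per-node release request/response bits (hypothetical model)."""

    def __init__(self, node_id, busy_cycles=0):
        self.node_id = node_id
        self.release_request = False
        self.release_response = False
        self.busy_cycles = busy_cycles  # steps of in-flight work still pending

    def on_broadcast(self):
        # The broadcast informs this node that another node claims the lock
        self.release_request = True

    def tick(self):
        """One step of this node's own activity."""
        if self.release_request and not self.release_response:
            if self.busy_cycles > 0:
                self.busy_cycles -= 1         # still finishing earlier work
            else:
                self.release_response = True  # ready to release resources


def acquire_resources(owner_id, nodes):
    """Broadcast the claim, then poll every other node's response bit.

    Returns how many polling rounds passed before all nodes had released.
    """
    others = [n for n in nodes if n.node_id != owner_id]
    for n in others:
        n.on_broadcast()
    rounds = 0
    while not all(n.release_response for n in others):
        for n in others:
            n.tick()  # the other nodes keep running while the owner polls
        rounds += 1
    return rounds
```

For example, with four nodes where node 1 needs two extra steps and node 3 one extra step, node 0 waits three polling rounds before it may begin its lock operations.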
In an alternative embodiment, instead of the first processing node ascertaining the status of the release response bit in all the remaining processing nodes, the microcode within each remaining processing node is configured to cause the corresponding remaining processing node to write its identification information into a lock resource register within the first processing node when the remaining node is ready to release shared system resources. When each remaining node writes into the lock resource register, the first processing node gets an indication that it can now proceed with lock operations.
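In the alternative embodiment, the lock-requesting node only inspects its own lock resource register. The sketch below assumes one bit per node identifier in that register, an illustrative encoding rather than one given in the text; `OwnerNode` and its methods are likewise hypothetical.

```python
class OwnerNode:
    """Lock-requesting node with a lock resource register (hypothetical).

    The register is treated as a bit mask with one bit per node identifier.
    """

    def __init__(self, node_id, all_node_ids):
        self.node_id = node_id
        self.lock_resource_register = 0
        # Bits expected from every remaining node before lock operations start
        self.expected_mask = 0
        for nid in all_node_ids:
            if nid != node_id:
                self.expected_mask |= 1 << nid

    def write_lock_resource(self, writer_id):
        """Performed by a remaining node once it is ready to release resources."""
        self.lock_resource_register |= 1 << writer_id

    def may_proceed(self):
        mask = self.expected_mask
        return (self.lock_resource_register & mask) == mask
```

Compared with polling each node's release response bit, this places the reporting burden on the remaining nodes, so the lock-requesting node waits on local state instead of issuing reads to every other node.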
Upon completion of lock operations, the first processing node sends a write command to the second processing node instructing the second processing node to reset the valid bit in the lock register. The first processing node also transmits a broadcast message to all the remaining nodes in the system instructing them to reset respective release request bits. Thus the broadcast message serves to inform all the remaining processing nodes in the system of the completion of lock operations, and, hence, of release of the lock. In one embodiment, the remaining processing nodes respond to the broadcast message by sending a corresponding target done message to the first processing node. In still another embodiment, each remaining processing node may also reset the release response bit along with the release request bit in response to the broadcast message from the first processing node.
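The release sequence can be sketched as follows. `NodeBits`, `LockRegister`, and `release_lock` are hypothetical names. The `reset_response` flag selects the embodiment in which the release response bit is cleared together with the release request bit, and the target done replies follow the embodiment in which the remaining nodes acknowledge the broadcast.

```python
from dataclasses import dataclass


@dataclass
class NodeBits:
    """Per-node bits left set by a completed lock acquisition (hypothetical)."""
    node_id: int
    release_request: bool = True
    release_response: bool = True


@dataclass
class LockRegister:
    """Lock register in the arbitrating node (hypothetical)."""
    node_id: int
    valid: bool = True


def release_lock(owner_id, lock_register, nodes, reset_response=True):
    """Run the owner's release sequence and return the messages exchanged."""
    if not (lock_register.valid and lock_register.node_id == owner_id):
        raise ValueError("caller does not hold the lock")
    log = []

    # Write command to the arbitrating node: reset the valid bit.
    log.append(("write_reset_valid", owner_id))
    lock_register.valid = False

    remaining = [n for n in nodes if n.node_id != owner_id]

    # Broadcast: each remaining node resets its release request bit
    # (and, in one embodiment, its release response bit as well).
    for n in remaining:
        log.append(("broadcast", owner_id, n.node_id))
        n.release_request = False
        if reset_response:
            n.release_response = False

    # Each remaining node acknowledges with a target done message.
    for n in remaining:
        log.append(("target_done", n.node_id, owner_id))
    return log
```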
The foregoing messaging schemes implement lock functionality in a distributed memory multiprocessor computer system. Contentions for system resources are eliminated through a processing node status reporting mechanism. Microcode implementation of lock acquisition and release may reduce lock management hardware within the multiprocessing environment.