1. Field of the Invention
The present invention broadly relates to computer systems. More particularly, the present invention relates to a messaging scheme to synchronize processes within a multiprocessing computing environment.
2. Description of the Related Art
Generally, personal computers (PCs) and other types of computer systems have been designed around a shared bus system for accessing a shared memory. One or more processors and one or more input/output (I/O) devices are coupled to the shared memory through the shared bus. The I/O devices may be coupled to the shared bus through an I/O bridge, which manages the transfer of information between the shared bus and the I/O devices. The processors are typically coupled to the shared bus either directly or through a cache hierarchy.
FIG. 1A illustrates a shared bus multiprocessor computer system 10 of the prior art. Three processors, 14A through 14C, are shown directly connected to the shared system bus 18. More processors may also be connected in similar fashion. The system memory 16 (i.e., the shared memory) is shown connected to the system bus 18. Each of the processors 14A through 14C may further have its own local cache, caches 12A through 12C, respectively. As used herein, the term “task” refers to a sequence of instructions arranged to perform a particular operation. Application software being executed in the multiprocessing computer system 10 and the operating system for the computer system 10 may each comprise one or more tasks.
One problem that confronts a shared-memory multiprocessor is coordinating, or synchronizing, processors that are working on a common task. In particular, access to critical regions of memory 16 that are accessed by two or more processes must be controlled to provide consistent results in memory transactions. A critical region (or critical section) of the memory 16 may contain global variables accessible by each processor in the system. Typically, the critical regions are protected by lock variables (or “semaphores”), and the processes are synchronized using an atomic swap operation. In an atomic swap operation, a processor both reads a memory location and sets it to the locked value in the same bus operation, preventing any other processor from reading or writing the shared system memory 16 in between.
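An atomic swap of the kind described above can be approximated in portable C11 with `atomic_exchange`, which stores a new value and returns the old one as a single indivisible operation. The following is a minimal sketch, not the mechanism of any particular processor: it assumes the encoding 0 = unlocked and 1 = locked, and the names `swap_lock`, `swap_init`, `swap_try_acquire`, and `swap_release` are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical lock variable: 0 = unlocked, 1 = locked. */
typedef struct {
    atomic_int value;
} swap_lock;

static void swap_init(swap_lock *l) {
    atomic_init(&l->value, 0);
}

/* One atomic swap: write the locked value (1) and get back the old
   contents in the same indivisible operation. An old value of 0 means
   the caller has just acquired the lock. */
static int swap_try_acquire(swap_lock *l) {
    return atomic_exchange(&l->value, 1) == 0;
}

/* Releasing is an ordinary atomic store of the unlocked value. */
static void swap_release(swap_lock *l) {
    atomic_store(&l->value, 0);
}
```

A swap that returns 0 means the caller now holds the lock; a swap that returns 1 leaves the variable unchanged (it already held 1), so a failed attempt does no harm and can simply be retried.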
FIG. 1B illustrates a simplified flow diagram for locking one or more critical regions using an atomic swap instruction. In a shared bus system, e.g., the system in FIG. 1A, bus arbitration is relatively simple because the shared bus 18 is the only path to the system memory 16. Therefore, the processor that gets the bus may retain control of the bus, thereby locking out all other processors from the memory. When a processor wants to establish a lock, it first reads the lock variable to test its state. The processor keeps reading and testing until the value indicates that the lock is unlocked.
After detecting that the lock variable is unlocked, the processor that wishes to lock the variable attempts to do so by executing an appropriate instruction over the shared system bus 18. In some instruction sets, this instruction is known as a “test-and-set” instruction. A test-and-set instruction has the typical form of a read-modify-write, in which the entire operation cannot be interrupted by another processor attempting to read or write the affected memory location. That is, once the test-and-set instruction is initiated and the read access is completed, no other access can be made to the affected memory location until the location is rewritten during the second step (i.e., the “set” function) of the test-and-set instruction.
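The flow of FIG. 1B, in which a processor keeps reading the lock variable until it reads as unlocked and only then issues the test-and-set, can be sketched in C11 as a "test-and-test-and-set" spin lock. This is an illustrative sketch under stated assumptions, not the claimed invention: it assumes `<stdatomic.h>` and POSIX threads, uses `atomic_exchange` as the test-and-set, encodes unlocked/locked as 0/1, and the names `tts_lock`, `tts_acquire`, `tts_release`, and `tts_demo` are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

typedef struct {
    atomic_int sem;            /* assumed encoding: 0 = unlocked, 1 = locked */
} tts_lock;

static void tts_init(tts_lock *l) { atomic_init(&l->sem, 0); }

static void tts_acquire(tts_lock *l) {
    for (;;) {
        /* Test: keep reading until the lock variable reads as unlocked. */
        while (atomic_load_explicit(&l->sem, memory_order_relaxed) != 0)
            ;
        /* Test-and-set: atomically read the old value and write 1.
           Only the contender that reads 0 here owns the lock. */
        if (atomic_exchange_explicit(&l->sem, 1, memory_order_acquire) == 0)
            return;
        /* Another processor won the race; go back to testing. */
    }
}

static void tts_release(tts_lock *l) {
    atomic_store_explicit(&l->sem, 0, memory_order_release);
}

/* Hypothetical demo: several threads increment a plain (non-atomic)
   counter inside the critical region. */
#define TTS_THREADS 4
#define TTS_ITERS   10000

static tts_lock demo_lock;
static long demo_counter;

static void *tts_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < TTS_ITERS; i++) {
        tts_acquire(&demo_lock);
        demo_counter++;        /* access to the critical region */
        tts_release(&demo_lock);
    }
    return NULL;
}

static long tts_demo(void) {
    pthread_t t[TTS_THREADS];
    tts_init(&demo_lock);
    demo_counter = 0;
    for (int i = 0; i < TTS_THREADS; i++)
        pthread_create(&t[i], NULL, tts_worker, NULL);
    for (int i = 0; i < TTS_THREADS; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

Spinning on the plain read first typically lets waiting processors poll a cached copy of the lock variable, so the read-modify-write is attempted only when the lock appears free and has a chance of succeeding.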
In an x86 architecture, a processor may lock the shared system bus 18 using the LOCK prefix in front of an instruction. When an instruction with a LOCK prefix executes, the processor will assert its bus lock signal output. This signal may be connected to an external bus controller (not shown), which then prevents any other processor from taking over the system bus. Thus, a number of shared system resources, e.g., the system memory 16, a disk drive (not shown), etc. may be dedicated to a single processor during execution of the operation affecting the shared system resource.
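At the instruction level, the effect of the LOCK prefix can be illustrated with a LOCK-prefixed XADD (exchange-and-add), whose read-modify-write would not otherwise be guaranteed atomic with respect to other processors. The sketch below assumes GCC or Clang; on x86 it uses extended inline assembly, and on other architectures it falls back to the `__atomic_fetch_add` builtin. The names `locked_fetch_add` and `xadd_demo` are hypothetical. On many newer x86 processors, a LOCK-prefixed access to cacheable memory is typically carried out by locking the cache line rather than by asserting the bus lock signal, but the atomicity seen by software is the same.

```c
#include <pthread.h>

/* Atomically add `inc` to *addr and return the value *addr held before.
   Assumes GCC or Clang (inline asm on x86, __atomic builtin elsewhere). */
static int locked_fetch_add(volatile int *addr, int inc) {
#if defined(__x86_64__) || defined(__i386__)
    /* XADD alone is not guaranteed atomic across processors; the LOCK
       prefix makes the read-modify-write indivisible. */
    __asm__ __volatile__("lock; xaddl %0, %1"
                         : "+r"(inc), "+m"(*addr)
                         :
                         : "memory", "cc");
    return inc;                /* XADD leaves the old value in the register */
#else
    return __atomic_fetch_add(addr, inc, __ATOMIC_SEQ_CST);
#endif
}

#define XADD_THREADS 4
#define XADD_ITERS   50000

static volatile int shared_count;

static void *xadd_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < XADD_ITERS; i++)
        locked_fetch_add(&shared_count, 1);
    return NULL;
}

/* Several threads update one counter; with the LOCK prefix no
   increment is lost. */
static int xadd_demo(void) {
    pthread_t t[XADD_THREADS];
    shared_count = 0;
    for (int i = 0; i < XADD_THREADS; i++)
        pthread_create(&t[i], NULL, xadd_worker, NULL);
    for (int i = 0; i < XADD_THREADS; i++)
        pthread_join(t[i], NULL);
    return shared_count;
}
```

Removing the `lock;` prefix from the x86 path would typically allow concurrent updates to overwrite one another, so the final count could fall short of `XADD_THREADS * XADD_ITERS`.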
Generally, the skeleton of a program that updates a critical region may be given as: LOCK(critical_region); Access(critical_region); UNLOCK(critical_region). A flag or a semaphore may be associated with the critical region. As mentioned earlier, the critical region may typically include memory locations containing shared data, data structures, or lock variables. The LOCK and UNLOCK statements operate on the semaphore of the critical region rather than on the content of the critical region. The semaphore permits no more than one process at a time to have access to the critical region. If process A executes the LOCK statement successfully, then all other processes within the computer system 10 that require access to the shared system resources must be halted until process A executes the UNLOCK statement. The LOCK statement can be implemented in part with a test-and-set instruction.
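The LOCK/Access/UNLOCK skeleton can be sketched in C11 with `atomic_flag`, whose `atomic_flag_test_and_set_explicit` is a test-and-set on a one-bit semaphore. In this sketch the semaphore sits beside the shared data rather than inside it, mirroring the point that LOCK and UNLOCK operate on the semaphore and not on the region's contents. The names `critical_region`, `region_lock`, `region_unlock`, and `cr_demo`, and the invariant `a + b == CR_TOTAL`, are hypothetical examples of shared data that must be updated as a unit; POSIX threads are assumed.

```c
#include <pthread.h>
#include <stdatomic.h>

#define CR_THREADS 4
#define CR_ITERS   10000
#define CR_TOTAL   ((long)CR_THREADS * CR_ITERS)

/* Hypothetical critical region: shared data plus its own semaphore.
   The lock routines touch only `sem`, never the data fields. */
struct critical_region {
    atomic_flag sem;
    long a, b;                 /* invariant: a + b == CR_TOTAL */
};

static struct critical_region region = { ATOMIC_FLAG_INIT, CR_TOTAL, 0 };
static atomic_long violations; /* times the invariant was seen broken */

/* LOCK(critical_region): spin until test-and-set finds the flag clear. */
static void region_lock(struct critical_region *r) {
    while (atomic_flag_test_and_set_explicit(&r->sem, memory_order_acquire))
        ;
}

/* UNLOCK(critical_region): reset the semaphore. */
static void region_unlock(struct critical_region *r) {
    atomic_flag_clear_explicit(&r->sem, memory_order_release);
}

static void *cr_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < CR_ITERS; i++) {
        region_lock(&region);
        /* Access(critical_region): two separate stores that every other
           process must see as a single update. */
        if (region.a + region.b != CR_TOTAL)
            atomic_fetch_add(&violations, 1);
        region.a -= 1;
        region.b += 1;
        region_unlock(&region);
    }
    return NULL;
}

static void cr_demo(void) {
    pthread_t t[CR_THREADS];
    for (int i = 0; i < CR_THREADS; i++)
        pthread_create(&t[i], NULL, cr_worker, NULL);
    for (int i = 0; i < CR_THREADS; i++)
        pthread_join(t[i], NULL);
}
```

Because both stores happen while the semaphore is held, no other worker can observe the region between them, so `violations` should remain 0 after `cr_demo()` returns, with `region.a == 0` and `region.b == CR_TOTAL`.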
The synchronization of accesses to shared system resources is accomplished by serializing concurrent executions of LOCK instructions by more than one process. Due to the serial execution of LOCK, no more than one process may observe a zero value (the reset condition) of the semaphore and thereby move past the LOCK to the update stage. Thus, as shown in FIG. 1B, the requesting processor may continue its attempts to lock the variable so long as the semaphore is set (by another processor). When one process passes the LOCK and reaches the UNLOCK, the semaphore can be returned to a 0 state (i.e., the reset condition) and thereby permit another process (which may be executed on another processor) to pass the LOCK statement and update the shared variable.
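The serialization argument, that no more than one contender can observe the reset value of the semaphore, can be checked with a small experiment. This sketch assumes C11 atomics and POSIX threads; each hypothetical contender thread makes a single `atomic_exchange` attempt on a shared semaphore, and the hypothetical `winners` counter records how many contenders saw 0.

```c
#include <pthread.h>
#include <stdatomic.h>

#define CONTENDERS 8

static atomic_int semaphore;   /* 0 = reset (unlocked), 1 = set (locked) */
static atomic_int winners;     /* contenders that observed the reset value */

static void *contend(void *arg) {
    (void)arg;
    /* Exactly one test-and-set attempt per contender. */
    if (atomic_exchange(&semaphore, 1) == 0)
        atomic_fetch_add(&winners, 1);
    return NULL;
}

/* Runs one round of contention and returns the number of winners. */
static int contention_round(void) {
    pthread_t t[CONTENDERS];
    atomic_store(&winners, 0);
    for (int i = 0; i < CONTENDERS; i++)
        pthread_create(&t[i], NULL, contend, NULL);
    for (int i = 0; i < CONTENDERS; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&winners);
}
```

On a freshly reset semaphore, `contention_round()` reports exactly one winner however the threads interleave. A second round without an UNLOCK reports none, and storing 0 into `semaphore` (the UNLOCK) lets exactly one new contender through.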
Once a process (through the corresponding processor) successfully establishes the lock, i.e., succeeds in locking the critical region, that process then operates on the critical region. Upon completion of operation on the critical region, the process unlocks the critical region, for example, by resetting the associated semaphores. This allows the next process to establish lock ownership and, in the same manner, carry out its own lock operations on the lock variables.
Unfortunately, shared bus systems suffer from several drawbacks. For example, since multiple devices are attached to the shared bus, the bus is typically operated at a relatively low frequency. Further, a shared system bus may not be scaled to include a large number of devices because of its fixed bandwidth. Once the bandwidth requirements of the devices attached to the shared bus (either directly or indirectly) exceed the available bandwidth of the shared bus, devices will frequently be stalled when attempting to access the bus. This results in an overall decrease in system performance.
One or more of the above problems may be addressed using a distributed memory system. In a distributed memory multiprocessing computer system, the shared physical system memory 16 (FIG. 1A) of the prior art may instead be distributed among the processing nodes. Further, the dedicated system bus 18 (FIG. 1A) of the prior art may be absent in such a multiprocessing environment. Therefore, it is desirable to provide a mechanism to determine which process receives the lock so as to synchronize processes within the system without restricting the scalability of the system.