This invention relates generally to computer systems and more specifically to multiprocessor computer systems.
As is known in the art, a computer system typically includes a microprocessor (CPU), a main memory system, a cache system, and an I/O system, generally interconnected by system address and data busses.
A multiprocessor system is one in which more than one CPU is used to increase overall system performance. Each CPU is given enough local resources to operate independently from the larger system. These local resources generally do not include I/O device controllers, main memory, and other resources that are shared among all CPUs. For example, a CPU typically spends a small percentage of its time reading data from the floppy drive. The floppy controller can therefore be shared among many CPUs, allowing one CPU to use the device while the others do not need it. This makes the device productive during periods when it would otherwise have been idle if dedicated to a single CPU, and increases the amount of useful work that the controller performs.
Multiprocessor implementations can greatly improve system performance because each CPU can work on a different portion of a single problem. One CPU can be reading data from an I/O port, manipulating it as required, and outputting it to main memory while a second CPU reads the data from main memory to perform its portion of the problem. In this manner, overall system performance increases because the portions of the problem are processed concurrently rather than one after another.
The sharing of system resources among a plurality of central processing units leads to a problem when more than one CPU attempts to change data resident in a single shared location. In a typical problem case, a first CPU attempting a read-modify-write operation reads the shared location, modifies the data, and attempts to write it back to the same location. Just before the first CPU issues the write instruction, a second CPU also begins a read-modify-write operation on the same location by reading the data. If the first CPU successfully writes the data into the shared location, the data that the second CPU is working with is no longer current. If, after manipulating the data, the second CPU writes the data back to memory, that memory location now stores the results of the second CPU's operations and not those of the first CPU. This could be viewed as corrupting the common location, which could lead to catastrophic failure in the computer system.
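The problem case described above can be illustrated with a minimal sketch. The names used here (shared_memory, bus_read, bus_write, run_interleaving) and the particular values are hypothetical and chosen only for illustration; a dictionary stands in for shared main memory, and the interleaving of the two CPUs is replayed sequentially rather than run on real hardware.

```python
# Hypothetical model: a dict stands in for shared main memory, and plain
# function calls stand in for bus reads and writes.
shared_memory = {"location": 10}

def bus_read(addr):
    return shared_memory[addr]

def bus_write(addr, value):
    shared_memory[addr] = value

def run_interleaving():
    """Replay the problem interleaving and return the final stored value."""
    shared_memory["location"] = 10
    cpu1_data = bus_read("location")   # first CPU reads 10
    cpu1_data += 1                     # first CPU modifies its copy: 11
    cpu2_data = bus_read("location")   # second CPU reads 10 before the first writes
    bus_write("location", cpu1_data)   # first CPU writes 11; cpu2_data is now stale
    cpu2_data += 5                     # second CPU modifies its stale copy: 15
    bus_write("location", cpu2_data)   # second CPU overwrites the first CPU's result
    return bus_read("location")

final_value = run_interleaving()
```

Had the two read-modify-write operations executed one after the other, the location would hold 16. With the interleaving shown, it holds 15, and the first CPU's modification is lost.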
Prior art systems have addressed this problem through the use of a semaphore, a software construct that controls access to a shared location. In this approach, a CPU performing a read-modify-write operation obtains the semaphore by first reading the location in memory where the semaphore is stored. Concurrently with the read, the CPU sets a flag (i.e., a semaphore lock flag) in a semaphore address lock register and stores the address being modified. The address lock register and lock flag are CPU resources used to indicate that an operation on the semaphore is in progress.
This approach does not prevent other CPUs from reading the semaphore location; however, each device that attempts to write the location is required to verify that its lock flag has remained set before the write can succeed. If the flag is set, that device is allowed to write the location and clear its lock flag. Other CPUs in the system monitor the bus and, when the write executes, clear any lock flags they have set for that address. When, upon an attempted write, a CPU finds that its flag has been cleared, it is required to repeat the whole operation until it is permitted to write the data successfully.
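The lock flag protocol described above can be sketched as a small software model. This is an illustration only, not the hardware of any prior art system: the class and method names (SharedMemory, CPU, load_locked, store_conditional), the address 0x100, and the data values are all assumptions introduced for the example.

```python
class SharedMemory:
    """Hypothetical shared memory whose writes are visible on the bus."""
    def __init__(self):
        self.cells = {}
        self.cpus = []

    def write(self, writer, addr, value):
        self.cells[addr] = value
        # Every other CPU monitors the bus and clears a lock flag it has
        # set for the address being written.
        for cpu in self.cpus:
            if cpu is not writer and cpu.lock_addr == addr:
                cpu.lock_flag = False

class CPU:
    """Hypothetical CPU with an address lock register and a lock flag."""
    def __init__(self, memory):
        self.memory = memory
        self.lock_addr = None
        self.lock_flag = False
        memory.cpus.append(self)

    def load_locked(self, addr):
        # Read the location, record its address, and set the lock flag.
        self.lock_addr = addr
        self.lock_flag = True
        return self.memory.cells[addr]

    def store_conditional(self, addr, value):
        # The write succeeds only if the lock flag has remained set.
        if not (self.lock_flag and self.lock_addr == addr):
            return False
        self.memory.write(self, addr, value)
        self.lock_flag = False
        return True

memory = SharedMemory()
memory.cells[0x100] = 10
cpu1, cpu2 = CPU(memory), CPU(memory)

v1 = cpu1.load_locked(0x100)                     # both CPUs read 10
v2 = cpu2.load_locked(0x100)
ok1 = cpu1.store_conditional(0x100, v1 + 1)      # succeeds, clears cpu2's flag
ok2 = cpu2.store_conditional(0x100, v2 + 5)      # fails, flag was cleared

v2 = cpu2.load_locked(0x100)                     # cpu2 repeats the whole operation
ok2_retry = cpu2.store_conditional(0x100, v2 + 5)
result = memory.cells[0x100]
```

In this model the second CPU's first write attempt is rejected, so it rereads the current value and retries. The location ends up reflecting both modifications rather than only the last one written.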
The problem with this solution is that, because the address lock register is implemented as a resource in the logic which interfaces the CPU to the system bus, additional processor overhead time is incurred. In such an architecture, each time a CPU needs to write a location in shared main memory using a semaphore, the CPU must signal the system interface logic before gaining access to the register. It also has to wait for this interface logic to send pending cache coherency transactions. The CPU must then wait for an acknowledgment to be sent back from the register indicating that the semaphore flag was successfully set. Because of the lengthy overhead involved in interacting with the system bus interface logic and the time it takes to return the acknowledgment, a significant amount of time during each transaction is wasted. Further, once the semaphore lock flag is set, the conditional write operation executes at the relatively slow clock speed of the system bus.
In prior art systems, bus writes were not forwarded to the CPU unless they would change the status of the cache memory system. Because of this, the semaphore address lock and lock flag register had to be disposed in the system bus interface so that the lock flag would be properly cleared when a write executed for the location whose address was stored in the address lock register. Otherwise writes which were not forwarded to the CPU could cause the lock flag to improperly remain set.
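The reason the lock flag had to reside in the bus interface can be sketched as follows. This is a simplified model under stated assumptions: a write is treated as changing the status of the cache only when its address is currently cached, and the names BusInterface, snoop_write, and flag_after_remote_write are hypothetical.

```python
class BusInterface:
    """Hypothetical bus interface that filters the writes it observes."""
    def __init__(self, cached_addrs):
        self.cached_addrs = set(cached_addrs)

    def snoop_write(self, addr):
        # Assumption: only a write to a cached address changes cache
        # status, so only such a write is forwarded to the CPU.
        return addr in self.cached_addrs

def flag_after_remote_write(lock_addr, cached_addrs, flag_in_bus_interface):
    """Return the lock flag state after another CPU writes lock_addr."""
    interface = BusInterface(cached_addrs)
    lock_flag = True
    forwarded = interface.snoop_write(lock_addr)
    # A flag held in the bus interface observes every bus write; a flag
    # held in the CPU observes only the writes forwarded to it.
    if flag_in_bus_interface or forwarded:
        lock_flag = False
    return lock_flag

stale_flag = flag_after_remote_write(0x100, [], flag_in_bus_interface=False)
cleared_flag = flag_after_remote_write(0x100, [], flag_in_bus_interface=True)
```

Here stale_flag remains set even though the locked location was written, which is the improper condition described above, while cleared_flag is correctly cleared because the flag is placed where all bus writes are visible.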