A computer system can be broken into three basic blocks: a central processing unit (CPU), memory, and input/output (I/O) units. These blocks are interconnected by a bus. An input device such as a keyboard, mouse, disk drive, or analog-to-digital converter is used to input instructions and data to the computer system via the I/O unit. These instructions and data can be stored in memory. The CPU retrieves the data stored in memory and processes it as directed by the stored instructions. The results can be stored back into memory or output via the I/O unit to an output device such as a printer, cathode-ray tube (CRT) display, digital-to-analog converter, or LCD.
In one instance, the CPU consisted of a single semiconductor chip known as a microprocessor. This microprocessor executed the programs stored in the main memory by fetching their instructions, examining them, and then executing them one after another. Due to rapid advances in semiconductor technology, faster, more powerful and flexible microprocessors were developed to meet the demands imposed by ever more sophisticated and complex software.
In some applications, multiple processors are utilized. A single complex task can be broken into sub-tasks, and each sub-task is processed individually by a separate processor. For example, in a multiprocessor computer system, word processing can be performed as follows. One processor handles the background task of printing a document, while a different processor handles the foreground task of interfacing with a user typing on another document. Both tasks are thereby handled in a fast, efficient manner. This use of multiple processors allows various tasks or functions to be distributed among several processors rather than handled by a single CPU, so that the computing power of the overall system is enhanced. Depending on the complexity of a particular job, additional processors may be added. Furthermore, utilizing multiple processors has the added advantage that two or more processors may share the same data stored within the system.
These processors often contain a small amount of dedicated memory, known as a cache. Caches are used to increase the speed of operation. In a processor having a cache, as information is called from main memory and used, it is also stored, along with its address, in a small portion of especially fast memory, usually in static random access memory (SRAM). As each new read or write command is issued, the system looks to the fast SRAM (cache) to see if the information exists. A comparison of the desired address and the addresses in the cache memory is made. If an address in the cache memory matches the address sought, then there is a hit (i.e., the information is available in the cache). The information is then accessed in the cache so that access to main memory is not required. Thereby, the command is processed much more rapidly. If the information is not available in the cache, the new data is copied from the main memory and stored in the cache for future use.
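The lookup sequence described above can be illustrated with a minimal sketch. It is not taken from any particular processor design: the class name `SimpleCache`, the `capacity` parameter, and the first-in, first-out eviction policy are all assumptions made for illustration, and main memory is modeled as a plain dictionary keyed by address.

```python
class SimpleCache:
    """Hypothetical model of a small, fully associative cache."""

    def __init__(self, main_memory, capacity=4):
        self.main_memory = main_memory  # address -> data (stands in for main memory)
        self.capacity = capacity        # assumed number of cache entries
        self.lines = {}                 # cached address -> data
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # Compare the desired address against the addresses held in the cache.
        if address in self.lines:
            self.hits += 1              # hit: no main memory access is needed
            return self.lines[address]
        # Miss: copy the data from main memory and keep it for future use.
        self.misses += 1
        data = self.main_memory[address]
        if len(self.lines) >= self.capacity:
            # Assumed policy: evict the entry that was inserted first.
            oldest = next(iter(self.lines))
            del self.lines[oldest]
        self.lines[address] = data
        return data


memory = {0x10: "alpha", 0x20: "beta"}
cache = SimpleCache(memory)
first = cache.read(0x10)   # miss: fetched from main memory
second = cache.read(0x10)  # hit: served from the cache
```

In this sketch the first read of an address counts as a miss and the second as a hit, mirroring the address comparison described in the text.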
Because these caches are typically localized, the multiple memory elements in a multiprocessor computer system can (and usually do) contain multiple copies of a given data item. It is important that any processor or other agent accessing any copy of this data receives a valid data value. In other words, cache coherency must be maintained in hardware. For example, given an account initially containing $100 and a financial computer program that instructs a first processor to add $50 to the account while a second processor is subsequently instructed to subtract $30 from that same account, the correct balance should be $100+$50-$30=$120. However, without any mechanism to synchronize or coordinate the instructions between these two processors, the first processor would output a value of $100+$50=$150, while the second processor would output a value of $100-$30=$70. Neither of these two results is correct.
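The account example can be reproduced with a short, deterministic sketch. The dictionaries `cache_p1` and `cache_p2` are hypothetical stand-ins for the private caches of the two processors; no real hardware behavior is modeled beyond each processor working on its own copy of the balance.

```python
# Shared main memory holding the initial account balance.
main_memory = {"balance": 100}

# Each processor copies the balance into its own local cache (assumed names).
cache_p1 = {"balance": main_memory["balance"]}
cache_p2 = {"balance": main_memory["balance"]}

# Each processor updates only its private copy, unaware of the other.
cache_p1["balance"] += 50   # first processor adds $50
cache_p2["balance"] -= 30   # second processor subtracts $30

incoherent_results = (cache_p1["balance"], cache_p2["balance"])
correct_balance = 100 + 50 - 30
```

Here `incoherent_results` holds 150 and 70, while `correct_balance` is 120, which matches the arithmetic in the text: neither cached copy reflects both updates.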
One prior art method for handling this coherency problem involves locking the bus. When a processor encounters a critical section of the computer program, it "locks" the bus so that other processors are denied access until after those critical operations have been performed. Once those critical operations have executed, the bus is unlocked, and other processors are then granted access to the bus. Thus, in the above example, the first processor would first lock the bus and add $50 to the account. In the meantime, the second processor is prevented from carrying out its operation because of the locked condition. Once the first processor completes its execution, the bus is unlocked, and the second processor is assured of receiving the updated account balance of $150, from which the second processor subtracts $30, thereby yielding the correct balance of $120.
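The bus-locking approach can be sketched in software as follows. This is only an analogy under stated assumptions: a `threading.Lock` named `bus_lock` stands in for the hardware bus lock, two threads stand in for the two processors, and the helper `locked_update` is a hypothetical name. Unlike the narrative above, the threads are not forced to run in a fixed order, but because each read-modify-write runs entirely under the lock, the final balance is the same either way.

```python
import threading

bus_lock = threading.Lock()        # stands in for the locked bus
main_memory = {"balance": 100}     # shared account balance


def locked_update(delta):
    """Perform the critical read-modify-write while holding the bus lock."""
    with bus_lock:                 # other "processors" wait here
        value = main_memory["balance"]
        main_memory["balance"] = value + delta
    # Leaving the with-block releases the lock, granting access to others.


processor_1 = threading.Thread(target=locked_update, args=(50,))
processor_2 = threading.Thread(target=locked_update, args=(-30,))
processor_1.start()
processor_2.start()
processor_1.join()
processor_2.join()

final_balance = main_memory["balance"]
```

After both threads finish, `final_balance` is 120. The sketch also hints at the cost: while one thread holds `bus_lock`, the other can make no progress at all.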
However, this prior art mechanism for locking the bus suffers from several drawbacks, chief among them inefficiency. Locking the bus reduces the throughput of that bus, which slows down the overall speed of the system. Furthermore, the caches are also essentially locked out, which results in poor use of cache resources.
Thus, there is a need in the art for a lock handling mechanism that provides highly effective bus utilization and efficient use of cache resources without adding expensive hardware. It would be preferable if such a lock handling mechanism could readily be adapted to a pipelined bus structure and if it followed a cache coherence protocol.