Modern computer systems utilize various technologies and architectural features to achieve high performance operation. These technologies and architectural features include reduced instruction set computers, high speed cache memories and multiprocessor systems. Innovative arrangements of high performance components embodying one or more of the above can often result in significant improvements in the capabilities and processing power of a computer system.
Reduced instruction set computer (RISC) technology represents a "back to basics" approach to semiconductor chip design. An instruction set comprises a set of basic commands for fundamental computer operations, such as the addition of two data values to obtain a result. The instructions of an instruction set are typically embedded or hard wired into the circuitry of the chip embodying the central processing unit of the computer, and the various statements and commands of an application program running on the computer are each decoded into a relevant instruction or set of instructions of the instruction set for execution.
LOAD, ADD and STORE are examples of basic instructions that can be included in a computer's instruction set. Such instructions may be used to control, for example, the movement of data from memory to general purpose registers, addition of the data in the registers by the arithmetic and logic unit of the central processing unit, and return of the result to the memory for storing. In recent years, with significant advances in the miniaturization of silicon chips, chip designers began to etch more and more circuits into the chip circuitry so that instruction sets grew to include hundreds of instructions capable of executing, via hard wired circuitry, sophisticated and complex mathematical and logical operations.
A problem with the proliferation of instructions included in an instruction set is that the increasing complexity of the circuitry required to implement a large number of instructions slowed the processing speed of the computer. Moreover, it was determined that a relatively small percentage of the instructions of the instruction set performed a large percentage of the processing tasks of the computer. Thus, many of the instructions became "expensive" options, whose relatively infrequent use did not make up for the slowdown caused by large instruction sets.
The objective of a RISC design is to identify the most frequently used instructions of the instruction set and delete the remaining instructions from the set. A chip can then be implemented with a reduced, but optimal number of instructions to simplify the circuitry of the chip for increased speed of execution for each instruction. While a complex operation previously performed by a single instruction may now have to be executed via several more basic instructions, each of those basic instructions can be executed at a higher speed than was possible before reduction of the instruction set. More significantly, when the instructions retained in the instruction set are carefully selected from among those instructions performing the bulk of the processing within the computer, the RISC system will achieve a significant increase in its overall speed of operation since that entire bulk of processing will be performed at increased speed.
By way of example, in some "large" instruction set systems, twenty percent of the instructions were performing eighty percent of the processing work. Thus a RISC system comprising the twenty percent of the instructions would achieve significantly higher speeds of operation during the performance of eighty percent of the workload.
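By way of illustration only, the arithmetic behind this example can be sketched as follows. The text gives no speed figures, so the factor by which the retained instructions speed up (common_speedup) and the penalty applied to the remaining work (rare_penalty, covering operations that must now be built from several basic instructions) are assumptions, as are the function and parameter names.

```python
def overall_speedup(common_fraction, common_speedup, rare_penalty=1.0):
    """Estimate the whole-workload speedup of a reduced instruction set.

    common_fraction: share of processing work done by the retained
                     instructions (0.8 in the example from the text).
    common_speedup:  assumed factor by which that share runs faster.
    rare_penalty:    assumed time multiplier for the remaining work
                     (1.0 means no slower than before).
    """
    if not 0.0 <= common_fraction <= 1.0:
        raise ValueError("common_fraction must lie between 0 and 1")
    if common_speedup <= 0 or rare_penalty <= 0:
        raise ValueError("speed factors must be positive")
    # Normalised execution time after reduction, old time being 1.0.
    new_time = common_fraction / common_speedup + (1.0 - common_fraction) * rare_penalty
    return 1.0 / new_time


if __name__ == "__main__":
    # Eighty percent of the work runs twice as fast (assumed figure).
    print(round(overall_speedup(0.8, 2.0), 2))
    # Same, but the remaining work takes 1.5 times as long (assumed figure).
    print(round(overall_speedup(0.8, 2.0, rare_penalty=1.5), 2))
```

Under these assumed figures the overall speedup is about 1.67 in the first case and about 1.43 in the second, which shows why the selection of the retained instructions matters: the gain is bounded by the share of the workload that those instructions actually cover.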
The high performance capabilities achieved in a RISC computer are further enhanced when a plurality of such RISC computers is arranged in a multiprocessor system utilizing cache memories. A multiprocessor system can comprise, e.g., a plurality of RISC computers, an I/O device and a main memory module or modules, all coupled to one another by a high performance backplane bus. The RISC computers can be utilized to perform co-operative or parallel processing as well as multi-tasking among them for execution of several applications running simultaneously, to thereby achieve dramatically improved processing power. The capabilities of the system can be further enhanced by providing a cache memory at each one of the RISC computers in the system.
A cache memory comprises a relatively small, yet relatively fast memory device arranged in close physical proximity to a processor. The utilization of cache memories is based upon the principle of locality. It has been found, for example, that when a processor accesses a location in memory, there is a high probability that the processor will continue to access memory locations surrounding the accessed location for at least a certain period of time. Thus, a preselected data block of a large, relatively slow access time memory, such as a main memory module coupled to the processor via a bus, is fetched from the main memory and stored in the relatively fast access cache memory. Accordingly, as long as the processor continues to access data from the cache memory, the overall speed of operation of the processor is maintained at a level significantly higher than would be possible if the processor had to arbitrate for control of the bus and then perform a memory read or write operation, with the main memory module, for each data access.
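The effect of locality can be illustrated with a minimal sketch of a block-oriented cache. The direct-mapped placement, the block size, the number of blocks and the class and method names are all assumptions chosen for illustration; the text does not specify any particular cache organization.

```python
class BlockCache:
    """Toy direct-mapped cache that fetches one whole block per miss."""

    def __init__(self, block_size, num_blocks):
        self.block_size = block_size   # addresses per block (assumed)
        self.num_blocks = num_blocks   # block slots in the cache (assumed)
        self.slots = {}                # slot index -> block number held
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Record one processor access; return True on a cache hit."""
        block = address // self.block_size
        slot = block % self.num_blocks
        if self.slots.get(slot) == block:
            self.hits += 1
            return True
        # Miss: the whole block is fetched from main memory over the bus.
        self.slots[slot] = block
        self.misses += 1
        return False


if __name__ == "__main__":
    cache = BlockCache(block_size=8, num_blocks=4)
    for address in range(64):          # sequential, highly local accesses
        cache.access(address)
    print(cache.hits, cache.misses)    # prints "56 8"
```

In this run only the first access to each eight-address block goes to main memory; the seven accesses that follow are served from the cache, so the processor needs the bus for one access in eight.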
While the above described cached, multiprocessor RISC computer system represents a state-of-the-art model for a high performance computer system, the art has yet to achieve an optimal level of performance efficiency.
In multiprocessor computer systems, with local cache memories for each processor, several system bus arbitration and timing problems may arise with respect to accessing the cache memories.
For instance, in multiprocessor systems where each processor's local cache is accessible to other modules on the system backplane bus, it is possible to starve a processor, i.e. deny the processor sufficient access to its cache, since access to the cache by other modules may have the effect of excluding the processor from accessing its own cache. Starvation may also result when a processor is excluded from its cache by the cache probes or other cache accesses that are required during each bus transaction under a SNOOPING bus protocol.
In accordance with a SNOOPING bus protocol, any system bus transaction will cause a cache to be probed in order to determine whether the cache contains data relevant to an ongoing system bus transaction. While the cache is being accessed as a result of a bus transaction, the processor is denied access to the cache and may become stalled. This stall condition may continue until the system bus releases control of the cache and the processor is once again permitted access to the cache.
When a series of back-to-back bus transactions occurs, there is a possibility that the processor will effectively become starved as a result of being denied access to its own cache. Such a condition is particularly likely to occur as a result of back-to-back bus write transactions, since these deprive the processor of access to the cache for the greatest amount of time due to the nature of the operation required to update or invalidate the cache entries in response to a system bus write. Even under conditions in which a processor is not entirely cache-starved, when the system bus is heavily loaded, processor access to the cache may be sufficiently impeded to cause an unacceptable degradation in system performance.
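The starvation effect described above can be sketched as a simple cycle count. The figures used below are assumptions for illustration only: the text does not state how many cycles of a transaction the cache is held for probing and updating, and the transaction lengths are likewise hypothetical. The function and dictionary names are also hypothetical.

```python
# Assumed transaction lengths and cache-hold times, in bus clock cycles.
DURATION_CYCLES = {"read": 5, "write": 7}
CACHE_HELD_CYCLES = {"read": 2, "write": 5}   # writes also update or invalidate


def processor_cache_availability(transactions,
                                 duration_cycles=DURATION_CYCLES,
                                 cache_held_cycles=CACHE_HELD_CYCLES):
    """Fraction of cycles in which the processor may use its own cache
    during a run of back-to-back bus transactions."""
    total = sum(duration_cycles[t] for t in transactions)
    if total == 0:
        return 1.0  # idle bus: the processor always owns its cache
    held = sum(cache_held_cycles[t] for t in transactions)
    return (total - held) / total


if __name__ == "__main__":
    print(processor_cache_availability(["read"] * 10))
    print(processor_cache_availability(["write"] * 10))
```

With these assumed figures, a run of back-to-back writes leaves the processor its cache for two cycles in seven, against three cycles in five for a run of reads, which is consistent with the observation that write traffic is the more likely cause of starvation.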
A known approach to solving the problem of processor starvation is the use of dual-ported caches. This allows simultaneous access to the cache by both the system bus and the processor. While this approach avoids the problem of stalling the processor during system bus accesses, apart from being costly, it may introduce coherency problems with regard to shared data entered in the cache, since access to the cache is not exclusive to one module at a time.
Another problem encountered in multiprocessor, SNOOPING protocol systems with shared cache memories is the potential delay associated with the relinquishing of "ownership" of a cache by the corresponding processor. If a processor is engaged in accessing its cache when a bus transaction begins, the processor must give up control of the cache so that it may be probed to determine whether it contains data relevant to the system bus transaction. However, the bus transaction must be stalled to allow the processor to first complete the cache access in progress and relinquish control of its cache. The resultant delay adds to the latency of system bus transactions.
A known approach to the problem of coordinating access to a processor's cache has been the use of elongated bus transactions. Under this approach, bus transactions are all elongated to a sufficient length to guarantee that a processor cache access operation in progress when a system bus transaction is initiated will be completed before the system bus needs to probe the cache. While this approach provides a simple solution to the problem of coordinating access to the caches of the system, it causes system performance to suffer, since simple bus transactions that do not require a long transaction duration are extended to the duration of the elongated bus transaction. This wastes bus bandwidth and increases latency.
The handling of command errors in multiprocessor computer systems implementing a SNOOPING bus protocol introduces additional bus transaction timing problems. Command errors are those errors that appear in the command field of a system bus transaction. The contents of the command field indicate the type of transaction that is being carried out on the system bus.
In a computer system, different types of bus transactions may be of various durations, e.g. a read transaction may require 5 bus clock cycles to execute while a write transaction may require 7 bus clock cycles. An error in interpreting which type of transaction is being carried out on the system bus could cause a module requesting access to the bus to erroneously expect the current bus transaction to end later than it actually does. Thus, when the bus transaction terminates, and a subsequent requesting module is granted access to the system bus, the module may be unprepared to perform a bus transaction since it did not expect the system bus to become available until several cycles later. This may cause the module asserting the bus access request to miss the grant of the bus or result in a next requesting module being unprepared for the new bus transaction.
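The timing mismatch can be sketched using the example durations given above (5 cycles for a read, 7 cycles for a write). The function names and the idea of a waiting module computing an expected release cycle from the decoded command are illustrative assumptions, not a description of any particular bus implementation.

```python
# Example transaction lengths from the text, in bus clock cycles.
DURATION_CYCLES = {"read": 5, "write": 7}


def expected_release_cycle(start_cycle, command):
    """Cycle at which a module expects the bus to become free, given
    the command it believes is being carried out."""
    return start_cycle + DURATION_CYCLES[command]


def readiness_gap(start_cycle, actual_command, decoded_command):
    """Cycles between the true end of the transaction and the end the
    waiting module expects. A positive value means the bus becomes
    free before the module is prepared to use it."""
    actual_end = expected_release_cycle(start_cycle, actual_command)
    expected_end = expected_release_cycle(start_cycle, decoded_command)
    return expected_end - actual_end


if __name__ == "__main__":
    # A read is on the bus, but a command error makes it look like a write.
    print(readiness_gap(0, actual_command="read", decoded_command="write"))  # prints 2
```

Here the waiting module expects the bus at cycle 7 while the bus is actually released at cycle 5, so a grant issued at the true end of the transaction arrives two cycles before the module is ready for it.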
One known approach to resolving the timing problem associated with command errors has been to ignore them. This approach relies on the infrequency with which such errors occur. Such errors, however, may have catastrophic consequences. Thus, any approach which ignores command errors entails serious risks to system performance.
Another known approach to resolving the timing problem associated with command errors has been to make all system bus transaction types the same duration. While this eliminates the possibility of erroneously expecting a transaction to last longer than it actually does, it causes system performance to suffer, since simple bus transactions that execute quickly would have to be extended to the length of the longest transaction so that all transactions would be of the same duration. As mentioned earlier, the elongation of bus transactions wastes bus bandwidth and increases latency, reducing overall system performance.
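The cost of uniform transaction lengths can be estimated with a short sketch. It reuses the example durations given earlier (5 cycles for a read, 7 for a write); the traffic mix and the function name are assumptions chosen for illustration.

```python
# Example transaction lengths from the text, in bus clock cycles.
DURATION_CYCLES = {"read": 5, "write": 7}


def uniform_duration_cost(transactions, duration_cycles=DURATION_CYCLES):
    """Return (natural, padded, wasted) bus cycles when every
    transaction is stretched to the length of the longest type."""
    longest = max(duration_cycles.values())
    natural = sum(duration_cycles[t] for t in transactions)
    padded = longest * len(transactions)
    return natural, padded, padded - natural


if __name__ == "__main__":
    # Assumed mix: three reads for every write.
    print(uniform_duration_cost(["read", "read", "read", "write"]))  # prints (22, 28, 6)
```

For this assumed mix, 6 of every 28 bus cycles, a little over a fifth, carry no useful work, and every read completes two cycles later than it otherwise would.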