Modern computer systems utilize various technologies and architectural features to achieve high performance operation. These technologies and architectural features include reduced instruction set computers, high speed cache memories and multiprocessor systems. Innovative arrangements of high performance components embodying one or more of the above can often result in significant improvements in the capabilities and processing power of a computer system.
Reduced instruction set computer (RISC) technology represents a "back to basics" approach to semiconductor chip design. An instruction set comprises a set of basic commands for fundamental computer operations, such as the addition of two data values to obtain a result. The instructions of an instruction set are typically embedded or hard-wired into the circuitry of the chip embodying the central processing unit of the computer. The statements and commands of an application program running on the computer are each translated into an instruction or sequence of instructions from the instruction set for execution.
LOAD, ADD and STORE are examples of basic instructions that can be included in a computer's instruction set. Such instructions may be used to control, for example, the movement of data from memory to general purpose registers, the addition of the data in the registers by the arithmetic and logic unit of the central processing unit, and the return of the result to memory for storage. In recent years, with significant advances in the miniaturization of silicon chips, chip designers began to etch more and more circuits onto each chip. As a result, instruction sets grew to include hundreds of instructions capable of executing sophisticated and complex mathematical and logical operations in hard-wired circuitry.
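The LOAD/ADD/STORE sequence described above can be sketched as a toy register machine. This is a minimal illustration under assumed conventions, not any real instruction set. The tuple instruction format, the register names `r1`, `r2` and `r3`, and the `run` function are all hypothetical.

```python
# Hypothetical three-instruction machine illustrating LOAD, ADD and STORE.
# Memory is modeled as a dict from address to value; registers as a dict.

def run(program, memory):
    """Execute a list of (opcode, operands...) tuples and return memory."""
    regs = {}
    for op, *args in program:
        if op == "LOAD":        # LOAD reg, addr: move data from memory to a register
            reg, addr = args
            regs[reg] = memory[addr]
        elif op == "ADD":       # ADD dst, a, b: ALU adds two registers
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        elif op == "STORE":     # STORE reg, addr: return the result to memory
            reg, addr = args
            memory[addr] = regs[reg]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return memory

# Add the values at addresses 0x10 and 0x14 and store the sum at 0x18.
program = [
    ("LOAD", "r1", 0x10),
    ("LOAD", "r2", 0x14),
    ("ADD", "r3", "r1", "r2"),
    ("STORE", "r3", 0x18),
]
mem = run(program, {0x10: 7, 0x14: 35})
print(mem[0x18])  # 42
```

A single "add memory to memory" operation in a larger instruction set would correspond here to four simpler instructions, which is the trade-off discussed for RISC designs.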
A problem with the proliferation of instructions in an instruction set is that the increasingly complex circuitry required to implement a large number of instructions slowed the processing speed of the computer. Moreover, it was found that a relatively small percentage of the instructions performed a large percentage of the computer's processing tasks. Thus, many instructions became "expensive" options, whose relatively infrequent use did not make up for the slowdown caused by a large instruction set.
The objective of a RISC design is to identify the most frequently used instructions of the instruction set and delete the remaining instructions from the set. A chip can then be implemented with a reduced but optimal number of instructions, which simplifies the chip's circuitry and increases the speed at which each instruction executes. While a complex operation previously performed by a single instruction may now have to be executed via several more basic instructions, each of those basic instructions can be executed at a higher speed than was possible before the instruction set was reduced. More significantly, when the instructions retained in the instruction set are carefully selected from among those performing the bulk of the processing within the computer, the RISC system achieves a significant increase in its overall speed of operation, since that entire bulk of processing is performed at increased speed.
By way of example, in some "large" instruction set systems, twenty percent of the instructions were performing eighty percent of the processing work. Thus, a RISC system implementing only that twenty percent of the instructions would achieve significantly higher speeds of operation while performing eighty percent of the workload.
The high performance capabilities achieved in a RISC computer are further enhanced when a plurality of such RISC computers are arranged in a multiprocessor system utilizing cache memories. A multiprocessor system can comprise, e.g., a plurality of RISC computers, an I/O device and one or more main memory modules, all coupled to one another by a high performance backplane bus. The RISC computers can be utilized to perform cooperative or parallel processing, as well as multitasking among several applications running simultaneously, to achieve dramatically improved processing power. The capabilities of the system can be further enhanced by providing a cache memory at each of the RISC computers in the system.
A cache memory comprises a relatively small, yet relatively fast memory device arranged in close physical proximity to a processor. The utilization of cache memories is based upon the principle of locality. It has been found, for example, that when a processor accesses a location in memory, there is a high probability that the processor will continue to access memory locations surrounding the accessed location for at least a certain period of time. Thus, a preselected data block of a large, relatively slow access time memory, such as a main memory module coupled to the processor via a bus, is fetched from the main memory and stored in the relatively fast access cache memory. Accordingly, as long as the processor continues to access data from the cache memory, the overall speed of operation of the processor is maintained at a level significantly higher than would be possible if the processor had to arbitrate for control of the bus and then perform a memory read or write operation, with the main memory module, for each data access.
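To make the locality argument concrete, the following sketch counts how many of a run of sequential accesses can be served from a small cache. It assumes a direct-mapped organization, 16-byte blocks, 64 cache lines and 4-byte accesses. All of these parameters and the `DirectMappedCache` class are hypothetical.

```python
class DirectMappedCache:
    """Hypothetical direct-mapped cache that only tracks which block each line holds."""

    def __init__(self, num_lines, block_size):
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags = [None] * num_lines   # block tag held by each cache line
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        block = addr // self.block_size      # which memory block holds addr
        index = block % self.num_lines       # cache line the block maps to
        tag = block // self.num_lines        # identifies the block within that line
        if self.tags[index] == tag:
            self.hits += 1                   # served from the fast cache
        else:
            self.misses += 1                 # whole block fetched from main memory
            self.tags[index] = tag

cache = DirectMappedCache(num_lines=64, block_size=16)
for addr in range(0, 1024, 4):               # 256 sequential 4-byte accesses
    cache.access(addr)
print(cache.hits, cache.misses)              # 192 64
```

Only the first access to each 16-byte block misses and brings the block in from main memory. The next three accesses to that block hit, so three quarters of the accesses avoid the bus entirely.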
While the above described cached, multi-processor RISC computer system represents a state-of-the-art model for a high performance computer system, the art has yet to achieve an optimal level of performance efficiency.
One problem associated with multiprocessor systems having a cache memory at each processor is cache coherency. In a multiprocessor system, it is necessary that the system maintain a single, correct copy of the data being processed by its various processors.
Thus, when a processor writes to a particular data item stored in its cache, that copy of the data item becomes the latest correct value for the data item. The corresponding data item stored in main memory, as well as copies of the data item stored in other caches of the system, becomes outdated or invalid.
In a write-back cache scheme, the data item in main memory is not updated until the processor requires the corresponding cache location to store another data item. Until then, the cached data item that has been modified by the processor write remains the latest copy of the data item. It is, therefore, necessary to implement a scheme to monitor read and write transactions to ensure that the latest copy of a particular data item is properly identified whenever a processor requires it.
One known method of providing the necessary coherency between the various cache memories and the main memory of the computer system is to implement a SNOOPING bus protocol. In this protocol, a bus interface of each processor or other component in the multiprocessor computer system monitors the system backplane bus for bus activity involving addresses of data items that are currently stored in the processor's cache. Status bits are maintained in a TAG store associated with each cache to indicate the status of each data item currently stored in the cache. The three possible status bits associated with a particular data item stored in a cache memory can be, e.g., the following:
SHARED--If more than one cache in the system contains a copy of the data item. A cache entry transitions into this state when a different processor caches the same data item. That is, suppose a first interface, when SNOOPING on the system bus, determines that another cache on the bus is allocating a location for a data item already stored in the first interface's own cache. The first interface then asserts a SHARED signal on the system bus, signaling the other interface to allocate the location in the shared state. When this occurs, the first interface also updates the state of its own copy of the data item to indicate that it is now in the shared state.
DIRTY--A cache entry is dirty if the data item held in that entry has been updated more recently than main memory. Thus, when a processor writes to a location in its cache, it sets the DIRTY bit to indicate that the cache now holds the latest copy of the data item. A broadcast of each write is initiated whenever the SHARED bit is asserted.
VALID--If the cache entry holds a copy of a valid data item. In other words, the stored data item is coherent with the latest version of the data item, as may have been written by one of the processors of the computer system.
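The status bits and the SHARED handshake described above can be sketched for two caches on one bus. This is a simplified model under stated assumptions. The `Entry`, `Bus` and `Cache` classes are hypothetical, and cache placement is ignored. The text does not say how other caches react to a broadcast write, so the sketch only logs the broadcast. Supplying data from a dirty copy held in another cache on a read miss is also not modeled.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    """Status bits kept in the TAG store for one cached data item."""
    valid: bool = False
    dirty: bool = False
    shared: bool = False
    data: int = 0

class Bus:
    """Hypothetical system bus: knows the attached caches and logs transactions."""
    def __init__(self):
        self.caches = []
        self.log = []

class Cache:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.entries = {}          # address -> Entry (placement policy not modeled)
        bus.caches.append(self)

    def snoop_allocate(self, addr):
        """Snoop another cache's allocation; return True to assert SHARED."""
        entry = self.entries.get(addr)
        if entry is not None and entry.valid:
            entry.shared = True    # our own copy is now in the shared state too
            return True
        return False

    def read(self, addr, memory):
        entry = self.entries.get(addr)
        if entry is not None and entry.valid:
            return entry.data      # hit: no bus activity
        # Miss: every other interface snoops the allocation. A plain loop is used
        # so that all of them update their state, not just the first responder.
        shared = False
        for other in self.bus.caches:
            if other is not self and other.snoop_allocate(addr):
                shared = True
        self.bus.log.append(("READ", self.name, addr, shared))
        # Assumption: the data comes from main memory (dirty-copy supply not modeled).
        self.entries[addr] = Entry(valid=True, shared=shared, data=memory[addr])
        return memory[addr]

    def write(self, addr, value):
        entry = self.entries[addr]   # assumes the item is already cached and VALID
        entry.data = value
        entry.dirty = True           # this cache now holds the latest copy
        if entry.shared:
            # Other caches' response to the broadcast is not modeled; it is only logged.
            self.bus.log.append(("BROADCAST_WRITE", self.name, addr, value))

memory = {0x100: 5}
bus = Bus()
a, b = Cache("A", bus), Cache("B", bus)
a.read(0x100, memory)        # A allocates the item; nobody asserts SHARED
b.read(0x100, memory)        # A snoops B's allocation and asserts SHARED
b.write(0x100, 9)            # B sets DIRTY and, being SHARED, broadcasts the write
print(a.entries[0x100].shared, b.entries[0x100].shared, b.entries[0x100].dirty)  # True True True
print(bus.log[-1])           # ('BROADCAST_WRITE', 'B', 256, 9)
```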
Frequently, the processor must perform a read operation for a line that is not contained in its cache memory. Such a read operation requires that the requested line be read from main memory and then written into the processor's cache. This operation may result in a VALID and DIRTY cache entry being overwritten. Before the new line of data can be read from main memory and written into the processor's cache, the VALID and DIRTY line held in the entry to be overwritten must first be written back to main memory. This combined operation is referred to as an exchange transaction.
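The decision that turns a read miss into an exchange transaction can be sketched as follows. This is a minimal sketch assuming a direct-mapped cache with 16 entries (4 index bits), line addresses without a byte offset, and a dict standing in for main memory. The `Line` class, `INDEX_BITS`, `make_addr` and `read_miss` are hypothetical names, and the bus command is reduced to a tuple.

```python
from dataclasses import dataclass
from typing import Optional

INDEX_BITS = 4  # hypothetical: 16-entry direct-mapped cache

@dataclass
class Line:
    tag: Optional[int] = None
    valid: bool = False
    dirty: bool = False
    data: int = 0

def make_addr(tag: int, index: int) -> int:
    """Rebuild a line address from its TAG (high-order bits) and cache index (low-order bits)."""
    return (tag << INDEX_BITS) | index

def read_miss(cache: list, tag: int, index: int, memory: dict) -> tuple:
    """Service a read miss for (tag, index); return the bus command issued."""
    line = cache[index]
    new_addr = make_addr(tag, index)
    if line.valid and line.dirty:
        # Exchange: the VALID and DIRTY victim reaches main memory before the new line is read.
        victim_addr = make_addr(line.tag, index)
        memory[victim_addr] = line.data
        command = ("EXCHANGE", new_addr, victim_addr)
    else:
        # Empty or clean entry: it can simply be overwritten.
        command = ("READ", new_addr)
    cache[index] = Line(tag=tag, valid=True, dirty=False, data=memory[new_addr])
    return command

cache = [Line() for _ in range(1 << INDEX_BITS)]
memory = {addr: 0 for addr in range(1 << 8)}
cache[3] = Line(tag=2, valid=True, dirty=True, data=99)   # modified line at index 3
print(read_miss(cache, tag=5, index=3, memory=memory))     # ('EXCHANGE', 83, 35)
print(memory[35])                                          # 99
```

A second miss on index 3 now finds a clean line and issues a plain READ, since the line just fetched has not been written.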
The data line to be returned from main memory is referenced by what is termed an address field. The data line to be written back to main memory is referenced by what is termed an exchange address field. Usually, both the address field and the exchange address field are sent over the system bus as part of the exchange command.
To speed an exchange transaction and to reduce the amount of bus bandwidth required, the system bus protocol in use often does not require the full address of both the line which is to be stored in main memory and the line which is to be read from main memory to be transmitted over the system bus.
The address field and the exchange address field can be designed so that their addresses share a common index, for example, by using a direct-mapped cache. That is, each address contains two parts: a cache index, comprising the number of low-order bits necessary to index the cache, and a TAG field, comprising the remaining high-order bits. The index part is common to both the address field and the exchange address field because the data line to be overwritten and the data line read from main memory map to the same cache entry. The system therefore needs to send only one index over the system bus, and only the TAG field of the exchange address need be transmitted. Because the index is common to both, the index part of the exchange address can be obtained from the index transmitted in the address field. Thus, the exchange address field may contain fewer bits than the address field.
To obtain the full exchange address, the index bits of the address field are concatenated with the bits of the exchange address field, which supply the high-order TAG bits.
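The split-and-concatenate scheme for the address and exchange address fields can be sketched with shifts and masks. The sketch assumes line addresses (no byte offset) and a 10-bit cache index. The function names and the index width are hypothetical, and the bus is reduced to the pair of values returned by `encode_exchange`.

```python
INDEX_BITS = 10  # hypothetical: 1024-entry direct-mapped cache

def split(addr: int, index_bits: int = INDEX_BITS) -> tuple[int, int]:
    """Return (tag, index): index is the low-order bits, tag the remaining high-order bits."""
    return addr >> index_bits, addr & ((1 << index_bits) - 1)

def encode_exchange(read_addr: int, victim_addr: int,
                    index_bits: int = INDEX_BITS) -> tuple[int, int]:
    """Build the (address field, exchange address field) pair sent on the bus."""
    _, read_index = split(read_addr, index_bits)
    victim_tag, victim_index = split(victim_addr, index_bits)
    if read_index != victim_index:
        raise ValueError("both lines must map to the same cache entry")
    return read_addr, victim_tag   # the victim's index bits are not transmitted

def full_exchange_address(address_field: int, exchange_field: int,
                          index_bits: int = INDEX_BITS) -> int:
    """Concatenate the exchange TAG (high bits) with the address field's index (low bits)."""
    _, index = split(address_field, index_bits)
    return (exchange_field << index_bits) | index

read_addr = (0x15 << INDEX_BITS) | 0x1F5
victim_addr = (0x3C << INDEX_BITS) | 0x1F5
fields = encode_exchange(read_addr, victim_addr)
print([hex(f) for f in fields])                        # ['0x55f5', '0x3c']
print(full_exchange_address(*fields) == victim_addr)   # True
```

The exchange address field carries only the 6-bit TAG here instead of a full 16-bit address, which is the bandwidth saving the protocol relies on.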
However, the various processor modules of a multiprocessor system may have caches of different sizes and therefore cache indexes of different widths. Concatenation yields the correct result only if the number of index bits taken from the address field matches the number of index bits the sending module omitted from the exchange address field. Where different cache index widths are used, concatenating the index bits in the address field with the exchange address field bits may therefore not provide the correct full exchange address.
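The failure mode can be reproduced with the same bit arithmetic. In this hypothetical example, the sending module has a 4096-entry cache (12 index bits) and strips 12 bits from the exchange address. The receiver, however, assumes a 10-bit index. The widths and addresses are illustrative only.

```python
def split(addr: int, index_bits: int) -> tuple[int, int]:
    """Return (tag, index) for a given index width."""
    return addr >> index_bits, addr & ((1 << index_bits) - 1)

def full_exchange_address(address_field: int, exchange_field: int, index_bits: int) -> int:
    """Concatenate the exchange TAG with the low index_bits of the address field."""
    _, index = split(address_field, index_bits)
    return (exchange_field << index_bits) | index

SENDER_INDEX_BITS = 12    # hypothetical: module with a 4096-entry cache
RECEIVER_INDEX_BITS = 10  # hypothetical: index width the receiver assumes

read_addr = (0x15 << SENDER_INDEX_BITS) | 0xABC
victim_addr = (0x3C << SENDER_INDEX_BITS) | 0xABC
exchange_field, _ = split(victim_addr, SENDER_INDEX_BITS)  # sender transmits only the TAG

correct = full_exchange_address(read_addr, exchange_field, SENDER_INDEX_BITS)
wrong = full_exchange_address(read_addr, exchange_field, RECEIVER_INDEX_BITS)
print(hex(victim_addr), hex(correct), hex(wrong))  # 0x3cabc 0x3cabc 0xf2bc
```

With the narrower assumed index, the receiver drops the sender's two high-order index bits and places the TAG two bit positions too low, so the reconstructed address names a different line.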