1. Technical Field
The present invention relates in general to cache coherency in data processing systems and in particular to efficiently maintaining cache coherency in data processing systems having multiple caches of the same and/or different levels. Still more particularly, the present invention relates to an enhanced cache store protocol for forwarding store updates only as far as necessary to assure cache consistency.
2. Description of the Related Art
Current reduced instruction set computing (RISC) processors typically contain separate level one caches for instructions and for data. Separate level one caches are necessary in view of the extreme bandwidth required in contemporary processors: instruction fetches and data references in superscalar processors may easily exceed one cache access per cycle. Therefore, instruction and data references are issued to separate caches, one for data and one for instructions. Each level one cache is designed for access every cycle.
One problem with separate level one caches for data and instructions is that a processor will periodically modify data in the data cache which actually comprises instructions to be executed later. This may occur, for example, when a loader program resolves code linkages after loading the code into memory. The processor may have already fetched this data into the instruction cache before the data was modified. However, most RISC processors do not include any mechanism for maintaining coherency between the level one caches; that is, changes in one are not automatically reflected in the other. Therefore, software executing in the processor is required to provide a mechanism for handling such situations. The problem is usually addressed by flushing any modified instructions from the data cache and invalidating the same addresses in the instruction cache, where they may also reside. This is done, one page at a time, for any lines which may have been changed (typically all of the lines within a program).
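The software remedy described above can be illustrated with a toy model. The following is a minimal sketch only, not any processor's actual behavior: the class `SplitL1`, the function `sync_page`, and the block and page sizes are all hypothetical names and values chosen for illustration.

```python
BLOCK = 32    # assumed cache block size in bytes (hypothetical)
PAGE = 4096   # assumed page size in bytes (hypothetical)


class SplitL1:
    """Toy model: separate L1 instruction and data caches over one memory."""

    def __init__(self):
        self.memory = {}   # block address -> value
        self.dcache = {}   # block address -> (value, dirty flag)
        self.icache = {}   # block address -> value

    def store(self, addr, value):
        # Stores land in the data cache only; memory is not updated yet.
        self.dcache[addr] = (value, True)

    def fetch(self, addr):
        # Instruction fetches never consult the data cache.
        if addr not in self.icache:
            self.icache[addr] = self.memory.get(addr)
        return self.icache[addr]

    def flush_data_block(self, addr):
        # Write a modified block back to memory and mark it clean.
        if addr in self.dcache and self.dcache[addr][1]:
            value, _ = self.dcache[addr]
            self.memory[addr] = value
            self.dcache[addr] = (value, False)

    def invalidate_instruction_block(self, addr):
        # Drop any copy so the next fetch reloads from memory.
        self.icache.pop(addr, None)


def sync_page(cache, page_base):
    # Software coherency pass over every block in one page.
    for addr in range(page_base, page_base + PAGE, BLOCK):
        cache.flush_data_block(addr)
        cache.invalidate_instruction_block(addr)
```

In this model, a fetch issued after a store but before `sync_page` still returns the old contents, while a fetch issued after `sync_page` returns the modified contents. The pass touches every block in the page, whether or not that block was modified.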
Most superscalar processors having separate level one caches support an instruction for forcing all modified data within a given cache block out of the level one data cache. In PowerPC™ processors, this instruction is a data cache block store (dcbst) instruction. When a dcbst instruction is executed, the effective address is computed, translated, and checked for protection violations. If the target cache block for this instruction does not contain modified data, the cache block is left unchanged and a clean operation is broadcast onto the bus. If modified (dirty) data is associated with the cache block, however, the processor pushes the modified data out of the data cache. All bytes in the cache block are written to main memory and the cache block is flagged "exclusive," indicating that the cache contains data at that address which is valid data shared with system memory, but other caches may contain incongruent data at the same address.
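The two outcomes of a dcbst described above can be sketched as follows. This is a simplified illustration under stated assumptions: address translation and protection checking are omitted, the bus is modeled as a plain list, and the names `State`, `dcbst`, and `bus_log` are hypothetical rather than part of any processor specification.

```python
from enum import Enum


class State(Enum):
    INVALID = "invalid"
    EXCLUSIVE = "exclusive"
    MODIFIED = "modified"


def dcbst(dcache, memory, bus_log, block_addr):
    """Model the effect of a data cache block store on one block.

    dcache maps block address -> [state, data]; memory maps block
    address -> data; bus_log is a list standing in for the bus.
    """
    entry = dcache.get(block_addr)
    if entry is None or entry[0] is not State.MODIFIED:
        # No modified data: leave the block unchanged and
        # broadcast a clean operation on the bus.
        bus_log.append(("clean", block_addr))
        return
    # Modified data: write the whole block to main memory,
    # then flag the block exclusive.
    memory[block_addr] = entry[1]
    entry[0] = State.EXCLUSIVE
```

Calling `dcbst` on a block in the `MODIFIED` state copies its data to `memory` and leaves it `EXCLUSIVE`; calling it on any other block only appends a `("clean", block_addr)` entry to `bus_log`.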
Other RISC processors support equivalent instructions for similar operations. Such instructions, however, are also used by programmers who wish to make the results of a modified line in the cache immediately visible outside the processor. This may be useful, for example, in making a graphics update to a memory mapped graphics adapter. Thus, the instruction provided to solve cache coherency problems by forcing data all the way to memory may also be used to force data to an I/O device. These disparate uses, however, are to some degree inconsistent in their objectives. Use of the dcbst or equivalent instruction for cache coherency need not write the data all the way to the lowest level in the cache hierarchy; merely writing the data to a point from which data fetched to the instruction cache derives would be sufficient.
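The difference between the two objectives can be illustrated with a small model. The sketch below assumes a hypothetical three-level hierarchy (level one data cache, a unified level two cache from which instruction fetches are serviced, and main memory); the function `push_block` and the level constants are illustrative names only and do not represent any particular processor or the mechanism of the invention.

```python
# Hypothetical hierarchy indices, ordered from the processor outward.
L1D, L2, MEMORY = 0, 1, 2


def push_block(levels, addr, stop_at):
    """Copy one block from the L1 data cache down through levels[stop_at].

    levels is a list of dicts (block address -> value), ordered from the
    L1 data cache down to main memory.
    """
    value = levels[L1D][addr]
    for level in levels[1:stop_at + 1]:
        level[addr] = value


# Coherency use: the instruction cache is assumed to refill from L2,
# so stopping at L2 is enough for later fetches to see the new code.
hierarchy = [{0x100: "patched"}, {}, {}]
push_block(hierarchy, 0x100, stop_at=L2)

# Visibility use: an I/O device observes memory, so the block
# must travel all the way down.
push_block(hierarchy, 0x100, stop_at=MEMORY)
```

After the first call only the level two cache holds the block; main memory is written only by the second call. This is the extra traffic that a coherency-only store would not need to generate.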
It would be advantageous, therefore, to provide a mechanism for distinguishing dcbst or equivalent instructions intended to provide cache coherency from similar instructions intended to disseminate results of a modified cache line. It would further be desirable if the mechanism did not add significantly to the operational complexity or resource requirements of the processor.