1. Technical Field
The present invention relates in general to coherency of bifurcated data and instruction caches and in particular to instructions to maintain coherency of bifurcated data and instruction caches. Still more particularly, the present invention relates to instructions and system bus transactions for maintaining coherency between data and instruction caches in a multiprocessor system having multiple copies of a subject cache entry.
2. Description of the Related Art
Superscalar reduced instruction set computer (RISC) processors typically include bifurcated data and instruction caches in at least the level one (L1) layer of the storage hierarchy. Separate data and instruction caches are necessary because of the bandwidth required in contemporary superscalar processors, where instruction fetches and data references may easily exceed one cache access per processor cycle. L1 caches, which are typically embedded within the processor hardware and designed for latencies of one processor cycle or less, are therefore usually bifurcated so that instruction and data references may be issued to separate caches during the same processor cycle.
The bifurcation of data and instruction caches adds a further dimension to the problem of maintaining a coherent memory hierarchy, that is, of providing a single view of the contents of memory to all of the processors. For example, a processor will periodically modify data in a data cache which are actually instructions to be executed later. This may occur, for instance, when a loader program resolves code linkages after loading the code into memory. As another example, a processor which copies pages of memory contents does not distinguish between instructions and data and may copy the same page contents to both instruction and data caches. Both instruction cache block invalidate and clean operations are subsequently required to free a cache location containing a portion of the copied page.
Most currently available superscalar processors do not include any mechanism for maintaining coherency of bifurcated level one caches; that is, changes in one L1 cache are not automatically reflected in other L1 caches, whether in a different processor or in the same processor. In most superscalar processors, maintaining coherency between bifurcated data and instruction caches is left to software. Software typically handles the problem by flushing modified data cache entries which originally contained instructions and invalidating the same cache entries if they are resident in the instruction cache. These actions are taken for all altered cache lines within the program code, one page at a time.
All superscalar processors support an instruction for writing modified data from a level one cache to system memory. Programmers may use such instructions to make a modified cache line immediately visible outside the processor. This is useful in graphics applications, for example for writing display information to a memory-mapped graphics adapter or a display buffer. By far the most prevalent use of such instructions, however, is for software management of bifurcated data/instruction cache coherency. When used for this purpose, the instruction writing modified data to memory may be followed by an instruction invalidating the same cache location in instruction caches. In the PowerPC™ family of devices, for example, the instruction which writes modified data to system memory is the data cache block store (dcbst) instruction, while the instruction invalidating the cache location in instruction caches is the instruction cache block invalidate (icbi) instruction.
When the dcbst instruction is executed, the effective address is computed, translated, and checked for protection violations. If the cache block referenced by the address does not contain modified data, the cache block is left unchanged (the instruction is treated as a no-op) and a clean operation is initiated on the system bus. If the cache block contains modified (dirty) data, however, the data is pushed out of the data cache onto the system bus: all bytes in the cache block are written to system memory, and the coherency state of the cache block is set to exclusive (E), indicating that the block contains valid data consistent with the corresponding location in system memory and that, among all caches at that level of the storage hierarchy, it is held only in the subject cache. A write operation is then initiated on the system bus.
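The dcbst behavior described above can be captured in a small software model. The following Python sketch is a hypothetical illustration, not PowerPC or hardware code: the names `State`, `CacheBlock`, `SystemBus`, and `dcbst` are assumptions introduced here, address translation and protection checking are omitted, and the data cache is assumed to be a dictionary keyed by block-aligned address.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    # MESI-style coherency states (hypothetical model).
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


@dataclass
class CacheBlock:
    state: State
    data: bytes


@dataclass
class SystemBus:
    # Hypothetical log of bus transactions as (operation, address) tuples.
    log: list = field(default_factory=list)
    # Hypothetical system memory: block address -> block contents.
    memory: dict = field(default_factory=dict)


def dcbst(dcache: dict, address: int, bus: SystemBus) -> None:
    """Model of dcbst on one processor's data cache (assumed model)."""
    block = dcache.get(address)
    if block is None or block.state is not State.MODIFIED:
        # No modified data: block left unchanged, clean operation on the bus.
        bus.log.append(("clean", address))
        return
    # Modified data: write all bytes to memory and mark the block exclusive.
    bus.memory[address] = block.data
    block.state = State.EXCLUSIVE
    bus.log.append(("write", address))
```

Calling `dcbst` twice on the same modified block shows both paths: the first call writes the block to memory and issues a write operation, and the second finds the block already exclusive and issues only a clean operation.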
When the icbi instruction is executed, the effective address is again computed, translated, and checked for protection violations. If the addressed cache block is in the instruction cache, the instruction cache block is marked invalid, indicating that the cache entry (both the address tag and the contents) is not valid and not coherent with either system memory or any other cache at the same level of the storage hierarchy. Both the content and the status of the cache block remain unchanged within the data caches of all processors. The icbi or ikill operation is initiated unconditionally on the system bus to invalidate the appropriate cache line in all other instruction caches throughout the storage hierarchy.
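The icbi behavior can likewise be sketched as a hypothetical multiprocessor model. Here `Processor` and `icbi` are illustrative names, each cache maps a block address to a `(valid, data)` tuple, translation and protection checks are omitted, and snooping of the bus operation by the other processors is modeled as a simple loop.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    # Hypothetical model: each cache maps block address -> (valid, data).
    name: str
    icache: dict = field(default_factory=dict)
    dcache: dict = field(default_factory=dict)


def icbi(issuer: Processor, address: int, processors: list, bus_log: list) -> None:
    """Model of icbi: invalidate the block in every instruction cache."""
    # Local instruction cache: mark the block invalid if it is present.
    if address in issuer.icache:
        issuer.icache[address] = (False, issuer.icache[address][1])
    # The icbi/ikill bus operation is issued unconditionally, even on a miss.
    bus_log.append(("ikill", address))
    # Every other processor snoops the operation and invalidates its copy.
    for p in processors:
        if p is not issuer and address in p.icache:
            p.icache[address] = (False, p.icache[address][1])
    # Data caches are deliberately untouched: content and state unchanged.
```

Note that the bus operation is logged even when no instruction cache holds the block, reflecting the unconditional broadcast described above.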
The instruction pair described above does not occur regularly during execution of typical program code. When it is used to flush modified data cache entries originally containing instructions and to invalidate the corresponding cache blocks within instruction caches, however, entire pages of memory are flushed one cache block at a time. Thus a large group of dcbst/icbi instruction pairs will be executed within a relatively short period.
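The page-at-a-time software loop can be sketched as follows. The 4096-byte page and 32-byte cache block sizes are illustrative assumptions, and `flush_page_ops` is a hypothetical helper that only lists the operations software would issue. Real code must also follow the processor's own synchronization rules (for example, sync and isync on PowerPC), which this sketch omits.

```python
PAGE_SIZE = 4096   # assumed page size in bytes (illustrative)
BLOCK_SIZE = 32    # assumed cache block size in bytes (illustrative)


def flush_page_ops(page_base: int, page_size: int = PAGE_SIZE,
                   block_size: int = BLOCK_SIZE) -> list:
    """List the dcbst/icbi pairs software issues to flush one page."""
    if page_base % page_size:
        raise ValueError("page_base must be page-aligned")
    ops = []
    # Walk the page one cache block at a time.
    for addr in range(page_base, page_base + page_size, block_size):
        ops.append(("dcbst", addr))  # push any modified data to memory
        ops.append(("icbi", addr))   # invalidate the block in instruction caches
    return ops
```

Under these assumptions one page requires 4096 / 32 = 128 dcbst/icbi pairs, or 256 instructions. Because each dcbst initiates either a clean or a write operation and each icbi initiates a bus operation unconditionally, the page also generates 256 system bus operations.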
Another need which periodically arises in multiprocessor systems is to write a cache entry to an alternative cache location previously containing instructions, to make the newly modified cache entry coherent with system memory, and to invalidate the new cache location in all instruction caches. That is, it is desirable to write the contents of cache entry x to cache entry y, update the system memory corresponding to cache entry y, and invalidate cache entry y in all instruction caches in the system. In current systems, several instructions must be executed to achieve this result: the contents of cache entry x are written to cache entry y inside the cache, a dcbst instruction is then executed on cache entry y, an icbi instruction is then executed on cache entry y, and finally a synchronization (sync) instruction is executed to ensure that all of the preceding operations have completed.
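The conventional copy-then-flush sequence for entries x and y can be traced in a hypothetical sequential model. `Node`, `System`, and `copy_and_make_coherent` are illustrative names, cache entries are `(state, data)` tuples, invalidated instruction cache entries are simply removed, and because the model executes each step to completion, the final sync is recorded but has nothing left to wait for.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical processor view: block address -> (state, data).
    dcache: dict = field(default_factory=dict)
    icache: dict = field(default_factory=dict)


@dataclass
class System:
    nodes: list
    memory: dict = field(default_factory=dict)
    bus_log: list = field(default_factory=list)


def copy_and_make_coherent(system: System, cpu: Node, x: int, y: int) -> list:
    """Trace the multi-instruction sequence described in the text."""
    issued = []
    # 1. Copy cache entry x to cache entry y inside the data cache.
    cpu.dcache[y] = ("M", cpu.dcache[x][1])
    issued.append("copy")
    # 2. dcbst y: if modified, write to memory and mark exclusive.
    state, data = cpu.dcache[y]
    if state == "M":
        system.memory[y] = data
        cpu.dcache[y] = ("E", data)
        system.bus_log.append(("write", y))
    else:
        system.bus_log.append(("clean", y))
    issued.append("dcbst")
    # 3. icbi y: invalidate y in every instruction cache in the system.
    for node in system.nodes:
        node.icache.pop(y, None)
    system.bus_log.append(("ikill", y))
    issued.append("icbi")
    # 4. sync: in this sequential model all prior steps are already complete.
    issued.append("sync")
    return issued
```

The trace makes the cost of the conventional approach concrete: four instructions and two separate bus operations for a single cache block.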
It would be desirable, therefore, to provide an improved mechanism for flushing modified data in cache blocks originally containing instructions to system memory and for invalidating those cache blocks in instruction caches. It would further be advantageous for the mechanism to be implemented in a single instruction and system bus operation.