1. Technical Field
The present invention relates in general to inclusivity in vertical cache hierarchies and in particular to selective inclusivity with respect to cached instructions. Still more particularly, the present invention relates to selective inclusivity to prevent cached instructions from being discarded due to deallocations in lower cache levels.
2. Description of the Related Art
Superscalar reduced instruction set (RISC) processors typically include bifurcated data and instruction caches in at least the level one (L1) layer of the storage hierarchy. Separate data and instruction caches are necessary due to the bandwidth required in contemporary superscalar processors, where instruction fetches and data references may easily exceed one cache access per processor cycle. L1 caches, which are typically embedded within the processor hardware and designed for latencies of one processor cycle or less, are therefore usually bifurcated so that instruction and data references may be issued to separate caches during the same processor cycle.
Many data processing systems may contain multilevel cache hierarchies which are logically in line; that is, caches in higher levels are checked first, with a miss at a higher level prompting access to caches on lower levels. Multilevel caches are typically utilized to stage data to the processor with reduced access latency. Smaller, faster caches are employed in upper levels of the cache hierarchy while larger, slower caches are found in lower levels. Generally, such vertical cache configurations are thought of as inclusive. That is, the contents of each cache include the contents of the cache immediately above it in the cache hierarchy.
When space is required within a cache for new data or instructions read from system memory, the cache selects a victim according to the particular replacement policy implemented for the cache and deallocates the selected cache location. In cases where a cache location contained in multiple caches is deallocated in one cache, inclusivity of logically in line caches is maintained by deallocating the same location in other caches. There are circumstances, however, where this produces an undesirable result. For example, if a cache location containing instructions is deallocated within a level three (L3) cache, the same space will generally be deallocated in a level two (L2) cache. If the processor/L1 cache thereafter attempts to reload instructions from the L2 cache, it may miss at the L2 and the L3 caches and (assuming no more levels in the cache hierarchy) be required to access the desired instructions from system memory. Instruction reloads may be necessary, for example, when a mispredicted branch is executed. Since the latency associated with a read from system memory is generally much longer than the latencies associated with the L2 and L3 caches, a significant performance delay may be incurred.
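The reload penalty described above may be illustrated by the following minimal sketch, which is not part of the original disclosure. It assumes a strictly inclusive two-level hierarchy modeled as sets of addresses; the class name InclusiveHierarchy and its methods are hypothetical names chosen only for illustration, and replacement policy, cache geometry, and timing are omitted.

```python
class InclusiveHierarchy:
    """Toy model of logically in-line L2 and L3 caches with strict inclusivity."""

    def __init__(self):
        self.l2 = set()  # addresses currently held in the L2 cache
        self.l3 = set()  # addresses currently held in the L3 cache

    def fill(self, addr):
        # A line read from system memory is installed at every level.
        self.l3.add(addr)
        self.l2.add(addr)

    def deallocate_l3(self, addr):
        # The L3 victimizes the line; inclusivity is maintained by
        # deallocating the same location in the L2 as well.
        self.l3.discard(addr)
        self.l2.discard(addr)

    def reload(self, addr):
        # Higher levels are checked first; a miss prompts access to the
        # next lower level, and finally to system memory.
        if addr in self.l2:
            return "L2"
        if addr in self.l3:
            self.l2.add(addr)
            return "L3"
        self.fill(addr)
        return "memory"


hierarchy = InclusiveHierarchy()
hierarchy.fill(0x100)                    # instructions fetched once from memory
first = hierarchy.reload(0x100)          # served by the L2
hierarchy.deallocate_l3(0x100)           # L3 deallocation also removes the L2 copy
second = hierarchy.reload(0x100)         # misses at L2 and L3
```

In this sketch the second reload is served from system memory even though the L2 cache had no need of its own to discard the line, which is the undesirable result noted above.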
One problem with preventing instructions from being discarded when cache locations are deallocated is that there exists no clear mechanism for distinguishing instructions from data within a cache. Program source code within system memory may comprise an indistinguishable mixture of instructions and data. This may occur, for example, where a loader program resolves code linkages after loading the code into system memory. Thus, there exists no means for positively identifying instructions when a victim is selected so that a replacement policy may be designed to select an alternative victim. Moreover, it is not necessary that all cache levels be inclusive of the L1 instruction cache. It is simply desirable for an L2 cache to be inclusive of the L1 instruction cache's present and recent contents in order to minimize latency of instruction reloads. Requiring similar inclusivity at all levels of the cache hierarchy detracts from overall cache efficiency.
It would be desirable, therefore, to provide a mechanism for maintaining selective inclusivity with regard to instructions in upper levels of the cache hierarchy. It would further be advantageous if the mechanism were not affected by deallocations in lower cache levels, such that instruction cache inclusivity is not required in all cache levels.
It is therefore one object of the present invention to provide an improved system of inclusivity in vertical cache hierarchies.
It is another object of the present invention to provide a method and apparatus for providing selective inclusivity with respect to instructions cached in vertical cache hierarchies.
It is yet another object of the present invention to provide selective inclusivity to prevent cached instructions from being discarded due to deallocations in lower cache levels.
The foregoing objects are achieved as is now described. A modified MESI cache coherency protocol is implemented within a level two (L2) cache accessible to a processor having bifurcated level one (L1) data and instruction caches. The modified MESI protocol includes two substates of the shared state, which denote the same coherency information as the shared state plus additional information regarding the contents/coherency of the subject cache entry. One substate, SIC0, indicates that the cache entry is assumed to contain instructions since the contents were retrieved from system memory as a result of an instruction fetch operation. The second substate, SIC1, indicates the same information plus that a snooped flush operation hit the subject cache entry while its coherency was in the first shared substate. Deallocation of a cache entry in the first substate of the shared coherency state within lower level (e.g., L3) caches does not result in the contents of the same cache entry in an L2 cache being invalidated. Once the first substate is entered, the coherency state does not transition to the invalid state unless an operation designed to invalidate instructions is received. Operations from a local processor which contravene the presumption that the contents comprise instructions may cause the coherency state to transition to an ordinary shared state. Since the contents of a cache entry in the two coherency substates are presumed to be instructions, not data, instructions within an L2 cache are not discarded as a result of snooped flushes, but are retained for possible reloads by a local processor.
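The state transitions summarized above may be sketched, under stated assumptions, as a small transition function. The sketch is illustrative only and is not the disclosed implementation: the event names are hypothetical labels for the operations described in prose, the behavior of the ordinary M, E, and S states is assumed to be simple invalidation (with any write-back of modified data omitted), a snooped flush hitting an entry already in SIC1 is assumed to leave it in SIC1, and a contravening local operation is modeled as always transitioning to the ordinary shared state although the text says only that it may.

```python
from enum import Enum, auto


class State(Enum):
    M = auto()      # modified
    E = auto()      # exclusive
    S = auto()      # ordinary shared
    I = auto()      # invalid
    SIC0 = auto()   # shared, contents presumed to be instructions
    SIC1 = auto()   # as SIC0, and a snooped flush has hit the entry


class Event(Enum):
    # Hypothetical event labels; the text describes these operations in prose.
    INSTRUCTION_FETCH_FILL = auto()   # filled from memory by an instruction fetch
    SNOOPED_FLUSH = auto()            # flush operation snooped from the bus
    LOWER_LEVEL_DEALLOCATE = auto()   # e.g., the L3 deallocates the same entry
    INSTRUCTION_INVALIDATE = auto()   # operation designed to invalidate instructions
    LOCAL_DATA_ACCESS = auto()        # local operation contravening the presumption


INSTRUCTION_SUBSTATES = {State.SIC0, State.SIC1}


def next_state(state, event):
    """Return the L2 coherency state after `event` hits an entry in `state`."""
    if state is State.I and event is Event.INSTRUCTION_FETCH_FILL:
        return State.SIC0
    if state in INSTRUCTION_SUBSTATES:
        if event is Event.INSTRUCTION_INVALIDATE:
            return State.I            # only this event discards the instructions
        if event is Event.LOCAL_DATA_ACCESS:
            return State.S            # presumption contravened: ordinary shared
        if event is Event.SNOOPED_FLUSH:
            return State.SIC1         # flush is recorded, contents are retained
        return state                  # lower-level deallocation has no effect
    # Assumption: other valid states are invalidated by these events.
    if event in (Event.SNOOPED_FLUSH,
                 Event.LOWER_LEVEL_DEALLOCATE,
                 Event.INSTRUCTION_INVALIDATE):
        return State.I
    return state


after_fill = next_state(State.I, Event.INSTRUCTION_FETCH_FILL)
after_dealloc = next_state(after_fill, Event.LOWER_LEVEL_DEALLOCATE)
after_flush = next_state(after_dealloc, Event.SNOOPED_FLUSH)
```

In this sketch an entry filled by an instruction fetch survives both an L3 deallocation and a snooped flush, ending in SIC1, and leaves the instruction substates only through the instruction-invalidating event or a contravening local operation.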
The above as well as additional objects, features, and advantages of the present invention will become apparent in the following detailed written description.