1. Field of the Present Invention
The present invention generally relates to the field of microprocessor architectures and more particularly to multi-processor architectures employing a cached memory subsystem.
2. History of Related Art
The related concepts of cache memory subsystems and data locality are well known in the field of microprocessor based data processing systems. Cache memory refers to one or more small but fast storage arrays that are architecturally and hierarchically closer to the processor core than the system's main memory (DRAM). Because of their limited size, cache memories can hold only a portion of the information contained in the system's main memory. When a required piece of data is not present in cache memory, the system must access main memory for the data at a significant cost in terms of processing overhead. The benefit obtained by incorporating a cache memory is strongly correlated to the percentage of data access requests that the cache memory can satisfy (commonly referred to as the cache “hit” rate).
Fortunately, relatively small cache memories can frequently provide acceptably high hit rates because, as it turns out in many applications, the data that is most likely to be accessed in the near future is data that has been accessed relatively recently. Thus, by simply storing the most recently accessed data, a cache memory subsystem can provide the microprocessor core with fast access to data that is most likely to be required.
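By way of illustration only, the effect described above can be sketched with a small simulation. The sketch assumes a fully associative cache with least-recently-used (LRU) replacement; the function name `lru_hit_rate`, the trace, and the capacities are hypothetical and are not drawn from any particular processor.

```python
from collections import OrderedDict

def lru_hit_rate(trace, capacity):
    """Fraction of accesses in `trace` satisfied by an LRU cache holding `capacity` blocks."""
    cache = OrderedDict()              # keys are block addresses, ordered oldest to newest
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)   # mark the block as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return hits / len(trace) if trace else 0.0

# Hypothetical trace: four blocks reused 100 times each (high temporal locality).
trace = [0, 1, 2, 3] * 100
print(lru_hit_rate(trace, capacity=4))  # only the first four accesses miss
print(lru_hit_rate(trace, capacity=3))  # cache too small for the working set
```

In this sketch a four-block cache satisfies all but the first four accesses, whereas a three-block cache satisfies none of them under LRU, because each block is evicted just before it is needed again. The example is meant only to show how strongly the hit rate depends on whether recently accessed data remains resident.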
Ultimately, however, a cache memory that is significantly smaller than main memory cannot achieve a 100% hit rate. To achieve the highest hit rate possible and to fully utilize the limited cache memory that is available, designers are always interested in exploring the manner in which data is maintained in the cache. As an example, the instruction sets of some microprocessors include support for user level and/or supervisory level cache management instructions. Cache management instructions generally enable direct software control over some aspects of the cache memory subsystem.
The PowerPC® family of processors from IBM, for example, includes support for several cache management instructions, including the data cache block flush (dcbf) instruction. The dcbf instruction enables software to flush a specifiable block from the cache memory subsystem: any modified copy of the block is written back to main memory and the block is then invalidated. The dcbf instruction is beneficial, for example, when it is desirable to enforce coherency (all the way down to main memory) before permitting a subsystem that does not participate in the cache coherency protocol to access a particular block of data.
The dcbf instruction is also useful in circumstances when it can be determined with reasonably good probability that related memory locations are highly likely to be accessed one or a few times in close succession and then highly unlikely to be accessed thereafter, at least for a relatively long time. Data exhibiting this characteristic is said to have high spatial locality but low temporal locality. Spatial locality refers to a characteristic of data in which an access to data at memory address A, for example, is highly likely to be followed by one or more data accesses to the memory address that follows A sequentially. Temporal locality refers to a characteristic of data in which data that is accessed at time T is highly likely to be accessed again at time T+delta, where delta represents a relatively short interval of time. It is clear that data having high temporal and spatial locality is a good candidate for storage in a cache memory subsystem. Indeed, the reality of temporal locality is a fundamental reason for having cache memories.
In certain applications, however, data may exhibit high spatial locality and low temporal locality. This situation presents a dilemma to conventionally implemented cache subsystems. On the one hand, it is desirable to prefetch and cache data that has high spatial locality to prevent a stall when the data is required. On the other hand, it is undesirable to leave this data in the cache when it occupies space that data with high temporal locality could otherwise use.
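The dilemma can be illustrated, purely as a sketch, by extending the same kind of simulation. The model below assumes a four-block fully associative LRU cache, a "hot" set of three blocks with high temporal locality, and a stream of blocks that are each touched twice in succession and never again. The names `count_hits` and `build_trace`, and the idea of flagging the last use of a block in the trace, are hypothetical modeling choices.

```python
from collections import OrderedDict

def count_hits(trace, capacity):
    """Count hits for a trace of (block, invalidate_after) pairs in an LRU cache."""
    cache = OrderedDict()
    hits = 0
    for block, invalidate_after in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
        if invalidate_after:
            cache.pop(block, None)         # model software invalidating the block
    return hits

def build_trace(iterations, invalidate_stream):
    """Each iteration touches hot blocks 0..2, then two fresh stream blocks twice each."""
    trace = []
    stream_block = 100                     # stream addresses start away from the hot set
    for _ in range(iterations):
        trace += [(hot, False) for hot in (0, 1, 2)]
        for _ in range(2):
            # The second touch is the last use of the stream block.
            trace += [(stream_block, False), (stream_block, invalidate_stream)]
            stream_block += 1
    return trace

plain = count_hits(build_trace(10, invalidate_stream=False), capacity=4)
managed = count_hits(build_trace(10, invalidate_stream=True), capacity=4)
print(plain, managed)
```

Under these assumptions, leaving the stream blocks resident causes the hot blocks to be evicted before each reuse, while invalidating each stream block after its last use keeps the hot set in the cache. The numbers are specific to this toy trace and say nothing about any real workload.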
Microprocessors have long included support for instructions, commonly referred to as “kill” instructions, that invalidate a specified cache line, thereby freeing up the cache to accept new data. In the context of multiprocessor systems operating with multiple non-synchronized cores, the kill instruction invalidates the specified cache block on all of the system's processors. Unfortunately, killing a cache block in this manner could result in the invalidation of a particular cache line before the corresponding processor has completed all references to the block, thereby resulting in a potentially indeterminate state.
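The timing hazard can be sketched with a deliberately simplified model. The `Core` class and the `global_kill` function below are hypothetical; they track only which blocks are present in each core's private cache and do not model data values, write-back, or any actual coherency protocol.

```python
class Core:
    """A core with a private cache, modeled as the set of blocks it currently holds."""
    def __init__(self, name):
        self.name = name
        self.lines = set()

    def load(self, block):
        """Reference a block; return True on a hit, and fill the cache on a miss."""
        hit = block in self.lines
        self.lines.add(block)
        return hit

def global_kill(cores, block):
    """Model a kill that invalidates the block in every core's cache."""
    for core in cores:
        core.lines.discard(block)

BLOCK = 0x40                      # arbitrary block address for the example
core_a, core_b = Core("A"), Core("B")

core_a.load(BLOCK)                # both cores bring the block into their caches
core_b.load(BLOCK)
first = core_b.load(BLOCK)        # core B's first re-reference hits

global_kill([core_a, core_b], BLOCK)  # core A is finished and kills the block everywhere

second = core_b.load(BLOCK)       # core B's remaining reference finds the line gone
print(first, second)
```

Because the cores are not synchronized, core A has no way of knowing in this model that core B still has references outstanding. Here the only visible effect is an extra miss on core B; what happens in actual hardware depends on the state of the killed line and on the design.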
Accordingly, it would be desirable to implement a processor that enabled operating systems, programmers, and/or processors to control, identify, and invalidate selected cache lines without incurring the potential timing and coherency issues that are raised in a multiprocessor environment.