1. Field of the Invention
The present invention relates in general to caching for multiprocessor system design simulation and in particular to a unified processor cache model.
2. Description of Background
Caches have traditionally been designed to exploit the spatial and temporal locality of code sequences in commercial applications. They reduce the memory access latency of load and store instructions by staging data that is predicted to be needed soon into smaller memories having shorter latencies. As multiprocessing has grown in popularity, cache structures have been expanded and improved to support it.
In a multiprocessor system, the same data may be shared and separately cached by different processors. To address the problem of multiple processors modifying the same data in their local caches without notifying one another, various cache states have been defined and incorporated into the cache organization to support different cache coherency protocols in snooping mechanisms. While many different cache coherency states have been defined for different multiprocessor systems, the states of the MESI (Modified, Exclusive, Shared, Invalid) protocol remain a very popular basic set of cache coherency states.
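As an illustration of the MESI states mentioned above, the following minimal sketch checks whether a set of peer caches may legally hold the same line at the same time. It assumes the textbook MESI rule that a Modified or Exclusive copy requires every other copy to be Invalid; the names `State`, `compatible`, and `coherent` are hypothetical and are not taken from any particular system.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def compatible(a, b):
    """Return True if two peer caches may hold the same line in states a and b."""
    # An Invalid copy never conflicts with anything.
    if State.INVALID in (a, b):
        return True
    # Two valid copies may coexist only if both are Shared.
    return a is State.SHARED and b is State.SHARED


def coherent(states):
    """Return True if every pair of states in the sequence is compatible."""
    states = list(states)
    return all(
        compatible(states[i], states[j])
        for i in range(len(states))
        for j in range(i + 1, len(states))
    )


print(coherent([State.SHARED, State.SHARED, State.INVALID]))  # True
print(coherent([State.MODIFIED, State.SHARED]))               # False
```

A real protocol adds transitions driven by snooped bus operations; this sketch covers only the static legality of a snapshot of states.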
In a multiprocessor system having a multi-level cache hierarchy, the number of legal combinations of cache coherency states among the caches is extremely large. Even with a very thorough methodology, it is difficult to reach all of the legal combinations within the limited number of simulation cycles that are conventionally run. Some legal combinations may occur only after execution of a complex sequence of many load, store, and castout operations.
For instance, for data X to be in the invalid state in both the level one (L1) and level two (L2) caches but in the modified state in the level three (L3) cache, the processor must first store data X to the appropriate address, which places the corresponding L1 cache line in the modified state. Next, a number of loads or stores (depending on the L1's replacement algorithm) must be executed that map to the cache segment whose addresses include that of data X, forcing a castout of X from the L1 to the L2. Finally, a number of loads and stores must occur that cause L1 misses and also force the L2 to select data X as the victim and cast out the cache line containing the modified data from the L2 to the L3.
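The sequence just described can be reproduced in a toy model. The sketch below is only an illustration under strong assumptions: each level is a tiny fully associative cache with LRU replacement, a modified victim is cast out to the next lower level, and only stores are issued. The class `CacheLevel`, the function `store`, and the capacities are all hypothetical.

```python
from collections import OrderedDict


class CacheLevel:
    """Toy cache level: fully associative, LRU replacement (assumed)."""

    def __init__(self, name, capacity, lower=None):
        self.name = name
        self.capacity = capacity
        self.lower = lower          # next level down, or None
        self.lines = OrderedDict()  # address -> state, least recently used first

    def state(self, addr):
        return self.lines.get(addr, "I")

    def install(self, addr, state):
        self.lines.pop(addr, None)
        self.lines[addr] = state
        if len(self.lines) > self.capacity:
            # Evict the least recently used line; cast it out if modified.
            victim, victim_state = self.lines.popitem(last=False)
            if victim_state == "M" and self.lower is not None:
                self.lower.install(victim, victim_state)


def store(l1, addr):
    """Store to addr: drop stale lower-level copies, then mark the L1 line modified."""
    level = l1.lower
    while level is not None:
        level.lines.pop(addr, None)
        level = level.lower
    l1.install(addr, "M")


l3 = CacheLevel("L3", capacity=4)
l2 = CacheLevel("L2", capacity=1, lower=l3)
l1 = CacheLevel("L1", capacity=1, lower=l2)

store(l1, "X")  # X is modified in L1
store(l1, "A")  # A evicts X from L1; X is cast out to L2
store(l1, "B")  # B evicts A from L1; A evicts X from L2; X is cast out to L3

states = {level.name: level.state("X") for level in (l1, l2, l3)}
print(states)  # {'L1': 'I', 'L2': 'I', 'L3': 'M'}
```

Even in this toy, three ordered stores are needed to reach a single target combination. With realistic capacities and replacement algorithms, the required sequence grows accordingly.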
Currently, in a multiprocessor simulation environment, each processor behavior has its own cache model. It is therefore possible for multiple copies of the same data to be cached in multiple models. As the system grows, more processor behaviors are added to the simulation environment, which consumes more memory and makes coherency checking of the whole system more difficult and less efficient. The use of a Unified Processor Cache Model reduces memory usage, simplifies coherency checking, and allows cache states and data to be accessed more quickly and efficiently.
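To make the contrast with per-processor models concrete, the following sketch shows one possible shape for a shared model: a single table keyed by address that stores the data once and keeps one coherency state per processor. This is an assumption made for illustration only, not the structure defined by the invention; `UnifiedCacheModel` and its methods are hypothetical names.

```python
class UnifiedCacheModel:
    """Toy shared cache model: one entry per address, one state per processor."""

    def __init__(self, num_processors):
        self.num_processors = num_processors
        self.entries = {}  # address -> {"data": ..., "states": [...]}

    def set_state(self, cpu, addr, state, data=None):
        entry = self.entries.setdefault(
            addr, {"data": None, "states": ["I"] * self.num_processors}
        )
        entry["states"][cpu] = state
        if data is not None:
            entry["data"] = data  # the data is stored once, not once per processor

    def states(self, addr):
        entry = self.entries.get(addr)
        if entry is None:
            return ["I"] * self.num_processors
        return list(entry["states"])

    def is_coherent(self, addr):
        # Assumed MESI rule: at most one valid copy unless all valid copies are Shared.
        held = [s for s in self.states(addr) if s != "I"]
        return len(held) <= 1 or all(s == "S" for s in held)


model = UnifiedCacheModel(num_processors=2)
model.set_state(0, 0x100, "S", data=b"abc")
model.set_state(1, 0x100, "S")
print(model.states(0x100), model.is_coherent(0x100))  # ['S', 'S'] True

model.set_state(1, 0x100, "M")
print(model.states(0x100), model.is_coherent(0x100))  # ['S', 'M'] False
```

Because all states for an address sit in one entry, a coherency check is a single lookup rather than a query against every processor's separate model.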