1. Technical Field:
The present invention relates in general to cache preloading for multiprocessor system design simulation and in particular to comprehensive cache preloading ensuring that all possible legal cache coherency state combinations are reached. Still more particularly, the present invention relates to randomly preloading all possible legal cache coherency state combinations during simulation to verify proper design operation even in corner case state combinations.
2. Description of the Related Art:
Caches have traditionally been designed to take advantage of the spatial and temporal locality of code sequences in commercial applications to reduce the memory access latency for load and store instructions by staging data predicted to be needed in the future into smaller memories having shorter latencies. As multiprocessing capabilities have increased in popularity, cache structures have been expanded and improved to support this functionality.
In a multiprocessor system, the same data may be shared and separately cached by different processors. To address the problem of multiple processors modifying the same data in local caches without notifying the others, various cache states have been defined and incorporated into the cache organization to support different cache coherency protocols in snooping mechanisms. While many different cache coherency states have been defined for different multiprocessor systems, the MESI protocol states remain very popular basic cache coherency states.
The modified (M) coherency state indicates that only one cache has the valid copy of the data, and that copy is "dirty" or modified with respect to the copy in system memory. The exclusive (E) coherency state is defined to signify that only one cache has a valid copy of the data, which is unmodified with respect to the data in system memory. The shared (S) coherency state denotes that one or more caches have copies of the data and that no copy is modified with respect to system memory. The invalid (I) coherency state indicates that the cache holding the line in that state does not have a valid copy of the data.
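The state definitions above imply which combinations are legal for a single cache line across peer caches at one level. The following is a minimal sketch in Python, assuming a flat set of peer caches and the basic MESI states only; the names is_legal and legal_combinations are hypothetical and are not taken from any particular design or tool.

```python
from itertools import product

STATES = "MESI"

def is_legal(combo):
    """Check one line's states across peer caches against the MESI definitions."""
    owners = sum(s in "ME" for s in combo)   # caches claiming the only valid copy
    sharers = combo.count("S")
    if owners > 1:
        return False                         # M and E each require a sole valid copy
    if owners == 1 and sharers > 0:
        return False                         # a sole valid copy excludes sharers
    return True

def legal_combinations(num_caches):
    """Enumerate every legal state tuple for one line across num_caches caches."""
    return [c for c in product(STATES, repeat=num_caches) if is_legal(c)]

# Usage: count the legal combinations for 2, 3 and 4 peer caches.
counts = {n: len(legal_combinations(n)) for n in (2, 3, 4)}
```

Even this single-level check filters out most of the raw state tuples. Vertical constraints between cache levels, which this sketch does not model, would be added on top of it.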
In multiprocessor systems employing the MESI protocol or a variant, a processor preparing to store data will first examine the cache coherency state within the local cache corresponding to the store location. If the subject cache line is either modified or exclusive, the store will be performed immediately. Otherwise, the processor seeking to store the data must invalidate all other copies of the data in the memory hierarchy before the store may be safely executed. These protocols are followed by all processors in a multiprocessor system to ensure that data coherency with respect to instruction execution sequences is maintained.
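The store rule just described can be sketched as follows. This is an illustrative model only: each cache is assumed to be a dictionary mapping a line address to its state, data movement and bus arbitration are ignored, and the names State and store are hypothetical.

```python
from enum import Enum

class State(Enum):
    M = "modified"
    E = "exclusive"
    S = "shared"
    I = "invalid"

def store(caches, writer, line):
    """Apply a store by processor `writer` to `line`.

    `caches` is a list of dicts (one per processor) mapping line -> State.
    Returns the number of remote copies that had to be invalidated.
    """
    local = caches[writer].get(line, State.I)
    invalidated = 0
    if local not in (State.M, State.E):
        # Shared or invalid locally: other copies must be invalidated first.
        for idx, cache in enumerate(caches):
            if idx != writer and cache.get(line, State.I) is not State.I:
                cache[line] = State.I
                invalidated += 1
    # After the store, the writer holds the only valid, dirty copy.
    caches[writer][line] = State.M
    return invalidated

# Usage: two caches share line 0x40; processor 0 stores to it twice.
caches = [{0x40: State.S}, {0x40: State.S}, {}]
first = store(caches, 0, 0x40)    # must invalidate the copy in cache 1
second = store(caches, 0, 0x40)   # line is already modified; no invalidation
```

The second store proceeds without any invalidation because the local line is already in the modified state.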
The protocols described, however, can become extremely complicated if multiple levels of caches--including internal or external in-line caches--are implemented for each processor, particularly when three or more cache levels are implemented. For single-level cache systems, only horizontal cache coherency across processors need be considered. However, where multi-level cache hierarchies are implemented for each processor, both vertical and horizontal cache coherency must be maintained.
In a multiprocessor system having a multi-level cache hierarchy, the number of legal combinations for cache coherency states among the caches is extremely large. Even if a very thorough methodology were employed, it would not be easy to reach all of the legal combinations by running limited simulation cycles, as is conventional. Some legal combinations may only occur after execution of a complex sequence of many load, store, and castout operations.
For instance, in order for data X to be in the invalid state in both the level one (L1) and level two (L2) caches but in the modified state in the level three (L3) cache, the processor must first store data X to the appropriate address, placing the corresponding L1 cache line in the modified state. Next, a number of loads or stores (depending on the L1's replacement algorithm) must be executed which map to the cache segment containing addresses including that of data X, forcing a castout of X from the L1 to the L2. Finally, a number of loads and stores must occur which cause L1 misses and also force the L2 to select data X as the victim and cast out the cache line containing the modified data from the L2 to the L3.
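The sequence above can be reproduced in a toy model to show how the operation count grows with cache size. The sketch below rests on several assumptions that are not part of the description: each level is fully associative with least-recently-used replacement, the hierarchy is exclusive (a line lives in one level at a time), every victim is cast out to the next level, and the capacities are arbitrary small numbers. The names Hierarchy and ops_to_reach_iim are hypothetical.

```python
from collections import OrderedDict

class Hierarchy:
    """Private L1/L2/L3 stack; each level is fully associative with LRU."""

    def __init__(self, capacities):
        self.capacities = capacities
        # One OrderedDict per level: address -> "M" or "E", least recent first.
        self.levels = [OrderedDict() for _ in capacities]

    def access(self, addr, is_store):
        state = "E"                      # a fill from memory arrives clean
        for lines in self.levels:
            if addr in lines:            # hit: pull the line out of that level
                state = lines.pop(addr)
                break
        if is_store:
            state = "M"
        self._install(0, addr, state)

    def _install(self, level, addr, state):
        if level == len(self.levels):
            return                       # victim leaves the hierarchy for memory
        lines = self.levels[level]
        lines[addr] = state              # newest entry goes to the MRU end
        if len(lines) > self.capacities[level]:
            victim, victim_state = lines.popitem(last=False)   # LRU victim
            self._install(level + 1, victim, victim_state)     # cast out

    def states(self, addr):
        return [lines.get(addr, "I") for lines in self.levels]

def ops_to_reach_iim(capacities, x=0):
    """Store X, then load fresh addresses until X is I in L1, I in L2, M in L3."""
    h = Hierarchy(capacities)
    h.access(x, is_store=True)
    ops = 1
    next_addr = x + 1
    while h.states(x) != ["I", "I", "M"]:
        h.access(next_addr, is_store=False)
        next_addr += 1
        ops += 1
    return ops, h.states(x)

# Usage: L1 holds 2 lines, L2 holds 4, L3 holds 8.
result = ops_to_reach_iim((2, 4, 8))
```

In this toy model the target combination needs one store plus as many loads as the combined L1 and L2 capacity, so realistic cache sizes would require far longer sequences.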
The sequence of operations required to reach a particular combination of cache coherency states may entail an extremely large number of operations. Furthermore, the example described above assumes that cache levels are not shared between processors. Many more legal combinations of cache coherency states become possible where several processors and L1 caches share a common L2 and/or L3 cache, further complicating the string of operations necessary to achieve a particular legal combination of coherency states.
Finally, a physical multiprocessor system running any kind of application at several hundred million or more instructions per second will fill its caches within a very short period of time and will eventually reach any combination of cache states. In contrast, simulations run tests at a much lower frequency, often less than 100 processor cycles per second. With this limitation, it is not practical to run long test cases--more than a thousand instructions, for example. Therefore, it is nearly impossible to reach all types of cache state combinations by running limited simulations without cache preloading. An insufficient or limited preloading mechanism could leave hidden bugs in the design which would not be found until the design has been fabricated in silicon.
It would be desirable, therefore, to provide a comprehensive and complete cache preloading mechanism for simulations. It would further be advantageous for the mechanism to employ only possible cache coherency state combinations, excluding those which are legal in theory but could never occur. It would also be desirable for the cache preload mechanism to randomly preload combinations for more complete verification, including corner case combinations.