1. Field of the Invention
The present invention generally relates to caching in a processor and, more particularly, to a compiler-assisted victim cache bypassing system.
2. Description of Related Art
Before the invention of caches, several machines implemented forms of dynamic scheduling in order to avoid stalling when a memory access was encountered. The two most notable examples were the CDC 6600 with its scoreboard and the IBM 360/91 with its Tomasulo Algorithm, which were introduced in the late 1960's. Dynamic scheduling, which entails rearranging the order of instructions in hardware during execution of the program while maintaining the semantics of the original program order, was found to be extremely complex, expensive, hard to debug, and hard to test. Therefore, during the 1970's and 1980's, no other dynamically scheduled machines were produced at IBM. Similarly, dynamic scheduling was also abandoned at CDC. Furthermore, other manufacturers did not produce dynamically scheduled processors during that period.
Shortly after the introduction of the CDC 6600 and the IBM 360/91, computer systems using cache memory were developed. In those systems, as in modern computers, most memory accesses by a processor are satisfied by data in cache memory. Since the cache can be accessed much more quickly than main memory, the need for dynamic scheduling was also reduced.
In order to reduce processor stalling when a cache miss is encountered, some microprocessor manufacturers have reintroduced dynamic scheduling in their processors in recent years. A dynamically scheduled processor will try to find other instructions that do not depend on the data being fetched by the missing load, and execute these other instructions out of order and in parallel with the cache miss. Significantly higher performance can thus be obtained.
In recent years, processor cycle times have decreased greatly, and the capacity of memory chips has increased significantly. While cache access time has decreased along with processor cycle times, the access time of memory chips has improved at a much slower pace. This has led to an increasing gap between cache access times and main memory access times.
For example, in the late 1970's, a VAX-11/780 would slow down by only 50% if its cache was turned off and it executed out of main memory. Today, main memory access times can be more than 100 cycles, and programs could slow down by more than 100 times if they fetched each instruction and data reference from main memory instead of cache. Even when an instruction or data reference is only occasionally accessed from main memory, the small number of cache misses can still greatly slow down program execution because of the long memory access times.
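The effect of occasional misses can be illustrated with a simple average access time calculation. The following is only a rough sketch: the 1-cycle cache hit time and the 2% miss rate are hypothetical values chosen for illustration, the 100-cycle figure is taken as the lower bound mentioned above, and the function name is an assumption.

```python
def average_access_cycles(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average cycles per memory access for a single-level cache.

    Every access pays the hit time; a fraction `miss_rate` of accesses
    additionally pays the main memory penalty.
    """
    return hit_cycles + miss_rate * miss_penalty_cycles


# Hypothetical values: 1-cycle hit, 2% miss rate, 100-cycle memory access.
perfect_cache = average_access_cycles(1, 0.0, 100)
with_misses = average_access_cycles(1, 0.02, 100)
print(f"average cycles per access: {with_misses:.2f}")
print(f"slowdown versus a cache that never misses: {with_misses / perfect_cache:.2f}x")
```

Under these assumed values, a miss on only one access in fifty roughly triples the average memory access time.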
Several modern multiprocessor systems employ victim caches to improve the performance of the main caches. A victim cache stores cache lines evicted from its main cache. If the main cache needs the same line later, it can quickly load the line from the victim cache, which is much faster than loading it from main memory. However, it is not always the case that a line in the victim cache will be loaded back into the main cache. Moving such a line into the victim cache unnecessarily increases contention on system resources and unnecessarily increases power consumption. Allowing the line to bypass the victim cache and to be written back directly to memory would provide better performance.
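The victim cache behavior described above, together with the bypass idea, can be sketched with a toy model. This is a minimal illustration under stated assumptions, not the mechanism of any particular processor: the direct-mapped main cache, the small LRU victim cache, the class name VictimCachedSystem, and the per-access bypass flag (standing in for a hint supplied from outside the cache) are all hypothetical.

```python
from collections import OrderedDict


class VictimCachedSystem:
    """Toy model: direct-mapped main cache backed by a small LRU victim cache."""

    def __init__(self, num_sets=4, victim_entries=2):
        self.num_sets = num_sets
        self.victim_entries = victim_entries
        self.main = {}               # set index -> (line address, bypass hint)
        self.victim = OrderedDict()  # line address -> True, oldest entry first
        self.stats = {"main_hit": 0, "victim_hit": 0, "memory": 0, "bypassed": 0}

    def _evict(self, index):
        """Evict the line held in one main-cache set, honoring its bypass hint."""
        if index not in self.main:
            return
        line, bypass = self.main.pop(index)
        if bypass:
            # Hinted lines skip the victim cache and go straight back to memory.
            self.stats["bypassed"] += 1
            return
        self.victim[line] = True
        if len(self.victim) > self.victim_entries:
            self.victim.popitem(last=False)  # drop the oldest victim line

    def access(self, line, bypass=False):
        """Access a line address and report where the line was found."""
        index = line % self.num_sets
        resident = self.main.get(index)
        if resident is not None and resident[0] == line:
            self.stats["main_hit"] += 1
            self.main[index] = (line, bypass)
            return "main_hit"
        if line in self.victim:
            del self.victim[line]    # the line moves back into the main cache
            outcome = "victim_hit"
        else:
            outcome = "memory"
        self.stats[outcome] += 1
        self._evict(index)
        self.main[index] = (line, bypass)
        return outcome


# Lines 0, 4, 8 and 12 all map to set 0, so each access evicts the previous line.
cache = VictimCachedSystem(num_sets=4, victim_entries=2)
trace = [(0, False), (4, False), (0, False), (8, True), (12, False)]
outcomes = [cache.access(line, bypass) for line, bypass in trace]
print(outcomes)
print(cache.stats)
```

In this trace, line 0 is recovered from the victim cache after being evicted by line 4, while line 8, which is marked with the bypass flag, never occupies a victim cache entry when it is evicted by line 12.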