A conventional program cache is a memory bank that bridges main memory and a central processing unit (CPU). The program cache is faster than the main memory and so allows instructions to be executed by the CPU at higher speeds. Instructions are transferred from the main memory to the program cache in blocks (i.e., cache lines). The instructions are usually transferred to the program cache ahead of time using a look-ahead technique. The more sequential the instructions in the routine being executed, the greater the chance that the next instruction will already be in the program cache, resulting in better performance.
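The grouping of instructions into cache lines can be sketched as a simple address-to-line mapping. The 32-byte line size below is an illustrative assumption, not a value taken from any particular design:

```python
# Sketch: mapping a byte address to a cache line, assuming a
# hypothetical 32-byte line size (real line sizes vary by design).
LINE_SIZE = 32  # bytes per cache line (illustrative value)

def line_number(address: int) -> int:
    """Index of the cache line that contains the given byte address."""
    return address // LINE_SIZE

def line_base(address: int) -> int:
    """Base address of the cache line containing the address."""
    return address - (address % LINE_SIZE)

# Two sequential instruction addresses often fall in the same line,
# which is why sequential code tends to enjoy a high hit rate.
```

Because neighboring addresses share a line, fetching one line brings in several sequential instructions at once.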
Referring to FIG. 1, a block diagram of a conventional processor system 10 is shown. The system 10 has a CPU 12 coupled to a program cache 14. The program cache 14 communicates with a main memory 16 via a system bus 18. The main memory 16 stores instructions and data used by the CPU 12. The program cache 14 is commonly referred to as a level 1 (L1) cache.
A cache hit refers to a successful attempt by the CPU 12 to read an instruction from the cache 14. When a cache hit occurs, the requested instruction is transferred from the program cache 14 to the CPU 12. A resulting transfer latency is designed to be short in order to avoid stalling the CPU 12.
A cache miss refers to a failed attempt by the CPU 12 to read an instruction from the cache 14. When a cache miss occurs, an appropriate instruction line is transferred from the main memory 16 to the program cache 14. The “missed” instruction is then read by the CPU 12 from the program cache 14. An access latency caused by transferring the instruction line from the main memory 16 across the system bus 18 to the program cache 14 is longer than the latency in transferring the instruction from the program cache 14 to the CPU 12. In most cases, the CPU 12 will stall while waiting for the requested instruction to become available in the program cache 14.
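The hit/miss behavior described above can be modeled with a minimal direct-mapped cache. The slot count and latency values here are illustrative assumptions only, chosen to show that a miss costs far more cycles than a hit:

```python
# Minimal sketch of hit/miss handling in a direct-mapped program cache.
# Names, sizes, and latency values are illustrative, not from any real design.
LINE_SIZE = 32
NUM_LINES = 4
HIT_LATENCY = 1    # cycles to deliver an instruction on a hit (assumed)
MISS_LATENCY = 10  # cycles to fill a line from main memory (assumed)

class ProgramCache:
    def __init__(self):
        self.tags = [None] * NUM_LINES  # line base address stored per slot

    def read(self, address):
        base = address - address % LINE_SIZE
        slot = (base // LINE_SIZE) % NUM_LINES
        if self.tags[slot] == base:
            return "hit", HIT_LATENCY
        self.tags[slot] = base  # fill the line from main memory
        return "miss", MISS_LATENCY

cache = ProgramCache()
first = cache.read(0x100)   # cold miss: line fetched from main memory
second = cache.read(0x104)  # same line: hit, served at low latency
```

During the miss, a real CPU would typically stall for the full fill latency, which is the cost the fetch-ahead feature described next tries to hide.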
Fetch-ahead is a feature for improving the performance of the program cache 14. While the CPU 12 is executing a sequence of instructions in the software code, if the next instruction line is not already in the program cache 14, the next instruction line is fetched from the main memory 16 after a program read access. The fetch-ahead feature usually improves cache hit performance due to the linearity of program flow in the code. In many cases, the next instruction requested by the CPU 12 is sequential to the instruction currently being executed. If the next instruction is not inside the program cache 14, a program-cache controller operating in the background moves the next instruction line into the program cache 14 without stalling the CPU 12.
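The fetch-ahead behavior can be sketched as follows: after serving a program read, the controller also fills the next sequential line in the background so that a later sequential access hits. The set-based cache model and addresses are illustrative assumptions:

```python
# Sketch of the fetch-ahead idea: after each program read, the cache
# controller fetches the next sequential line in the background.
# The model and addresses are illustrative only.
LINE_SIZE = 32

def line_base(addr):
    return addr - addr % LINE_SIZE

class FetchAheadCache:
    def __init__(self):
        self.lines = set()  # base addresses of resident lines

    def read(self, address):
        base = line_base(address)
        hit = base in self.lines
        if not hit:
            self.lines.add(base)          # demand fill on a miss
        self.lines.add(base + LINE_SIZE)  # background fetch-ahead
        return hit

c = FetchAheadCache()
miss = c.read(0x200)  # miss; line 0x200 filled, 0x220 prefetched
hit = c.read(0x220)   # sequential access hits thanks to fetch-ahead
```

The second read hits only because the first read's background fetch already brought the next line in; with linear code this pattern repeats and the CPU rarely stalls.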
The fetch-ahead feature improves performance when the software code is linear. However, when a change-of-program-flow exists in the code, the fetch-ahead feature can reduce cache performance, thereby causing a long stall in the CPU 12. In the case of a change-of-program-flow, a new instruction line fetched by the fetch-ahead feature is not used by the CPU 12. Instead, the change-of-program-flow forces a different instruction line to be read from the main memory 16. Furthermore, the different instruction line might overwrite an existing instruction line that is subsequently requested by the CPU 12. As a result, the fetch-ahead feature can “dirty” the program cache 14 with unwanted instruction lines, thus reducing the overall program cache performance.
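The eviction problem can be illustrated with a tiny direct-mapped model: a prefetched line that is never executed lands in the same slot as a hot line, so a later branch back to the hot line misses. The two-slot geometry and addresses are illustrative assumptions:

```python
# Sketch of how an unused fetch-ahead line can "dirty" a small
# direct-mapped cache after a change of program flow. Illustrative only.
LINE_SIZE = 32
NUM_SLOTS = 2

class Cache:
    def __init__(self):
        self.tags = [None] * NUM_SLOTS

    def _slot(self, base):
        return (base // LINE_SIZE) % NUM_SLOTS

    def fill(self, address):
        base = address - address % LINE_SIZE
        self.tags[self._slot(base)] = base

    def hit(self, address):
        base = address - address % LINE_SIZE
        return self.tags[self._slot(base)] == base

c = Cache()
c.fill(0x00)  # hot line A in slot 0 (e.g., a loop body)
c.fill(0x20)  # line in slot 1; fetch-ahead now fills the next line...
c.fill(0x40)  # ...which maps back to slot 0, evicting hot line A
# A branch then returns to line A: the access misses even though the
# prefetched line at 0x40 was never executed.
```

The prefetched line displaced a line the CPU still needed, turning what would have been a hit into a full miss penalty.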
A conventional level 2 (L2) cache situated between the program (L1) cache 14 and the main memory 16 can be used to reduce the access latency due to a cache miss in the L1 cache. An L2 cache is commonly slower than an L1 cache but faster than the main memory. Therefore, transferring a missing instruction line from the L2 cache to the L1 cache takes less time than fetching the missing instruction line from the main memory 16. However, if the L1/L2 cache is an exclusive arrangement (each line exists in only one of the caches), an unused line fetched-ahead will propagate through the L2 cache to the L1 cache making the L1 cache dirty. If the L1/L2 cache is an inclusive arrangement (each line exists in both of the caches), the unused line will pollute both the L1 cache and the L2 cache.
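The difference between the two arrangements, with respect to an unused fetch-ahead line, can be sketched with a simple set-based model. The function names and the "unused_line" label are hypothetical, used only to contrast where the pollution ends up:

```python
# Sketch contrasting inclusive and exclusive L1/L2 arrangements when an
# unused fetch-ahead line arrives. Set-based model, illustrative only.

def fill_inclusive(l1, l2, line):
    # Inclusive: each line exists in both caches,
    # so an unused line pollutes both.
    l1.add(line)
    l2.add(line)

def fill_exclusive(l1, l2, line):
    # Exclusive: each line exists in only one cache; here the unused
    # line propagates through L2 into L1, and L2 keeps no copy.
    l2.discard(line)
    l1.add(line)

l1_i, l2_i = set(), set()
fill_inclusive(l1_i, l2_i, "unused_line")  # pollutes both caches

l1_e, l2_e = set(), set()
fill_exclusive(l1_e, l2_e, "unused_line")  # pollutes only the L1 cache
```

In both arrangements the L1 cache ends up dirtied by the unused line; the inclusive arrangement additionally wastes L2 capacity on it.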