Many portable products, such as cell phones, laptop computers, personal digital assistants (PDAs) or the like, utilize a processor executing programs, such as communication and multimedia programs. The processing system for such products includes a processor and memory complex for storing instructions and data. Large capacity main memory commonly has slow access times as compared to the processor cycle time. As a consequence, the memory complex is conventionally organized in a hierarchy based on capacity and performance of cache memories, with the highest performance and lowest capacity cache located closest to the processor. For example, a level 1 instruction cache and a level 1 data cache would generally be directly attached to the processor, while a level 2 unified cache is connected to the level 1 (L1) instruction and data caches. Further, a system memory is connected to the level 2 (L2) unified cache. The level 1 instruction cache commonly operates at the processor speed, and the level 2 unified cache operates slower than the level 1 cache but has a faster access time than that of the system memory. Alternative memory organizations abound, for example, memory hierarchies having a level 3 cache in addition to an L1 and an L2 cache. Another memory organization may use only a level 1 cache and a system memory.
One of the principles that makes a memory hierarchy with instruction caches effective is that instructions tend to be accessed from sequential locations in memory. By having caches hold the most recently used sections of code, processors may execute at a higher performance level. Since programs also contain branch, call, and return type instructions, and support other non-sequential operations such as interrupts, the principle of sequential locality may be maintained only for relatively short sections of code. Due to such non-sequential operations, an instruction fetch to an instruction cache may miss, causing the instruction fetch to be applied to the next higher memory level, which has a higher memory capacity and a slower access time. A miss may cause the processor to stall awaiting the instruction. In order to keep processor performance high, cache miss rates should be low.
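By way of illustration only, the cost of a miss propagating to the next higher memory level may be sketched as follows. The function name `fetch_latency`, the modelling of each cache as a set of resident addresses, and the latencies of 1, 10 and 100 cycles are assumptions chosen for the example and are not taken from any particular system.

```python
def fetch_latency(addr, levels, memory_latency):
    """Return the cycles needed to fetch addr through a cache hierarchy.

    levels is a list of (resident_addresses, latency) pairs ordered from the
    level closest to the processor outward. All latencies are illustrative
    assumptions, not measured values.
    """
    cycles = 0
    missed = []
    for contents, latency in levels:
        cycles += latency            # pay the access time of this level
        if addr in contents:
            break                    # hit: stop walking the hierarchy
        missed.append(contents)      # miss: go on to the next higher level
    else:
        cycles += memory_latency     # missed every cache: access system memory
    for contents in missed:
        contents.add(addr)           # fill each level that missed
    return cycles


l1, l2 = set(), set()
hierarchy = [(l1, 1), (l2, 10)]
first = fetch_latency(0x40, hierarchy, 100)   # misses L1 and L2: 1 + 10 + 100
second = fetch_latency(0x40, hierarchy, 100)  # now hits in L1: 1
```

Under these assumed numbers the first fetch costs 111 cycles and the repeated fetch costs 1 cycle, which is why a low miss rate in the level closest to the processor matters for performance.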
An instruction cache is generally constructed with a plurality of instructions located at a single address in the instruction cache. This plurality of instructions is generally called a cache line or simply a line. A miss may occur on an instruction access anywhere in a cache line. When a miss occurs, rather than just fetching the needed instruction, the rest of the cache line, from the missed instruction to the end of the cache line, may also be fetched. In some systems, this technique of prefetching is further extended to always prefetch the rest of the cache line and the next cache line on a miss. This conventional technique of always prefetching the next cache line is based on an assumption that the next cache line contains instructions that will shortly be needed. This presumption of use of instructions in the next cache line remains valid even if, for example, a conditional branch is encountered in the line and the condition causes the branch to fall through to the next sequential instruction. By always prefetching the next cache line, misses may be reduced.
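The conventional technique of always prefetching the next cache line may be illustrated with a minimal sketch. The line size of 8 instructions, the function name `simulate`, and the modelling of the cache as an unbounded set of resident lines are assumptions made for the example; a miss is simplified to filling the whole line rather than only the portion from the missed instruction to the end of the line.

```python
LINE_SIZE = 8  # instructions per cache line (assumed value for illustration)


def simulate(trace, prefetch_next):
    """Count demand misses for a trace of instruction addresses.

    The cache is modelled as an unbounded set of resident line numbers,
    so capacity and replacement effects are deliberately ignored.
    """
    resident = set()
    misses = 0
    for addr in trace:
        line = addr // LINE_SIZE
        if line not in resident:
            misses += 1
            resident.add(line)              # fill the line that missed
            if prefetch_next:
                resident.add(line + 1)      # always prefetch the next line
    return misses


sequential_trace = list(range(32))          # 32 sequential instructions, 4 lines
without_prefetch = simulate(sequential_trace, prefetch_next=False)  # 4 misses
with_prefetch = simulate(sequential_trace, prefetch_next=True)      # 2 misses
```

For purely sequential code, the sketch halves the number of demand misses, which corresponds to the assumption that the next cache line contains instructions that will shortly be needed.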
The locality principle of sequential access of instructions of course fails at some point in a program and misses do occur due to non-sequential operations caused by branches, calls and returns, or the like. A miss due to a sequential access may also occur, for example, when an instruction is fetched at the end of a cache line, and the next sequential instruction, which should reside in the next sequential instruction cache line, is not resident in the cache. A miss due to a non-sequential access may occur, for example, when a branch instruction is encountered and the branch causes the program address to change to a new location and the instruction at the new location is not resident in the cache. The conventional technique of always prefetching the next cache line fetches instructions that may not be used and consequently causes unnecessary loss of memory access bandwidth, increased power use, and lower processor performance.
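The bandwidth cost described above may be illustrated by counting how many prefetched lines are never fetched from. The sketch below is a simplified model under stated assumptions: the line size of 8 instructions, the function name `prefetch_usage`, the unbounded cache, and both example traces are hypothetical and chosen only to contrast sequential code with code dominated by taken branches.

```python
LINE_SIZE = 8  # instructions per cache line (assumed value for illustration)


def prefetch_usage(trace):
    """Return (prefetches_issued, prefetches_never_used) for a trace.

    Models always-prefetch-next-line on every miss with an unbounded cache.
    """
    resident = set()
    unused = set()      # prefetched lines that no fetch has touched yet
    prefetches = 0
    for addr in trace:
        line = addr // LINE_SIZE
        if line in resident:
            unused.discard(line)        # a prefetched line turned out useful
            continue
        resident.add(line)              # demand fill of the missed line
        nxt = line + 1
        if nxt not in resident:
            resident.add(nxt)           # prefetch the next sequential line
            unused.add(nxt)
            prefetches += 1
    return prefetches, len(unused)


sequential = prefetch_usage(list(range(32)))            # (2, 0)
branchy = prefetch_usage([0, 1, 100, 101, 200, 201])    # (3, 3)
```

In the sequential trace every prefetched line is subsequently used, whereas in the trace with taken branches all three prefetches fetch instructions that are never executed, representing memory accesses and power spent without benefit.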