1. Field of the Invention
The present invention generally relates to prefetching of processor instructions, and particularly relates to non-sequential instruction prefetching.
2. Relevant Background
Microprocessors perform computational tasks in a wide variety of applications, including portable electronic devices. In many cases, maximizing processor performance is a major design goal, to permit additional functions and features to be implemented in portable electronic devices and other applications. Additionally, power consumption is of particular concern in portable electronic devices, which have limited battery capacity. Hence, processor designs that increase performance and reduce power consumption are desirable.
Most modern processors employ one or more instruction execution pipelines, wherein the execution of many multi-step sequential instructions is overlapped to improve overall processor performance. Capitalizing on the spatial and temporal locality properties of most programs, recently executed instructions are stored in a cache—a high-speed, usually on-chip memory—for ready access by the execution pipeline.
Many processors employ two levels of high-speed caches. In such processors, the first level conventionally comprises a data cache for storing data and an instruction cache for storing instructions. The data and instruction caches may be separate or unified. A second level (L2) cache provides a high-speed memory buffer between the first-level caches and memory external to a microprocessor, e.g., Dynamic Random Access Memory (DRAM), flash memory, hard disk drives, optical drives, and the like.
A common style of cache memory comprises a Content Addressable Memory (CAM) coupled to a Random Access Memory (RAM). The cache is accessed by comparing a memory address against full or partial, previously accessed, memory addresses stored in the CAM. If the address matches a CAM address, the cache indicates a “hit,” and may additionally provide a “line” of data (which, in the case of an instruction cache, may comprise one or more instructions) from a location in the RAM that corresponds to the matching CAM address. If the compare address does not match any memory address stored in the CAM, the cache indicates a “miss.” A miss in a first-level cache normally triggers an L2 cache access, which requires a much larger number of processing cycles than a first-level cache access. A miss in the L2 cache triggers an access to main memory, which incurs an even larger delay.
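By way of illustration only, the hit/miss behavior described above may be sketched in Python. The sketch assumes a fully associative cache, a 32-byte line, and a dictionary standing in for the CAM comparison. The names `CamRamCache`, `LINE_BYTES`, `fill`, and `lookup` are hypothetical and are not taken from any particular processor.

```python
LINE_BYTES = 32  # assumed line size, chosen only for illustration


class CamRamCache:
    """Toy model of a cache built from a CAM (tags) and a RAM (lines)."""

    def __init__(self):
        self.cam = {}  # tag -> RAM index; stands in for the CAM compare
        self.ram = []  # stored cache lines

    def fill(self, address, line):
        """Install a line for the block that contains `address`."""
        tag = address // LINE_BYTES
        if tag in self.cam:
            self.ram[self.cam[tag]] = line
        else:
            self.cam[tag] = len(self.ram)
            self.ram.append(line)

    def lookup(self, address):
        """Return (hit, line); `line` is None when the address misses."""
        tag = address // LINE_BYTES
        index = self.cam.get(tag)
        if index is None:
            return False, None  # miss: a real design would now access the L2 cache
        return True, self.ram[index]


cache = CamRamCache()
cache.fill(0x1000, ["insn_a", "insn_b"])
hit_result = cache.lookup(0x1004)   # falls in the same line as 0x1000
miss_result = cache.lookup(0x2000)  # this line was never filled
```

In this sketch a partial address (the tag) is compared rather than the full address, which corresponds to the "full or partial" address comparison mentioned above.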
The CAM comparison (i.e., determining whether or not an address hits in the cache) is relatively power efficient. However, retrieving instructions or data from the cache RAM in the event of a hit consumes a large amount of power. Accordingly, some processors utilize a prefetch operation to advantageously ascertain whether or not desired instructions are stored in an instruction cache, without incurring the power penalty of actually retrieving those instructions from the cache at that time. As used herein, the term “prefetch” or “prefetch operation” refers to a limited instruction cache access that yields a hit or miss, indicating whether or not one or more instructions associated with an instruction address are stored in the instruction cache, without retrieving the instructions from the cache if the address hits. That is, a prefetch operation accesses an instruction cache CAM, but not the RAM. As used herein, the term “fetch” or “fetch operation” refers to a memory operation that includes an instruction cache access that retrieves one or more instructions from the cache in the case of a cache hit. As discussed more fully herein, a fetch operation additionally accesses branch prediction circuits, such as a branch target address cache and branch history table, while a prefetch operation does not. It should be noted that both fetch and prefetch operations, each of which performs an instruction cache access, may take place in the same section of the processor pipeline.
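The distinction between a prefetch operation and a fetch operation, as those terms are defined above, may be sketched as follows. This is a simplified model under stated assumptions: a 32-byte line, a dictionary in place of the CAM, and two counters that stand in for the RAM access and the branch prediction access. The class and member names (`InstructionCache`, `ram_reads`, `predictor_lookups`) are hypothetical.

```python
LINE_BYTES = 32  # assumed line size, chosen only for illustration


class InstructionCache:
    """Toy instruction cache that separates CAM-only and full accesses."""

    def __init__(self):
        self.cam = {}               # tag -> RAM index
        self.ram = []               # cached instruction lines
        self.ram_reads = 0          # counts the power-hungry RAM reads
        self.predictor_lookups = 0  # counts branch prediction accesses

    def fill(self, address, instructions):
        """Install a line for the block that contains `address`."""
        self.cam[address // LINE_BYTES] = len(self.ram)
        self.ram.append(instructions)

    def prefetch(self, address):
        """Limited access: CAM compare only, reports hit or miss."""
        return (address // LINE_BYTES) in self.cam

    def fetch(self, address):
        """Full access: branch prediction lookup, CAM compare, RAM read on a hit."""
        self.predictor_lookups += 1
        index = self.cam.get(address // LINE_BYTES)
        if index is None:
            return None  # miss: no RAM read takes place
        self.ram_reads += 1
        return self.ram[index]


icache = InstructionCache()
icache.fill(0x40, ["add", "branch"])

prefetch_hit = icache.prefetch(0x40)
reads_after_prefetch = icache.ram_reads  # the prefetch left the RAM untouched

line = icache.fetch(0x40)
reads_after_fetch = icache.ram_reads     # the fetch read the RAM once
```

The counters make the power argument visible: the prefetch learns that the address hits without incrementing `ram_reads`, whereas the fetch pays for both the RAM read and the branch prediction access.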
Conventional instruction prefetching involves performing instruction cache hit/miss lookups based on sequential instruction addresses. For example, if a first instruction address causes an instruction cache miss, the L2 cache access time for that address may be utilized to calculate a second address, that of the next sequential cache line. Prefetching this second address ascertains whether the sequential cache line resides in the instruction cache. If it does not (i.e., the second address misses), an L2 cache fetch for the next sequential cache line may be initiated, effectively hiding its latency behind the access time of the first L2 cache access. On the other hand, if the next sequential cache line does reside in the instruction cache (i.e., the second address hits), the prefetch does not read the RAM, and no L2 request is initiated. At this point, the prefetch is deemed to have completed. The prefetch operation thus allows for overlapped L2 accesses if the address of the next sequential cache line misses the instruction cache, but does not incur the power cost of actually fetching the sequential instructions if the address hits. Prefetching sequential instruction addresses provides both performance and power management improvements when executing software that contains few or no branch instructions. However, prefetching sequential instruction addresses does not provide an advantage when executing software containing frequent branch instructions, since the instructions prefetched from sequential addresses are not likely to be executed due to the branches.
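The conventional sequential prefetch described above may be sketched as a small Python function. The sketch assumes a 32-byte cache line and represents the instruction cache contents as a set of line tags. The helper names `next_sequential_line` and `handle_fetch_miss` are hypothetical and chosen only for illustration.

```python
LINE_BYTES = 32  # assumed line size, chosen only for illustration


def next_sequential_line(address):
    """Return the starting address of the cache line that follows `address`."""
    return (address // LINE_BYTES + 1) * LINE_BYTES


def handle_fetch_miss(miss_address, cached_tags):
    """Return the line addresses to request from the L2 cache.

    `miss_address` has already missed in the instruction cache, so its
    line is always requested. The next sequential line is prefetched
    (a tag-only check) and is requested only if that check also misses.
    """
    first_line = (miss_address // LINE_BYTES) * LINE_BYTES
    l2_requests = [first_line]

    second_line = next_sequential_line(miss_address)
    if second_line // LINE_BYTES not in cached_tags:
        # Prefetch missed: overlap this request with the first L2 access.
        l2_requests.append(second_line)
    # Prefetch hit: nothing is read and no L2 request is made.
    return l2_requests


# Next line absent from the cache: two overlapped L2 requests.
both_missing = handle_fetch_miss(0x1004, cached_tags=set())

# Next line already cached: only the original miss goes to the L2 cache.
next_cached = handle_fetch_miss(0x1004, cached_tags={0x1020 // LINE_BYTES})
```

The second address is always the next sequential line, regardless of any branch in the missed line, which is why this scheme offers little benefit for branch-heavy code.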