Generally, computing systems such as personal computers, personal digital assistants, cellular and digital telephones, and other processor-based devices include data processors in the form of microprocessors for processing computer-readable instructions. A microprocessor is a single-chip data processor that includes an instruction decoder to decode the computer-readable instructions and one or more execution units to execute the decoded instructions. The execution units perform most of the actions that allow application programs to function.
Modern microprocessors typically include several features to improve performance. One of these features is on-chip cache memory. Cache memory is a high-speed local memory that exploits the locality of instruction fetching (in the case of an instruction cache) or of data references (in the case of a data cache) to avoid pipeline stalls caused by the relatively slow access time of main memory.
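The effect of locality on an instruction cache can be illustrated with a minimal sketch. The direct-mapped organization, the line count, the line size, and the loop trace below are all assumptions chosen for illustration; none of them comes from the text above.

```python
def simulate_direct_mapped(addresses, num_lines=8, line_bytes=16):
    """Return (hits, misses) for a hypothetical direct-mapped cache."""
    tags = [None] * num_lines          # one stored tag per cache line
    hits = misses = 0
    for addr in addresses:
        block = addr // line_bytes     # memory block containing this address
        index = block % num_lines      # cache line that the block maps to
        tag = block // num_lines       # distinguishes blocks sharing a line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag          # line is filled from slow main memory
    return hits, misses


# Assumed trace: a loop body of sixteen 4-byte instructions, run 10 times.
loop_trace = [0x100 + 4 * i for i in range(16)] * 10
hits, misses = simulate_direct_mapped(loop_trace)
print(hits, misses)
```

In this sketch only the first pass through the loop misses (once per cache line); every later fetch hits, which is the locality that an instruction cache relies on.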
While the use of instruction caches in particular has greatly improved microprocessor performance, some performance obstacles remain. One obstacle is that a cache access itself requires a certain amount of time, even if the instruction fetch hits in the cache. If, as is common, the microprocessor uses memory management and the cache is a physical cache storing data corresponding to physical addresses, any cache access also requires an address translation step. Another obstacle is superscalar design, which allows microprocessors to issue more than one instruction per cycle, thereby increasing the demand for instructions to be returned from the cache. As a result of these conditions, if an instruction fetch misses in the cache, the instruction pipeline may stall due to instruction starvation while the cache fetches the requested instruction from relatively slow main memory.
Thus, some high-end microprocessors have started to use a feature that was common in early mainframe computers: prefetch buffers for fetching instructions. A prefetch buffer is a set of registers that stores instructions that have been pre-loaded from the cache or from main memory in a first-in, first-out (FIFO) fashion. Prefetch buffers prevent instruction starvation that might otherwise occur during cache or main memory accesses, but have limitations of their own. One limitation is that some microprocessors support variable-length instructions, which may lead to inefficient use of the prefetch buffer.
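A FIFO prefetch buffer of this kind can be sketched as follows. The class name `PrefetchBuffer`, the slot count, and the slot width are hypothetical. The sketch also assumes that every slot is sized for the longest instruction, which is only one possible way that variable-length instructions could leave buffer storage unused.

```python
from collections import deque


class PrefetchBuffer:
    """Hypothetical FIFO prefetch buffer with fixed-width slots."""

    def __init__(self, num_slots=4, slot_bytes=4):
        self.num_slots = num_slots
        self.slot_bytes = slot_bytes   # assumed: sized for the longest instruction
        self.slots = deque()

    def push(self, instr_bytes):
        """Load one prefetched instruction; return False if the buffer is full."""
        if len(instr_bytes) > self.slot_bytes:
            raise ValueError("instruction wider than a slot")
        if len(self.slots) == self.num_slots:
            return False
        self.slots.append(instr_bytes)
        return True

    def pop(self):
        """Return the oldest instruction, or None if the buffer is empty."""
        return self.slots.popleft() if self.slots else None

    def utilization(self):
        """Fraction of the buffer's bytes that hold instruction bytes."""
        used = sum(len(instr) for instr in self.slots)
        return used / (self.num_slots * self.slot_bytes)


# Assumed instruction stream with lengths of 1, 2, 1, and 4 bytes.
buf = PrefetchBuffer()
for instr in [b"\x01", b"\x02\x03", b"\x04", b"\x05\x06\x07\x08"]:
    buf.push(instr)
print(buf.utilization())
```

Under these assumptions the buffer is full in terms of slots while only half of its bytes carry instructions, and instructions still leave in the order in which they were loaded.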
Another limitation is that particular types of instructions, such as branch instructions, may result in a change of flow for the data processor. By the time the instruction decoder decodes such an instruction and recognizes it as a branch, multiple instructions following the branch may already have been fetched and loaded into the prefetch buffer, and these instructions will not be needed if the branch is taken. Consequently, instructions may be loaded from memory only to be subsequently discarded from the prefetch buffer. The discarded instructions represent wasted power in the form of unnecessary fetch operations.
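The cost of discarded prefetches can be estimated with a rough sketch. It assumes that fetch runs a fixed number of instructions ahead of decode, that every branch is taken, and that the whole buffer is flushed when a branch is decoded. The function name `count_wasted_fetches`, the `fetch_ahead` parameter, and the sample program are hypothetical.

```python
def count_wasted_fetches(program, fetch_ahead=3):
    """Return (executed, discarded) instruction counts.

    program is a list of (opcode, target) pairs. For the opcode "br",
    target is the index of the branch destination; otherwise it is None.
    """
    executed = discarded = 0
    pc = 0
    while pc < len(program):
        opcode, target = program[pc]
        executed += 1
        if opcode == "br":
            if target <= pc:
                raise ValueError("this sketch only models forward branches")
            # Instructions fetched sequentially after the branch are flushed.
            discarded += min(fetch_ahead, len(program) - pc - 1)
            pc = target
        else:
            pc += 1
    return executed, discarded


# Assumed program: the branch at index 1 jumps over three instructions.
program = [
    ("add", None),
    ("br", 5),
    ("sub", None),
    ("mul", None),
    ("or", None),
    ("nop", None),
]
print(count_wasted_fetches(program))
```

Each discarded entry in this model corresponds to one fetch whose result is never used, which is the wasted work described above.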