A microprocessor is a single chip (known as an integrated circuit or “IC”) that is designed to perform arithmetic and logic operations on values stored in registers. Typical microprocessor operations include adding, subtracting, comparing two numbers, and moving numbers from one area to another. These operations are the result of a set of instructions that are read and processed by the microprocessor. For example, when a computer is turned on, the microprocessor of the computer is designed to retrieve a set of instructions from the basic input/output system (BIOS) that comes with the computer as part of its memory. After that, the microprocessor may read and process instructions from the BIOS, the operating system that the BIOS loads into computer memory, or an application program.
Each instruction may be referenced by an address. The microprocessor keeps track of which instruction it is currently processing using a program counter, abbreviated as PC. The PC is a register in the control unit of the microprocessor that holds the address of the current or next instruction, depending on how the microprocessor is implemented. Typically, the program counter is advanced to the next instruction, and then the current instruction is executed.
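The fetch, advance, and execute ordering described above can be illustrated with a minimal sketch. The opcodes (ADD, SUB, JMP, HALT), the single accumulator register, and the function name run are hypothetical and chosen only for illustration; they do not correspond to any particular microprocessor.

```python
def run(program):
    """Execute a tiny hypothetical instruction list while tracking the program counter."""
    pc = 0       # program counter: address of the next instruction to fetch
    acc = 0      # single accumulator register (an assumption of this sketch)
    trace = []   # addresses of the instructions, in the order they were executed
    while pc < len(program):
        op, arg = program[pc]   # fetch the instruction the PC refers to
        trace.append(pc)
        pc += 1                 # advance the PC to the next instruction first ...
        if op == "ADD":         # ... then execute the instruction just fetched
            acc += arg
        elif op == "SUB":
            acc -= arg
        elif op == "JMP":
            pc = arg            # a jump overwrites the already-advanced PC
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return acc, trace


# Hypothetical program: the instruction at address 2 is skipped by the jump.
program = [("ADD", 5), ("JMP", 3), ("ADD", 100), ("SUB", 2), ("HALT", 0)]
acc, trace = run(program)
print(acc, trace)   # prints: 3 [0, 1, 3, 4]
```

Because the PC is advanced before execution in this sketch, a jump simply replaces the advanced value; an implementation in which the PC holds the current instruction's address would order these steps differently.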
While the architecture of a microprocessor may differ from implementation to implementation, there are several features common to most microprocessors. A description of these common features is provided with reference to FIG. 1, which is a block diagram of an illustrative microprocessor according to one approach. As shown in FIG. 1, several components are involved in the operation of a microprocessor 100. A microprocessor 100 may comprise an instruction execution component 110, an instruction buffer 120, and memory, such as caches 130 and 132. In addition, a microprocessor 100 may interact with memory external to the microprocessor 100, such as main memory 140 and cache 134. A brief description of the operation of the components of FIG. 1 shall now be presented.
The instruction execution component 110 of a microprocessor processes the instructions read by the microprocessor. For simplicity, an instruction execution component 110 of a microprocessor 100 shall be referred to herein as the core 110.
A microprocessor 100 may also contain one or more instruction buffers, such as instruction buffer 120. An instruction buffer is a buffer that temporarily holds one or more instructions until another component of the microprocessor is ready to receive those instructions. For example, the microprocessor 100 may read an instruction from memory and transfer the instruction to the instruction buffer 120. The instruction buffer 120 holds instructions read from memory until the core 110 is ready to process those instructions.
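The role of the instruction buffer 120 can be sketched as a bounded first-in, first-out queue sitting between the fetch logic and the core 110. The class name InstructionBuffer, its capacity parameter, and the method names below are assumptions made for this illustration; the first-in, first-out ordering is likewise an assumption, since the description above does not specify the order in which buffered instructions are released.

```python
from collections import deque


class InstructionBuffer:
    """Hypothetical bounded FIFO that holds instructions until the core is ready."""

    def __init__(self, capacity):
        self.capacity = capacity   # maximum number of instructions held at once
        self._slots = deque()      # oldest instruction sits at the left end

    def is_full(self):
        return len(self._slots) >= self.capacity

    def push(self, instruction):
        """Fetch side: store an instruction; return False if the buffer is full."""
        if self.is_full():
            return False           # the fetch logic would have to wait
        self._slots.append(instruction)
        return True

    def pop(self):
        """Core side: take the oldest buffered instruction, or None if empty."""
        return self._slots.popleft() if self._slots else None


buf = InstructionBuffer(capacity=2)
print(buf.push("ADD"), buf.push("SUB"), buf.push("CMP"))   # prints: True True False
print(buf.pop())                                           # prints: ADD
```

In this sketch a full buffer rejects further instructions, which models the fetch side waiting until the core has consumed an entry.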
The instructions read by the microprocessor 100 may initially be stored in main memory 140. To reduce the amount of time it takes the microprocessor 100 to read instructions, instructions may also be stored in a type of memory called a cache. A cache can be accessed more quickly than main memory 140, so an instruction held in a cache can be retrieved in less time than the same instruction held only in main memory 140. A microprocessor 100 may have more than one cache, and caches may reside on the microprocessor (such as caches 130 and 132) or off the microprocessor (such as cache 134).
When the core 110 needs a particular instruction for processing, the microprocessor 100 may initially attempt to load the instruction from the L1 (Level 1, or first-level) cache 130. If the requested instruction is not in the L1 cache 130, then the microprocessor 100 attempts to obtain the instruction from the L2 (Level 2, or second-level) cache 132. If the requested instruction is not in the L2 cache 132, then the microprocessor 100 attempts to obtain the instruction from the L3 (Level 3, or third-level) cache 134. If the requested instruction is not in the L3 cache 134, then the microprocessor 100 attempts to obtain the instruction from main memory 140. In this way, the memory of the microprocessor 100 is arranged in a hierarchy. The microprocessor 100 initially checks the lowest level of the memory hierarchy (the L1 cache 130), and if the requested instruction is not found, the microprocessor 100 checks each higher level of memory, in order, until the instruction is located.
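The lookup order described above can be sketched as a loop over the levels of the hierarchy, from lowest to highest. In this minimal illustration each level is modeled as a dictionary mapping an address to an instruction; the function name fetch, the dictionary model, and the sample contents are assumptions of the sketch rather than a description of actual hardware.

```python
def fetch(address, levels, main_memory):
    """Search each cache level in order, then main memory.

    levels is a list of (name, contents) pairs ordered from the lowest
    level (L1) upward. Returns the instruction and the name of the level
    where it was found.
    """
    for name, contents in levels:
        if address in contents:          # hit at this level: stop searching
            return contents[address], name
    return main_memory[address], "main memory"   # missed in every cache


# Hypothetical contents, chosen so that each address is found at a different level.
levels = [
    ("L1", {0: "ADD"}),
    ("L2", {1: "SUB"}),
    ("L3", {2: "CMP"}),
]
main_memory = {0: "ADD", 1: "SUB", 2: "CMP", 3: "MOV"}

for address in range(4):
    print(address, fetch(address, levels, main_memory))
```

The sketch only performs the search; it does not copy an instruction into lower levels after a miss.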
Typically, a lower level of memory is faster to access than a higher level of memory, but the lower level of memory can store fewer instructions than a higher level of memory. To illustrate, in a typical implementation, (a) the L3 cache 134 can store fewer instructions than main memory 140, but the microprocessor 100 can access the L3 cache 134 faster than main memory 140, (b) the L2 cache 132 can store fewer instructions than the L3 cache 134, but the microprocessor 100 can access the L2 cache 132 faster than the L3 cache 134, and (c) the L1 cache 130 can store fewer instructions than the L2 cache 132, but the microprocessor 100 can access the L1 cache 130 faster than the L2 cache 132.
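To make the cost of reaching higher levels concrete, the following sketch adds up the time spent checking each level until the instruction is found. The cycle counts are invented purely for illustration and are not measurements of any real device, and the assumption that levels are checked strictly one after another (so that the costs accumulate) is a simplification.

```python
# Hypothetical access latencies in clock cycles, ordered from lowest level to highest.
# These numbers are illustrative assumptions only.
LATENCY_CYCLES = [("L1", 4), ("L2", 12), ("L3", 40), ("main memory", 200)]


def lookup_cost(hit_level):
    """Total cycles spent when the instruction is finally found at hit_level."""
    total = 0
    for name, cycles in LATENCY_CYCLES:
        total += cycles            # every level checked contributes its latency
        if name == hit_level:
            return total
    raise ValueError(f"unknown level: {hit_level}")


for name, _ in LATENCY_CYCLES:
    print(name, lookup_cost(name))
```

Under these assumed numbers, an instruction found only in main memory costs 256 cycles, compared with 4 cycles for an instruction found in the L1 cache.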
Individual instructions may be stored in a set of instructions referred to as a line of instructions. For example, a line of instructions may comprise 8 individual instructions. A line of instructions is typically stored in a contiguous portion of memory. In some implementations, instruction buffers and memory (such as main memory and the various caches accessed by a microprocessor) may store instructions in units of lines of instructions, rather than individual instructions. Thus, instead of storing an individual instruction into a cache, an entire line of instructions may be stored into the cache. In practice, when an individual instruction is needed for processing by the core, and the instruction needs to be retrieved from higher levels of memory because it is not stored within the L1 cache 130, the entire line of instructions containing the individual instruction, rather than simply the individual instruction, is retrieved from higher levels of memory and stored within the L1 cache 130.
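The line-granular behavior described above can be sketched as follows, using the example line size of 8 instructions. The class name LineCache, the use of instruction indices as addresses (rather than byte addresses), and the alignment of lines to multiples of 8 are assumptions of this illustration.

```python
LINE_SIZE = 8   # instructions per line, matching the example in the text


def line_base(address):
    """Address of the first instruction in the line that contains address."""
    return (address // LINE_SIZE) * LINE_SIZE


class LineCache:
    """Hypothetical L1 cache that is filled one whole line at a time."""

    def __init__(self):
        self.lines = {}   # line base address -> list of LINE_SIZE instructions
        self.misses = 0   # number of requests that went to higher memory

    def fetch(self, address, higher_memory):
        base = line_base(address)
        if base not in self.lines:
            # Miss: retrieve the entire line, not just the requested instruction.
            self.misses += 1
            self.lines[base] = higher_memory[base:base + LINE_SIZE]
        return self.lines[base][address - base]


higher_memory = [f"insn{i}" for i in range(32)]   # stand-in for higher levels
l1 = LineCache()
print(l1.fetch(10, higher_memory), l1.misses)   # prints: insn10 1
print(l1.fetch(11, higher_memory), l1.misses)   # prints: insn11 1
print(l1.fetch(16, higher_memory), l1.misses)   # prints: insn16 2
```

The second request is served without going to higher memory because instruction 11 arrived in the same line as instruction 10.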
It is advantageous to avoid retrieving instructions from higher levels of memory due to the latency involved in requesting instructions from higher levels of memory. Consequently, what is needed is an approach for retrieving instructions from memory that minimizes requesting instructions from higher levels of memory. The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.