Modern processors, such as a simultaneous multithreading (SMT) processor, may include an instruction cache and a prefetch buffer. The instruction cache may include an array of real addresses and an array of associated instructions. The prefetch buffer may be similarly configured, though its arrays are typically much smaller. The prefetch buffer may be configured to store the real addresses of prefetched instructions along with the instructions themselves. Prefetched instructions may refer to instructions fetched from memory, e.g., main memory, before the processor requests them. The instructions fetched from memory to be stored in the prefetch buffer may be speculatively prefetched based on the principle that if a memory location is addressed by the processor, the next sequential address will likely be requested by the processor in the near future. The prefetched instructions in the prefetch buffer may be speculatively prefetched in response to a speculative request, as discussed below.
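The sequential-prefetch principle can be illustrated with a minimal Python sketch. It assumes a hypothetical 32-byte line size, a dictionary standing in for main memory, and FIFO replacement in a small, fully associative buffer; none of these parameters comes from the description above. Each entry pairs a line-aligned real address with its instructions, mirroring the two arrays described for the prefetch buffer.

```python
from collections import OrderedDict

LINE_SIZE = 32  # hypothetical cache-line size in bytes (assumption)


class PrefetchBuffer:
    """Small fully associative buffer mapping line-aligned real address -> instructions."""

    def __init__(self, capacity, memory):
        self.capacity = capacity
        self.memory = memory          # assumed stand-in for main memory: line address -> instructions
        self.entries = OrderedDict()  # insertion order doubles as FIFO replacement order

    def on_demand_access(self, real_addr):
        """When the processor addresses real_addr, prefetch the next sequential line."""
        line = real_addr - (real_addr % LINE_SIZE)
        next_line = line + LINE_SIZE
        if next_line in self.memory and next_line not in self.entries:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict the oldest prefetched line
            self.entries[next_line] = self.memory[next_line]

    def lookup(self, real_addr):
        """Return the prefetched instructions for real_addr's line, or None."""
        line = real_addr - (real_addr % LINE_SIZE)
        return self.entries.get(line)


memory = {a: [f"insn@{a}"] for a in range(0, 256, LINE_SIZE)}
pb = PrefetchBuffer(capacity=2, memory=memory)
pb.on_demand_access(4)   # access in line 0 prefetches line 32
print(pb.lookup(40))     # instructions of line 32
```

The FIFO policy is chosen only for simplicity; a real buffer could use any replacement scheme.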
Speculative instruction fetching may occur when a processor predicts, based on prior branch history, whether a received branch instruction will be taken or not taken. If a branch instruction is predicted to be taken, then the flow of the program is altered, i.e., the sequence of instruction execution is altered. If the branch instruction is predicted not to be taken, then the following sequential instructions are executed. In either case, the stream of executed instructions is said to be “speculatively” executed. If the branch is mispredicted, i.e., the processor predicted incorrectly whether the branch instruction would be taken, the speculatively executed instructions are flushed.
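The predict, fetch, and flush sequence can be sketched as follows. The "prior history" is modeled here, as an assumption, by per-branch two-bit saturating counters, a common scheme but not one specified above; the names `TwoBitPredictor`, `fetch_after_branch`, and `resolve`, the 4-byte instruction size, and the fetch depth are all hypothetical.

```python
class TwoBitPredictor:
    """Assumed history scheme: per-branch 2-bit counters (0-1 predict not taken, 2-3 taken)."""

    def __init__(self):
        self.counters = {}

    def predict(self, pc):
        # Unseen branches start at 1 (weakly not taken), an arbitrary assumption.
        return self.counters.get(pc, 1) >= 2

    def update(self, pc, taken):
        c = self.counters.get(pc, 1)
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)


def fetch_after_branch(pc, target, predictor, insn_size=4, depth=3):
    """Predict the branch and return (prediction, speculatively fetched addresses)."""
    taken = predictor.predict(pc)
    start = target if taken else pc + insn_size  # altered flow vs. sequential flow
    return taken, [start + i * insn_size for i in range(depth)]


def resolve(pc, predicted_taken, actual_taken, speculative, predictor):
    """Record the outcome; on a misprediction, flush the speculative stream."""
    predictor.update(pc, actual_taken)
    if predicted_taken != actual_taken:
        speculative.clear()  # flush: discard the wrongly fetched instructions
        return True
    return False


p = TwoBitPredictor()
taken, stream = fetch_after_branch(0x100, 0x200, p)  # predicted not taken
print(resolve(0x100, taken, True, stream, p))        # True: mispredicted, stream flushed
```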
Upon predicting whether a received branch instruction will be taken or not taken, or upon flushing a sequence of mispredicted speculatively fetched instructions, an SMT processor may fetch a sequence of speculative or non-speculative addresses from its program counters, which may be used to index into the instruction cache. Typically, a hash of this address is computed and used to index into the instruction cache; here "hash" refers either to using the values stored in particular bits of the program-counter address or to an algorithm that may generate a different value, possibly with a different number of bits, from that address. Further, a hash of the program-counter address, which may differ from the hash used for the instruction cache, may be used to index into the prefetch buffer. To determine whether the instructions stored in the instruction cache or those in the prefetch buffer should be selected, the address from the program counter may be translated into its corresponding real address, i.e., the address in physical memory. The translated real address may then be compared with the real address in the indexed entry of the instruction cache. If the two are equal (a cache hit), the instructions in the instruction cache are selected. If they are not equal (a cache miss), the instructions in the prefetch buffer are selected. However, determining whether there is a cache hit or miss takes longer than a clock cycle. Hence, in the case of a cache miss, selecting the instructions in the prefetch buffer takes longer than a clock cycle.
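The lookup and selection path can be modeled in software as below. This is a sketch under stated assumptions: both hash functions, the 4 KB page size, the 32-byte line size, and the dictionary page table are hypothetical stand-ins, and the model runs sequentially, whereas hardware would read both arrays in parallel. It shows why selection depends on translation followed by comparison: the prefetch buffer's instructions are chosen only after the real-address compare reports a miss.

```python
PAGE_SIZE = 4096  # hypothetical page size (assumption)
LINE_SIZE = 32    # hypothetical line size (assumption)


def icache_index(addr, sets):
    # Hash 1 (assumed): use the address bits just above the line offset.
    return (addr // LINE_SIZE) % sets


def pbuf_index(addr, entries):
    # Hash 2 (assumed): XOR-fold higher bits in, i.e., a different hash.
    line = addr // LINE_SIZE
    return (line ^ (line >> 4)) % entries


def translate(addr, page_table):
    # Address -> real address via a hypothetical page table (page -> frame).
    page, offset = divmod(addr, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset


def select_instructions(pc_addr, icache, pbuf, page_table):
    """icache and pbuf are lists of (real_line_addr, instructions) entries."""
    ic_real, ic_insns = icache[icache_index(pc_addr, len(icache))]
    _, pb_insns = pbuf[pbuf_index(pc_addr, len(pbuf))]  # read in parallel in hardware
    real = translate(pc_addr, page_table)               # the slow, serial step
    real_line = real - real % LINE_SIZE
    if real_line == ic_real:
        return "icache", ic_insns  # cache hit
    return "prefetch", pb_insns    # cache miss: select the prefetch buffer entry


page_table = {0: 5, 1: 7}
icache = [(None, None)] * 8
icache[2] = (5 * PAGE_SIZE + 64, ["a"])
pbuf = [(None, None)] * 4
pbuf[2] = (0, ["b"])
print(select_instructions(64, icache, pbuf, page_table))    # hit in the instruction cache
print(select_instructions(4160, icache, pbuf, page_table))  # same index, different real page
```

In hardware, the translation and compare steps are the ones that, per the description above, take longer than a clock cycle, which delays the fall-back to the prefetch buffer.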
It is noted that the selection of instructions from the instruction cache or the prefetch buffer may be in error. Hence, a later determination may be made as to whether the selected instructions were the appropriate ones. If the wrong instructions were selected from either the instruction cache or the prefetch buffer, the appropriate instructions may be fetched from main memory.
Determining whether there is a cache hit or miss takes longer than a clock cycle because translating the address from the program counter into its corresponding real address, and then comparing the translated real address with the indexed real address in the instruction cache, together take longer than a clock cycle. Hence, there is a one-cycle lag in selecting the instructions from the prefetch buffer when there is a miss in the instruction cache. This extra cycle on every instruction cache miss hinders processor performance.
Therefore, there is a need in the art to select the instructions in a prefetch buffer in the event of an instruction cache miss with a zero-cycle penalty, i.e., within a single clock cycle.