1. Technical Field
The present invention relates in general to data processing and, in particular, to a processor and method for fetching instructions. Still more particularly, the present invention relates to a processor and method for fetching blocks of instructions in response to a detected block sequence.
2. Description of the Related Art
A typical processor for a computer system comprises a digital integrated circuit including, for example, one or more execution units for executing sequential instructions, a branch unit for processing branch instructions, and registers for storing instruction operands and result data. The processor further includes an instruction cache for storing instructions and instruction sequencing logic for fetching instructions from the instruction cache and routing them to the various execution units for execution.
In a conventional processor, the instruction sequencing logic includes a sequential fetcher that, during each processor cycle, generates an effective address corresponding to a next cache line of instructions that sequentially follows the previously fetched cache line. For example, assuming the instruction cache has 8-byte cache lines, the sequential fetcher generates an effective address corresponding to a next sequential cache line by incrementing the previous effective address by 8. In the absence of a branch in program flow (e.g., due to a branch instruction), this effective address is translated into a real address and then supplied to the instruction cache to fetch the next sequential cache line of instructions.
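The address arithmetic performed by the sequential fetcher can be sketched as follows. This is only an illustrative software model, assuming the 8-byte cache line from the example above; the function name and the starting address 0x1000 are hypothetical, and address translation is omitted.

```python
CACHE_LINE_BYTES = 8  # cache line size taken from the example in the text


def next_sequential_address(prev_ea: int, line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Return the effective address of the cache line that follows prev_ea."""
    return prev_ea + line_bytes


# Model three consecutive fetch cycles from a hypothetical starting address.
addresses = [0x1000]
for _ in range(3):
    addresses.append(next_sequential_address(addresses[-1]))
```

Each modeled cycle advances the effective address by one cache line, which is the behavior the paragraph describes for the case in which no branch redirects program flow.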
In addition to such sequential fetching, the instruction sequencing logic may also fetch instructions non-sequentially. In particular, the branch unit may compute non-sequential branch target addresses in response to processing branch instructions in the instruction stream. Many processors also generate speculative non-sequential branch target addresses by predicting the outcome of conditional branch instructions. Such non-sequential branch target addresses are translated into real addresses and supplied to the instruction cache to fetch a next non-sequential cache line of instructions. Once fetched, the non-sequential instructions, which in cases of branch prediction may be speculative, can be executed by the processor's execution units. Of course, processors that allow speculative execution of fetched instructions must also include some recovery mechanism in case the branch prediction is later determined to be incorrect.
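The choice between a sequential address and a non-sequential branch target address can be sketched as below. This is a simplified model under stated assumptions, not a description of any particular processor: the names FetchRequest and select_fetch_address, the priority given to a resolved target over a predicted one, and the 8-byte line size are all illustrative.

```python
from typing import NamedTuple, Optional


class FetchRequest(NamedTuple):
    address: int       # effective address to translate and send to the cache
    speculative: bool  # True when the address comes from a branch prediction


def select_fetch_address(prev_ea: int,
                         resolved_target: Optional[int] = None,
                         predicted_target: Optional[int] = None,
                         line_bytes: int = 8) -> FetchRequest:
    """Pick the next fetch address (hypothetical priority order)."""
    if resolved_target is not None:
        # Branch unit computed the target: non-sequential, not speculative.
        return FetchRequest(resolved_target, False)
    if predicted_target is not None:
        # Predicted conditional branch: non-sequential and speculative.
        return FetchRequest(predicted_target, True)
    # No branch in program flow: next sequential cache line.
    return FetchRequest(prev_ea + line_bytes, False)
```

A request marked speculative is the case for which the recovery mechanism mentioned above would be needed if the prediction later proves incorrect.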
For both sequential and non-sequential fetching, if the fetch address misses in the instruction cache, the requested instructions must be loaded from a lower level cache associated with the requesting processor, a cache associated with another processor, or main memory. If requested instructions cannot be loaded and supplied to the processor's execution units rapidly enough, the execution units of the processor may be idle for one or more cycles, thus degrading processor performance.
In view of the foregoing, instruction fetching becomes a more important performance consideration as processor clock frequencies increase, since exploiting the full performance capability of a processor requires that the processor's execution units be continuously supplied with instructions to execute. The conventional instruction fetching methodology described supra, which is referred to herein as instruction-level fetching, may not be able to provide an adequate supply of instructions for execution in some processor architectures because it is constrained to fetch a single cache line of instructions at a time. Thus, if a fetch request misses in the instruction cache, the processor may execute all previously fetched instructions before the next cache line of requested instructions has been loaded from, for example, a lower level cache.
Accordingly, the present invention provides an improved data processing system and method for fetching instructions. Rather than fetching only a single cache line of instructions in response to a generated fetch address, the processor of the present invention intelligently fetches one or more non-sequential blocks of instructions at a time from a memory. The present invention determines which blocks of instructions to fetch based upon hardware detection of a program's control flow graph (CFG), that is, the sequence in which the instruction blocks comprising the program are executed. If a portion of a previously observed sequence of instruction blocks is detected, one or more additional instruction blocks in the sequence are fetched. Thus, the instruction blocks following a currently executing instruction block will be available for rapid access by the processor.
In preferred embodiments, a data processing system implementing the present invention includes at least one execution unit that executes fetched instructions and instruction sequencing logic that fetches instructions from a memory. In response to detection of an instruction trigger within an instruction stream, the instruction sequencing logic fetches one or more non-sequential blocks of instructions from memory, where each of the non-sequential blocks includes a plurality of instructions.
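The idea of fetching further blocks once part of a previously observed block sequence is detected can be illustrated with a small software model. This is a sketch under stated assumptions and not the hardware described herein: the class name BlockSequencePredictor, the single-successor table, the depth parameter, and the block addresses are all hypothetical, and re-entry into a known block stands in for the instruction trigger.

```python
class BlockSequencePredictor:
    """Toy model: remember which block followed each block last time."""

    def __init__(self, depth: int = 2):
        self.depth = depth        # how many blocks to fetch ahead (assumed)
        self.successor = {}       # block address -> block that followed it
        self.last_block = None

    def observe(self, block_addr: int) -> list:
        """Record an executed block; return block addresses to fetch ahead."""
        if self.last_block is not None:
            self.successor[self.last_block] = block_addr
        self.last_block = block_addr

        # Follow the recorded sequence starting from the current block.
        to_fetch, cur = [], block_addr
        while len(to_fetch) < self.depth:
            nxt = self.successor.get(cur)
            if nxt is None or nxt == block_addr or nxt in to_fetch:
                break  # unknown successor, or the sequence has looped
            to_fetch.append(nxt)
            cur = nxt
        return to_fetch


# Hypothetical non-sequential blocks executed in the order A, B, C, A.
A, B, C = 0x100, 0x400, 0x240
predictor = BlockSequencePredictor(depth=2)
first_pass = [predictor.observe(addr) for addr in (A, B, C)]
second_pass = predictor.observe(A)
```

On the first pass nothing is known, so no blocks are requested. When block A is seen again, the model recognizes the start of the recorded sequence and requests blocks B and C together, even though their addresses are not sequential.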
All objects, features, and advantages of the present invention will become apparent in the following detailed written description.