The present invention relates to the field of high performance computer processors, and more particularly, to the instruction set architecture of a processor and methods for improving programming flow.
As the operating frequencies of processors continue to rise, performance often depends upon providing a continual stream of instructions and data in accordance with the computer program that is executing. As application programs continue to get larger, instruction fetch penalty has become one of the major bottlenecks in system performance. Instruction fetch penalty refers to the number of cycles spent fetching instructions from different levels of cache memories and main memory. Instruction prefetch is an effective way to reduce the instruction fetch penalty by prefetching instructions from long-latency cache memories or main memory to short-latency caches. Therefore, when the instructions are actually required, their fetch penalty is small.
Since a prefetch needs to be performed before the program actually reaches the prefetch target, it is important for the instruction prefetch mechanism to acquire the correct instructions. One common prior art prefetch method is to simply have the processor prefetch instructions a certain number of instructions ahead of the current instruction being executed. While this works well for instructions that lie along a single execution path, a branch to another execution path renders useless the prefetched instructions occurring after the branch instruction.
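The limitation of the sequential method described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that addresses are counted in whole instructions and that the prefetcher always requests a fixed number of instructions following the current one; the function names and the `depth` parameter are hypothetical and are not taken from any particular processor.

```python
def sequential_prefetch(pc, depth):
    """Addresses a simple 'next-N' prefetcher requests while executing `pc`.

    Assumption: addresses are in units of one instruction.
    """
    return [pc + i for i in range(1, depth + 1)]


def wasted_prefetches(branch_pc, next_pc, depth):
    """Return the prefetched addresses that are not needed after a branch.

    `next_pc` is where execution actually continues after the instruction
    at `branch_pc` (the branch target if taken, branch_pc + 1 otherwise).
    """
    prefetched = sequential_prefetch(branch_pc, depth)
    # Instructions actually needed next lie along the path starting at next_pc.
    needed = set(range(next_pc, next_pc + depth))
    return [addr for addr in prefetched if addr not in needed]


if __name__ == "__main__":
    # Fall-through path: every sequentially prefetched instruction is useful.
    print(wasted_prefetches(branch_pc=10, next_pc=11, depth=4))
    # Taken branch to a distant target: all sequential prefetches are useless.
    print(wasted_prefetches(branch_pc=10, next_pc=100, depth=4))
```

In this sketch, a taken branch to a distant address leaves every sequentially prefetched instruction unused, which is the wasted bandwidth that a compiler-directed prefetch is meant to avoid.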
As the art of computer design has progressed, there has been a trend to design processor mechanisms capable of finding ways to keep the functional units of a processor busy even when it is not certain that the work performed by the functional unit will be needed. For example, branch prediction allows execution along a predicted path to begin even though the condition tested by the branch instruction has not yet been determined. Initially, many of these techniques were provided solely by the processor hardware, and were invisible to the program being executed.
More recently, there has been a trend to expose these mechanisms to the program, and thereby allow compilers to generate program code that is capable of exploiting the mechanisms more efficiently. One such mechanism was disclosed in U.S. Pat. No. 5,742,804 to Yeh et al., which is entitled “Instruction Prefetch Mechanism Utilizing a Branch Predict Instruction” and was incorporated by reference above. Yeh et al. disclosed a branch prediction instruction that also prefetched instructions along the predicted path. Therefore, Yeh et al. allowed the prefetching mechanism of the processor to be exposed to the compiler to the extent that the compiler could direct prefetching activity associated with a predicted branch path.
The present invention provides a prefetch instruction for prefetching instructions into one or more levels of cache memory before the instructions are actually encountered in a programmed sequence of instructions, thereby minimizing instruction fetch penalty while making optimum use of memory bandwidth by only prefetching those instructions that are likely to be needed. According to one embodiment of the invented method, a prefetch instruction is executed. The prefetch instruction is defined by an opcode that specifies a target field and a count field. A block of target instructions, starting at the target address and continuing until the count is reached, is prefetched into the instruction cache of the processor so that the instructions are available for execution prior to the execution of the instruction specified by the target address. In other embodiments, the prefetch instruction of the present invention includes a cache level field, a flush field, and a trace field. The trace field specifies a vector of a path in the program sequence that leads from the prefetch instruction to the target address, and allows the prefetch operation to be aborted if the vector is not taken. The cache level field specifies the level of the cache memory into which the instructions are to be prefetched. Finally, the flush field indicates whether all preceding prefetch operations should be discarded.
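The fields described above can be modeled in software as a rough behavioral sketch. This is not the hardware embodiment or an actual instruction encoding. The class names, the representation of the trace field as a tuple of taken/not-taken branch outcomes, the counting of the block in whole instructions, and the two-step issue/resolve interface are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class PrefetchInstruction:
    target: int                 # target field: address of the first instruction
    count: int                  # count field: instructions in the block (assumed unit)
    cache_level: int = 0        # cache level field: which cache receives the block
    flush: bool = False         # flush field: discard preceding prefetch operations
    # Trace field, modeled here (an assumption) as expected branch outcomes
    # on the path from the prefetch instruction to the target.
    trace: Optional[Tuple[bool, ...]] = None


class PrefetchUnit:
    """Toy model: caches are sets of instruction addresses."""

    def __init__(self, levels: int = 2) -> None:
        self.caches: List[Set[int]] = [set() for _ in range(levels)]
        self.pending: List[PrefetchInstruction] = []

    def issue(self, instr: PrefetchInstruction) -> None:
        if instr.flush:
            # Discard all preceding prefetch operations that are still pending.
            self.pending.clear()
        self.pending.append(instr)

    def resolve(self, observed_path: List[bool]) -> None:
        """Complete or abort pending prefetches given observed branch outcomes."""
        for instr in self.pending:
            if instr.trace is not None:
                prefix = tuple(observed_path[:len(instr.trace)])
                if prefix != instr.trace:
                    continue  # path in the trace field was not taken: abort
            block = range(instr.target, instr.target + instr.count)
            self.caches[instr.cache_level].update(block)
        self.pending.clear()


if __name__ == "__main__":
    unit = PrefetchUnit(levels=2)
    unit.issue(PrefetchInstruction(target=100, count=4, trace=(True,)))
    unit.issue(PrefetchInstruction(target=200, count=2, cache_level=1, trace=(False,)))
    unit.resolve([True])
    print(sorted(unit.caches[0]), sorted(unit.caches[1]))
```

In this model, only the prefetch whose trace matches the observed path fills its cache level; the other is aborted without affecting program state, consistent with the prefetch instruction having no architectural effect.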
Architecturally, the prefetch instruction is effectively a “no-operation” (NOP) instruction, and has no effect on program execution other than providing performance benefits. The present invention exposes the prefetch mechanism of the processor to the compiler, thereby increasing performance. By allowing the compiler to schedule appropriate prefetch instructions, the present invention reduces latency by increasing the likelihood that instructions will be in the cache when they are executed, while reducing cache pollution and conserving bandwidth by only prefetching instructions that are likely to be executed.