The present invention relates in general to a data processing system, and in particular, to instruction prefetch in a data processing system.
As computers have been developed to perform a greater number of instructions at greater speeds, many types of architectures have been developed to optimize this process. For example, a reduced instruction set computer (RISC) device utilizes simpler instructions and greater parallelism in executing those instructions to ensure that computational results will be available more quickly than the results provided by more traditional data processing systems. In addition to providing increasingly parallel execution of instructions, some data processing systems employ memory devices within the processor to permit retrieval of instructions from a system memory before they are required for execution by the processor. A set of instructions is loaded from a system memory device into this processor memory, the so-called cache or level 1 (L1) cache, for subsequent dispatching to execution units within the processor. The set of instructions loaded from memory includes a sufficient number of instructions to fill a block of cache memory of predetermined size, a “cache line.”
A fetching unit first looks to the cache for the next instruction it needs. If the instruction is not in the cache, a “cache miss,” the fetching unit must retrieve the instruction from the system memory. As processor clock rates increase more rapidly than memory access times decrease, the latency penalty from a cache miss, measured in processor cycles, increases accordingly.
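By way of illustration only, the growth of the miss penalty with clock rate can be sketched as simple arithmetic. The function name, the 60 ns memory latency, and the 200 MHz and 400 MHz clock rates below are hypothetical values chosen for the example and are not taken from any particular system.

```python
def miss_penalty_cycles(memory_latency_ns: int, clock_mhz: int) -> int:
    """Return the stall, in whole processor cycles, for one cache miss.

    cycles = latency (ns) * clock frequency (MHz) / 1000, rounded up.
    """
    return -(-(memory_latency_ns * clock_mhz) // 1000)


# Hypothetical figures: the memory latency is held fixed while the
# processor clock rate doubles, so the penalty in cycles doubles too.
slow_clock = miss_penalty_cycles(60, 200)
fast_clock = miss_penalty_cycles(60, 400)
```

With the memory access time unchanged, doubling the clock rate doubles the number of cycles lost to each miss, which is the effect described above.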
Memory latency due to a cache miss may be reduced by prefetching an instruction cache line from a system memory device. However, if an instruction that alters an instruction sequence path is executed, the prefetched cache line may not be used. That is, an instruction, such as a branch, may cause a jump to an instruction path that is outside the prefetched cache line. Prefetching a cache line that later is unused leads to “cache pollution” that reduces the effectiveness of the prefetching.
To reduce instruction cache pollution due to prefetching, restrictions have been placed on the fetch process. One restriction used in many implementations is to delay fetching a cache line until a fetch request is made which causes an instruction cache miss. In other words, a miss request for the subsequent cache line from a system memory device will not be initiated until an instruction queue which receives instructions from the instruction cache has sufficient room to hold the remaining instructions in the current instruction cache line. Other implementations do not allow a miss request to be sent to a bus controller to retrieve the next cache line from a system memory device until it is known that there are no outstanding instructions in the current cache line that will change the instruction path. In either case, the efficacy of prefetch mechanisms in reducing the latency penalty from a cache miss is reduced by the restrictions placed thereon.
Restrictions placed on instruction prefetching delay the prefetch and thereby reduce the effectiveness of prefetching in reducing cache miss penalties. Therefore, there is a need in the art for a prefetch mechanism that permits cache miss requests to be issued earlier without increasing cache pollution.
The previously mentioned needs are addressed by the present invention. Accordingly, there is provided, in a first form, a method of reducing cache miss penalties. The method includes determining if a current instruction changes an instruction execution path. Then, if the current instruction does not change the instruction path outside of a next sequential cache line, and no remaining instruction in the current cache line changes the instruction execution path outside of the next sequential cache line, the next sequential cache line is prefetched.
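The decision described in the first form may be sketched in software as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the 32-byte line size, the `Instr` record, and the function names are hypothetical, and a branch is conservatively treated as safe only when its target lies in the next sequential cache line.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

LINE_BYTES = 32  # assumed cache line size, for illustration only


@dataclass
class Instr:
    addr: int
    branch_target: Optional[int] = None  # None for a non-branch instruction


def line_base(addr: int) -> int:
    """Address of the first byte of the cache line containing addr."""
    return addr - (addr % LINE_BYTES)


def stays_within_next_line(instr: Instr, current_base: int) -> bool:
    """True if instr cannot leave the next sequential cache line.

    Assumption: any branch whose target is not in the next sequential
    line is treated as changing the path outside of that line.
    """
    if instr.branch_target is None:
        return True
    return line_base(instr.branch_target) == current_base + LINE_BYTES


def should_prefetch_next_line(line: Sequence[Instr], start_index: int) -> bool:
    """Check the current instruction and every remaining one in the line."""
    current_base = line_base(line[0].addr)
    return all(stays_within_next_line(i, current_base)
               for i in line[start_index:])


# Eight hypothetical 4-byte instructions filling one line at 0x100.
line = [Instr(0x100 + 4 * i) for i in range(8)]
no_branch = should_prefetch_next_line(line, 0)

line[3] = Instr(0x10C, branch_target=0x124)  # target in next line (0x120)
near_branch = should_prefetch_next_line(line, 0)

line[3] = Instr(0x10C, branch_target=0x200)  # target outside next line
far_branch = should_prefetch_next_line(line, 0)
past_far_branch = should_prefetch_next_line(line, 4)
```

In this sketch the prefetch is withheld only while a path-changing instruction remains ahead in the current line; once fetching has passed that instruction, the next sequential line may be requested.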
Additionally, there is provided, in a second form, an apparatus for reducing instruction cache miss penalties. The apparatus includes an instruction storage device for receiving a plurality of instructions from at least one memory device. A predecode device predecodes a portion of each instruction, and outputs a predecode bit associated with each instruction to the instruction storage device. The apparatus also includes circuitry for generating a prefetch data value from one or more of the predecode bits. A fetch logic device fetches instructions from a memory device for loading into the instruction storage device. The fetch logic device prefetches a next sequential instruction set in response to a predetermined value of the prefetch data value.
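One way the prefetch data value might be derived from the predecode bits is sketched below. The bit polarity (1 marks an instruction that may redirect execution outside the next sequential cache line), the NOR-style combination of the remaining bits, and the names `prefetch_value`, `PREFETCH_ENABLE`, and `fetch_logic_should_prefetch` are assumptions made for illustration; the text above states only that the value is generated from one or more of the predecode bits.

```python
from typing import Sequence

PREFETCH_ENABLE = 1  # assumed "predetermined value" that triggers a prefetch


def prefetch_value(predecode_bits: Sequence[int], fetch_index: int) -> int:
    """Combine the predecode bits of the instructions not yet fetched.

    Returns 1 only if every remaining bit is 0, i.e. a logical NOR over
    the bits from fetch_index to the end of the cache line.
    """
    remaining = predecode_bits[fetch_index:]
    return 0 if any(remaining) else 1


def fetch_logic_should_prefetch(predecode_bits: Sequence[int],
                                fetch_index: int) -> bool:
    """Model the fetch logic's response to the prefetch data value."""
    return prefetch_value(predecode_bits, fetch_index) == PREFETCH_ENABLE


# Hypothetical eight-instruction line with one flagged instruction at slot 5.
bits = [0, 0, 0, 0, 0, 1, 0, 0]
before_flagged = fetch_logic_should_prefetch(bits, 2)
after_flagged = fetch_logic_should_prefetch(bits, 6)
```

Because the predecode bits are stored alongside the instructions, a combination of this kind could be evaluated without decoding the instructions again at fetch time.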
Finally, there is provided, in a third form, a data processing system that includes at least one memory device and an instruction storage device for receiving a plurality of instructions from the one or more memory devices. A predecode device predecodes a portion of each of the instructions, and outputs a predecode bit associated with each instruction to the instruction storage device. A fetch logic device, for fetching instructions from the one or more memory devices, prefetches a next sequential instruction set for loading into the instruction storage device in response to the predecode bits.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.