1. Technical Field
The present invention relates in general to an improved data processing system, and in particular to an improved method and system for cache memory management. Still more particularly, the present invention relates to a method and system to reduce fetching delays caused by the inaccessibility of cache during block transfers to cache memory.
2. Description of the Related Art
A pipelined processor is one in which the processing of an instruction is divided into discrete stages. Because the processing of an instruction is broken into a series of stages, an instruction does not require the entire resources of the execution unit that executes it. For example, after an instruction completes a decode stage, it can pass on to the next stage while the subsequent instruction advances into the decode stage. Pipelining improves the throughput of the instruction flow. For example, it may take three cycles for a floating-point instruction to complete, but if there are no stalls in the floating-point pipeline, a series of floating-point instructions can have a throughput of one instruction per cycle.
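The throughput figure above can be checked with a short calculation. The following is a minimal sketch that assumes an idealized pipeline in which every stage takes one cycle and no stalls occur; the function names and parameters are illustrative and are not part of the described system.

```python
def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles to complete a stream in an ideal, stall-free pipeline."""
    if num_instructions <= 0:
        return 0
    # The first instruction needs num_stages cycles to drain the pipeline;
    # each later instruction completes one cycle after its predecessor.
    return num_stages + (num_instructions - 1)


def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles when each instruction must finish before the next one starts."""
    return max(num_instructions, 0) * num_stages
```

Under these assumptions, 100 instructions in a three-stage floating-point pipeline complete in 102 cycles instead of 300, so the sustained rate approaches one instruction per cycle even though each instruction still takes three cycles.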
A superscalar processor is one that issues multiple independent instructions into multiple pipelines, allowing instructions to execute in parallel. Typical execution units may include: an integer unit (IU), a floating-point unit (FPU), a branch processing unit (BPU), a load/store unit (LSU), and a system register unit (SRU). FIG. 1 is an example of an embodiment of a superscalar data processing system. The embodiment of the superscalar data processing system shown in FIG. 1 is similar to that sold by International Business Machines Corporation under the trademark "PowerPC."
In FIG. 1, superscalar data processing system 100 includes five independent execution units and two register files. The five independent execution units include: branch processing unit 102, load/store unit 104, integer unit 106, and floating-point unit 108. The register files include: general-purpose register file (GPR) 107 for integer operands, and floating-point register file (FPR) 109 for single- or double-precision operands.
As shown in FIG. 1, instruction unit 110 contains sequential fetcher 112, instruction queue 114, dispatch unit 116, and branch processing unit 102. Instruction unit 110 provides centralized control of instruction flow to the execution units. Instruction unit 110 determines the address of the next instruction to be fetched based on information from the sequential fetcher 112 and branch processing unit 102.
Sequential fetcher 112 fetches instructions from instruction cache 118 and loads such instructions into instruction queue 114. Branch instructions are identified by sequential fetcher 112, and forwarded to branch processing unit 102 directly, bypassing instruction queue 114. Such a branch instruction is either executed and resolved (if the branch is unconditional or if required conditions are available), or is predicted. Non-branch instructions are issued from instruction queue 114, with the dispatch rate being contingent on execution unit busy status, rename and completion buffer availability, and the serializing behavior of some instructions. Instruction dispatch is done in program order. BPU 102 uses static branch prediction on unresolved conditional branches to allow instruction unit 110 to fetch instructions from a predicted target instruction stream while a conditional branch is evaluated. Branch processing unit 102 folds out branch instructions for unconditional branches or conditional branches unaffected by instructions in progress in the execution pipeline.
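The routing of fetched instructions described above can be sketched as follows. This is an illustrative model only: the `Instr` fields, the function names, and in particular the static prediction rule (backward branches predicted taken, forward branches predicted not taken) are assumptions made for the example, since the text does not specify which static rule BPU 102 applies.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, List, Optional, Tuple


@dataclass
class Instr:
    op: str
    is_branch: bool = False
    conditional: bool = False
    condition: Optional[bool] = None  # None: condition not yet available
    displacement: int = 0             # signed branch offset (hypothetical field)


def handle_branch(instr: Instr) -> Tuple[str, bool]:
    """Return (how the direction was obtained, whether the branch is taken)."""
    if not instr.conditional:
        return ("resolved", True)             # unconditional: always taken
    if instr.condition is not None:
        return ("resolved", instr.condition)  # required condition is available
    # Assumed static rule: predict backward branches taken.
    return ("predicted", instr.displacement < 0)


def route(fetched: Iterable[Instr], queue: Deque[Instr]) -> List[Tuple[str, bool]]:
    """Forward branches to the branch unit; place all others in the queue."""
    outcomes = []
    for instr in fetched:
        if instr.is_branch:
            outcomes.append(handle_branch(instr))  # bypasses the queue
        else:
            queue.append(instr)                    # kept in program order
    return outcomes
```

In this sketch, only non-branch instructions ever occupy queue entries, which mirrors the way branches are forwarded directly to the branch processing unit.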
Instruction queue 114 holds several instructions loaded by sequential fetcher 112. Sequential fetcher 112 continuously loads instructions to keep the space in instruction queue 114 filled. Instructions are dispatched to their respective execution units from dispatch unit 116.
In operation, instructions are fetched from instruction cache 118 at a peak rate of two per cycle, and placed in either instruction queue 114 or branch processing unit 102. Instructions entering instruction queue 114 are issued from the queue to the various execution units. Instruction queue 114 is the backbone of the master pipeline for superscalar data processing system 100, and may contain, for example, a six-entry queue. If, while filling instruction queue 114, a request from sequential fetcher 112 misses in instruction cache 118, arbitration for a memory access begins.
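One fetch cycle of this behavior can be modeled in a few lines. The sketch below takes the peak fetch rate of two and the six-entry queue from the text; modeling the cache as a dictionary keyed by instruction address, advancing the address by one per instruction, and the name `fetch_cycle` are simplifying assumptions. Branch routing is omitted for brevity.

```python
from collections import deque

FETCH_WIDTH = 2  # peak instructions fetched per cycle (from the text)
QUEUE_DEPTH = 6  # example queue size (from the text)


def fetch_cycle(queue, cache, pc):
    """Run one fetch cycle and return (next_pc, missed).

    `cache` maps an instruction address to an instruction (assumed model).
    When `missed` is True, the caller would begin arbitration for a
    memory access to bring the missing instruction into the cache.
    """
    for _ in range(FETCH_WIDTH):
        if len(queue) >= QUEUE_DEPTH:
            break                # queue is full: nothing more to fetch
        if pc not in cache:
            return pc, True      # cache miss at this address
        queue.append(cache[pc])
        pc += 1                  # assumed unit-stride instruction addresses
    return pc, False
```

Calling `fetch_cycle` repeatedly keeps the queue filled until either the queue reaches its depth or an address is absent from the cache, at which point fetching stalls on the returned address.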
The timing of the instruction fetch mechanism in superscalar data processing system 100 depends heavily on the state of on-chip instruction cache 118. The speed with which the required instruction is returned to sequential fetcher 112 depends on whether the requested instruction is already in on-chip instruction cache 118 (a cache hit) or whether a memory transaction is required to bring the instruction into instruction cache 118 (a cache miss).
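The dependence of fetch timing on hit and miss behavior can be illustrated with the standard average-latency calculation. This is a sketch under stated assumptions: the function name and parameters are hypothetical, and any cycle counts passed to it are example values, not figures for the described system.

```python
def average_fetch_latency(hit_rate: float, hit_cycles: float,
                          miss_penalty_cycles: float) -> float:
    """Expected cycles per fetch request.

    Every request pays the hit latency; the fraction that misses also
    pays the additional penalty of the memory transaction.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_cycles + (1.0 - hit_rate) * miss_penalty_cycles
```

With an assumed one-cycle hit and a 20-cycle miss penalty, a hit rate of 50 percent gives an average of 11 cycles per fetch, which shows how quickly misses come to dominate fetch timing.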