The basic components of a traditional data processing system include a digital processor element and a memory element, connected together by one or more buses. The data stored in the memory element is addressed by location. The memory may contain instructions that the processor is to execute, as well as the operands upon which the processor is to execute the instructions.
The processor is typically divided into two sub-elements, namely an instruction decode element and an instruction execution element. The instruction decode element causes data to be transferred to it from the memory element, which data it interprets as encoded instructions, or code. As directed by the instruction decode element, the instruction execution element may cause data to be transferred to and from the memory element, this data being interpreted as the operands upon which logical operations are to be performed.
The instruction decode element may be further divided into specialized functional units. The basic units of a prior art instruction decode element include a decode unit, a fetch unit that takes instruction fetch requests from the decode unit, and a memory controller unit that takes memory fetch requests from the fetch unit and generates the specialized signals needed to operate the memory element. Each unit receives as input an "address" signal and a "request" signal indicating that the instruction code at that address is to be returned. Each unit outputs a "done" signal that informs the requesting unit that the current request has been performed and that the corresponding instruction code is available on the "data" signal. The instruction codes extracted from the memory element are accumulated by the memory controller in registers, then passed back to registers in the fetch unit, and finally passed back to registers in the decode unit.
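The request and done signalling described above can be illustrated with a minimal behavioral sketch. The class and method names below (MemoryController, FetchUnit, DecodeUnit, request, fetch_instruction) are hypothetical, each request is assumed to complete immediately, and the memory element is modeled as a simple address-to-code mapping. The sketch is not a description of any particular hardware.

```python
class MemoryController:
    """Hypothetical model: turns a memory fetch request into (done, data)."""

    def __init__(self, memory):
        self.memory = memory      # assumed: dict mapping address -> instruction code
        self.register = None      # instruction code accumulated from memory

    def request(self, address):
        self.register = self.memory[address]
        return True, self.register  # "done" asserted, code on "data"


class FetchUnit:
    """Hypothetical model: forwards fetch requests to the memory controller."""

    def __init__(self, controller):
        self.controller = controller
        self.register = None

    def request(self, address):
        done, data = self.controller.request(address)
        if done:
            self.register = data  # code passed back into the fetch unit
        return done, self.register


class DecodeUnit:
    """Hypothetical model: issues instruction fetch requests to the fetch unit."""

    def __init__(self, fetch_unit):
        self.fetch_unit = fetch_unit
        self.register = None

    def fetch_instruction(self, address):
        done, data = self.fetch_unit.request(address)
        if done:
            self.register = data  # code finally arrives at the decode unit
        return self.register
```

Under these assumptions, a call to fetch_instruction on the decode unit propagates the address down through the fetch unit to the memory controller, and the instruction code is copied back up through the register of each unit in turn.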
In many systems, the functional units operate together in a pipelined fashion, such that each unit performs a particular step in the process required to complete a code fetch, and at any given time, each unit may be operating on an independent and distinct code fetch. Typically the pipeline units operate synchronously with respect to a global clock signal having a constant period or cycle time. The units may require different numbers of clock cycles to complete their functions, depending on the complexity of the function or some physical limitation of the operation. For example, the fetch unit may require only one clock cycle to respond to a fetch request from the decode unit, whereas the memory element may require many clock cycles to process a new address and return the appropriate data. When a unit is unable to respond to a request, it is said to be "stalled".
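The effect of differing unit latencies can be illustrated with a small clocked model. The Unit class, its latency field, and the count_stalls helper are hypothetical names introduced only for this sketch, and the four-cycle memory latency is an assumed figure rather than one taken from any particular system.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """Hypothetical pipeline unit that needs `latency` clock cycles per request."""
    name: str
    latency: int
    remaining: int = 0  # cycles left on the current request; 0 means idle

    def request(self) -> bool:
        """Try to start a new request; return False if the unit is stalled."""
        if self.remaining > 0:
            return False
        self.remaining = self.latency
        return True

    def tick(self) -> bool:
        """Advance one clock cycle; return True on the cycle 'done' is asserted."""
        if self.remaining == 0:
            return False
        self.remaining -= 1
        return self.remaining == 0


def count_stalls(unit: Unit, cycles: int) -> int:
    """Issue one request per clock cycle and count how many are refused."""
    stalls = 0
    for _ in range(cycles):
        if not unit.request():
            stalls += 1
        unit.tick()
    return stalls
```

In this model, a unit with a latency of one cycle accepts a new request on every cycle, whereas a unit with an assumed latency of four cycles refuses three of every four requests presented to it.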
Most instruction codes contain no information regarding the location of the next instruction to be executed. A convention is followed whereby such codes are fetched in a simple sequential order that corresponds to adjacent placement within the memory element. A dedicated address register in the decode unit, commonly called the program counter (PC), increments by some amount each time one of these instructions is decoded, the new value being the address of the next instruction requested from the fetch unit. This makes pipelining of instruction requests particularly effective, since the fetch unit can anticipate future requests from the decode unit by simply incrementing the address value stored in its local fetch program counter (FPC) register. The fetch unit can therefore make speculative requests for instruction data before that data is known to be needed.
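Speculative sequential fetching by means of the FPC register can be sketched as follows. The FetchUnit class, the fixed increment INSTR_SIZE of four, and the queue depth of four are assumptions made for illustration, and the memory element is again modeled as an address-to-code mapping.

```python
from collections import deque

INSTR_SIZE = 4  # assumed fixed PC increment; real instruction sizes may vary


class FetchUnit:
    """Hypothetical fetch unit that runs ahead of the decode unit using its FPC."""

    def __init__(self, memory, start, depth=4):
        self.memory = memory   # assumed: dict mapping address -> instruction code
        self.fpc = start       # local fetch program counter
        self.depth = depth     # assumed number of speculative entries held
        self.queue = deque()   # (address, code) pairs fetched ahead of need
        self.prefetch()

    def prefetch(self):
        """Speculatively request codes at FPC, FPC + INSTR_SIZE, and so on."""
        while len(self.queue) < self.depth and self.fpc in self.memory:
            self.queue.append((self.fpc, self.memory[self.fpc]))
            self.fpc += INSTR_SIZE

    def fetch(self, pc):
        """Return (code, hit); hit is True if the code had been prefetched."""
        hit = bool(self.queue) and self.queue[0][0] == pc
        if not hit:
            self.queue.clear()  # speculative entries are no longer useful
            self.fpc = pc
            self.prefetch()
        if not self.queue:
            raise KeyError(pc)  # address lies outside the modeled memory
        _, code = self.queue.popleft()
        self.prefetch()
        return code, hit
```

As long as the decode unit requests addresses in the order PC, PC + INSTR_SIZE, and so on, every request in this model is satisfied from data that the fetch unit had already requested.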
A certain class of instructions causes the PC register to be loaded with a value that is different from the value that would have been obtained by incrementing it. These are commonly referred to as "jump" (or "branch") instructions. A jump instruction will cause a discontinuity in the sequential fetching of instruction codes. The discontinuity reduces the efficiency of the instruction decode element because some of the instruction data speculatively fetched by the fetch unit will not be used. Also, the performance of the instruction decode element is reduced because the decode unit must wait while the fetch request for the first instruction in the new instruction sequence is processed by the pipeline. Some conditional jump instructions may not actually cause a discontinuity in the fetch sequence, but may introduce stalled cycles while the jump condition is resolved. Other jump instructions introduce stalled cycles while the decode unit obtains necessary information from the execution element.
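The cost of a fetch discontinuity can be illustrated by counting the speculatively fetched addresses that are discarded when a jump occurs. The function name wasted_prefetches, the prefetch depth, and the fixed address step of four are assumptions made for this sketch. Entries still queued when the trace ends are not counted.

```python
def wasted_prefetches(trace, depth, step=4):
    """Count speculatively fetched addresses that the decode unit never uses.

    trace: sequence of PC values requested by the decode unit (assumed input)
    depth: assumed number of addresses the fetch unit keeps fetched ahead
    step:  assumed fixed address increment between sequential instructions
    """
    wasted = 0
    queue = []   # addresses fetched ahead of need, oldest first
    fpc = None   # fetch program counter; set on the first demand fetch
    for pc in trace:
        if queue and queue[0] == pc:
            queue.pop(0)            # sequential request: prefetched data is used
        else:
            wasted += len(queue)    # discontinuity: queued data is discarded
            queue = []
            fpc = pc + step         # the requested address itself is fetched on demand
        while len(queue) < depth:   # resume speculative fetching
            queue.append(fpc)
            fpc += step
    return wasted
```

For a purely sequential trace the count is zero. For a trace containing one jump, the count equals the number of entries held in the queue at the moment of the jump, which in this model is the full prefetch depth.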
U.S. Pat. No. 4,742,451, issued to William F. Bruckert, et al., on May 3, 1988, discloses a modification to the basic pipelined decode element in which the time delay between detection of a conditional jump in the decode stage and the delivery of the next instruction after the jump instruction is reduced. In the Bruckert patent, when a conditional branch instruction reaches the decode unit, the pipeline will contain instructions beyond the conditional branch, referred to as the "not taken" sequence. Since the decode unit does not immediately know the outcome of the branch condition, it initiates a fetch of the first instruction of the "taken" sequence using a second hardware resource, the operand fetch unit.
The timing of the operand fetch from the memory unit is controlled such that the decode unit will have resolved the branch condition by the time that the first "taken" sequence instruction is available to be latched in the instruction queue. If the decode unit determines that the branch is to be taken, it de-asserts an ABORT signal, allowing the first "taken" branch instruction to be loaded into the instruction register, thereby overwriting the previously saved instruction from the "not taken" sequence. However, if the decode unit determines that the branch is not to be taken, it asserts the ABORT signal to prevent the first "taken" branch instruction from being loaded into the instruction latch. It is characteristic of the apparatus described in U.S. Pat. No. 4,742,451 to Bruckert that a memory access to the address of the instruction at the beginning of the "taken" sequence will be initiated and completed for every executed branch instruction. Also, the ABORT signal is only used to enable or disable the latching of a new instruction code in the decode element. Thus, unneeded memory accesses, which translate to wasted clock cycles, are still performed when using the Bruckert scheme. Accordingly, there is a need for a system in which the clock cycles wasted on unneeded memory accesses are reduced relative to the prior art.
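The behavior attributed above to the Bruckert scheme can be summarized in a simplified behavioral sketch. The function names resolve_branch and wasted_accesses are hypothetical, the model is based only on the description given here and not on the circuitry of the patent itself, and each fetch of a "taken" target is assumed to cost exactly one memory access.

```python
def resolve_branch(instruction_register, taken_code, branch_taken):
    """Return (new instruction register value, memory accesses performed).

    Simplified model: the fetch of the "taken" target always completes, and
    the ABORT signal only gates whether its result is latched.
    """
    memory_accesses = 1        # target fetch is performed for every branch
    abort = not branch_taken   # ABORT is asserted when the branch is not taken
    if not abort:
        instruction_register = taken_code  # overwrite the "not taken" instruction
    return instruction_register, memory_accesses


def wasted_accesses(outcomes):
    """Count target fetches whose result was discarded (branch not taken)."""
    return sum(1 for taken in outcomes if not taken)
```

In this model, every branch that is not taken still incurs a memory access whose result is never latched, which corresponds to the unneeded accesses identified above.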