1. Field of the Invention
The invention relates to high-performance computers and particularly to computer processors using an instruction pipeline and including an instruction which loads a plurality of registers.
2. Background Art
High-performance processors using instruction pipelines are well known. Typically, in such processors an instruction defining specific actions to be performed by the processor is executed in a sequence of steps or phases. Within each phase, a portion of the instruction is processed by the processor hardware.
In one known pipeline system, the execution of an instruction is divided into a decode phase, an address transfer phase, an operand fetch phase and an execution phase. In the decode phase, the instruction is decoded and address information is used to generate addresses needed in the execution of the instruction. In the address transfer phase, the addresses are transferred internal to the machine to registers where they are needed in the next phase. In the operand fetch phase, the addresses are used to fetch operands, such as those used in mathematical operations. In the execution phase, the operands fetched in the operand fetch phase and information generated in the decode phase are used to execute the instruction, for example by the operation of an arithmetic logic unit.
In a pipeline system, these various phases of instruction execution are performed sequentially for any one instruction but on an overlapped basis for several sequentially occurring instructions in independently operating portions of the processor hardware. Thus, an instruction is fetched from memory and decoded in a first phase in one part of the processor while other phases of other instructions are executed in other parts of the processor. The hardware for the various phases is interconnected such that information created in one part of the processor in an earlier phase is communicated to another part of the processor for use in a later phase of the same instruction. The instruction execution hardware may be conveniently divided into an instruction section (I-unit) and an execution section (E-unit). The function of the I-unit is to decode the instruction and set up the internal hardware for completion of the instruction under control of the E-unit.
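The overlap described above can be illustrated with a minimal sketch. It assumes an idealized pipeline in which each of the four phases takes exactly one cycle and a new instruction enters the decode phase every cycle with no stalls; the function name `schedule` and the cycle numbering are hypothetical and are not taken from any particular machine.

```python
# Idealized, stall-free four-phase pipeline (an assumption for illustration).
PHASES = ["decode", "address transfer", "operand fetch", "execute"]


def schedule(num_instructions):
    """Return {cycle: [(instruction_index, phase), ...]}.

    Instruction i enters the decode phase in cycle i and advances one
    phase per cycle, so several instructions are in flight at once.
    """
    table = {}
    for i in range(num_instructions):
        for p, phase in enumerate(PHASES):
            table.setdefault(i + p, []).append((i, phase))
    return table


if __name__ == "__main__":
    table = schedule(3)
    for cycle in sorted(table):
        print(cycle, table[cycle])
```

In cycle 2 of this sketch, instruction 0 is in its operand fetch phase while instruction 1 is in address transfer and instruction 2 is being decoded, which is the overlapped operation the paragraph describes.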
A well-recognized problem with pipeline systems is interference between successively executed instructions. One particular situation where this occurs is where a second instruction being decoded requires data in its decoding phase and the data is to be generated by a prior instruction which has not yet been fully executed. Interference detection circuitry typically compares information defining data to be used in a next instruction with the identity of data being modified by the previous instruction. If it appears that there will be a conflict, the decoding of the next instruction is delayed until the previous instruction has been fully executed. Such delays significantly reduce the efficiency of the processor.
One particular instruction which tends to be the source of interference is an instruction used in many processors known as the Load Multiple Register (LMR) instruction. This instruction is used to move data from memory into a group of registers within the machine commonly known as the general registers. These registers are used by a variety of different instructions and it is common to load a number of these registers in anticipation of execution of a sequence of instructions which will use the contents of these registers. The LMR instruction typically requires several execution cycles and it is quite common that a next instruction following the LMR instruction uses one of the general registers being loaded by the LMR instruction. In the prior art, the identities of the highest numbered and lowest numbered general registers to be loaded are recorded in the decode phase of the LMR instruction. A determination is made during the decode phase of the next sequential instruction as to whether a general register to be used in the decode phase of the next instruction is included in the range of registers to be changed by the LMR instruction. If so, the instruction processing for the next sequential instruction is delayed until the last register involved in the LMR instruction has been loaded.
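The prior-art range check can be sketched as follows. This is a software model under stated assumptions, not a description of the actual circuitry: the function names, the list `regs_needed_in_decode`, and the `cycles_remaining` parameter are hypothetical, and the register range is treated as a simple inclusive interval from the lowest to the highest numbered register.

```python
def lmr_interferes(lowest, highest, regs_needed_in_decode):
    """Return True if any register needed by the next instruction's decode
    phase lies in the inclusive range recorded for the LMR instruction."""
    return any(lowest <= r <= highest for r in regs_needed_in_decode)


def stall_cycles(lowest, highest, regs_needed_in_decode, cycles_remaining):
    """Prior-art policy: on a range hit, wait until the LAST register of the
    LMR has been loaded, regardless of which register is actually needed.

    cycles_remaining is a hypothetical count of LMR execution cycles left.
    """
    if lmr_interferes(lowest, highest, regs_needed_in_decode):
        return cycles_remaining
    return 0


if __name__ == "__main__":
    # LMR loads registers 2 through 9; next instruction needs register 3.
    print(stall_cycles(2, 9, [3], cycles_remaining=4))
    # Next instruction needs register 12, which is outside the range.
    print(stall_cycles(2, 9, [12], cycles_remaining=4))
```

Note that in this model the full delay is incurred even when the needed register is loaded in an early cycle of the LMR instruction, because only the range endpoints are recorded.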
Many processors read data from memory in double words, e.g., eight bytes, and process double words in the internal data flow of the processor. The general registers, however, are typically single word, e.g., four bytes, registers. In executing the LMR instruction, two general registers are written in each cycle of the execution phase of the LMR instruction, in such a manner that the higher numbered one of the two registers is written from the lower order bytes of the double word data flow. Circuitry is typically provided which allows information written to a general register in the last execution cycle to be made simultaneously available to the I-unit for the operand fetch cycle of the next sequential instruction. With register modifying instructions other than the LMR instruction, only one or two general registers are changed. In that case, when a previous instruction alters a register, the identity of the modified register is written in a target register. The target register is further provided with an indication as to whether any general registers are changing, whether that general register and the next general register are changing or whether no general registers are changing. The contents of the target register are used to determine whether a delay needs to be introduced in the decode or operand fetch phase of the next instruction.
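Two points from the paragraph above can be modeled in a short sketch: the splitting of an eight-byte double word across two four-byte general registers, and the use of a target register to decide whether the next instruction must be delayed. All names here (`split_double_word`, `Change`, `modified_registers`, `needs_delay`) are hypothetical. The sketch also assumes that the first of the three indications means that the identified register alone is changing, and it does not address what "the next general register" means for the highest numbered register.

```python
from enum import Enum

WORD_MASK = 0xFFFFFFFF  # four-byte single word


def split_double_word(dw):
    """Split an eight-byte value into (high-order word, low-order word)."""
    return (dw >> 32) & WORD_MASK, dw & WORD_MASK


def write_pair(regs, lower_reg, dw):
    """Write two registers from one double word: the higher numbered
    register receives the lower order bytes, as stated in the text."""
    high_word, low_word = split_double_word(dw)
    regs[lower_reg] = high_word
    regs[lower_reg + 1] = low_word


class Change(Enum):
    NONE = 0  # no general registers are changing
    ONE = 1   # assumed meaning: the identified register is changing
    PAIR = 2  # the identified register and the next one are changing


def modified_registers(target_id, change):
    """Return the set of registers the target register marks as changing."""
    if change is Change.NONE:
        return set()
    if change is Change.ONE:
        return {target_id}
    return {target_id, target_id + 1}


def needs_delay(target_id, change, regs_needed):
    """True if the next instruction needs a register that is still changing."""
    return bool(modified_registers(target_id, change) & set(regs_needed))


if __name__ == "__main__":
    regs = [0] * 16  # sixteen general registers is an assumption
    write_pair(regs, 4, 0x11223344AABBCCDD)
    print(hex(regs[4]), hex(regs[5]))
    print(needs_delay(4, Change.PAIR, [5]))
```

The sketch makes the limitation visible: a target register of this form can describe at most two changing registers, which is why it does not extend directly to an instruction that loads many registers.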
Many processors read double data words from memory and process double data words in the internal data flow, but most instructions deal only with the lower order half of the data word. If the execution phase results of one instruction in one portion of the double word are needed in the operand fetch phase of the next instruction in the same portion of the double word, the results are made available for the next instruction during the execution phase of the prior instruction. In that case, the register is said to be by-passable. However, if the data in one portion of a double word are needed in another portion of the double word by the next instruction, the register is said to be not by-passable and the operand fetch phase of the next instruction must be delayed until after completion of the execution phase of the previous instruction. This particular arrangement has been used whenever a previous instruction modifies the contents of one or two general registers. However, this arrangement has not been found to be applicable to the LMR instruction, where a larger number of registers are modified by a single instruction.
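The by-passable condition reduces to a comparison of double word halves, which the following minimal sketch illustrates. The labels `"high"` and `"low"` for the two halves and the function names are hypothetical conventions chosen for this example only.

```python
HALVES = ("high", "low")  # the two single-word portions of a double word


def is_bypassable(result_half, needed_half):
    """A result is by-passable when the next instruction needs it in the
    same half of the double word data flow in which it was produced."""
    if result_half not in HALVES or needed_half not in HALVES:
        raise ValueError("half must be 'high' or 'low'")
    return result_half == needed_half


def operand_fetch_must_wait(result_half, needed_half):
    """True if the next instruction's operand fetch phase must be delayed
    until the prior instruction's execution phase has completed."""
    return not is_bypassable(result_half, needed_half)


if __name__ == "__main__":
    # Result produced in the low half and needed in the low half.
    print(operand_fetch_must_wait("low", "low"))
    # Result produced in the low half but needed in the high half.
    print(operand_fetch_must_wait("low", "high"))
```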