The invention relates generally to computer systems and deals more particularly with an improved technique for executing branch instructions.
In a computer program, branch instructions are encountered frequently and are executed in various ways. U.S. Pat. No. 5,070,475 to Normoyle et al. discloses a data processing system which includes a floating point computation unit (FPU) which interfaces with a central processing unit (CPU). The CPU supplies a dispatch control signal to inform the FPU that it is about to execute a floating point microinstruction and supplies a dispatch address which includes the starting address of the floating point microinstructions during the same operating cycle that the dispatch control signal is supplied. A buffer memory is provided in the FPU to store the starting address of one decoded macroinstruction while a sequence of microinstructions for a previously decoded macroinstruction is being executed by the FPU.
U.S. Pat. No. 5,070,475 also discloses interface logic which handles suitable control signals for permitting asynchronous operation of the FPU and the CPU and which utilizes a single level of pipelining macroinstructions for initiating FPU operations. Suitable control signals are used in order to permit the transfer of FPU instruction information and to arrange for the proper loading and subsequent use thereof by the FPU. Further control is required to assure that the CPU does not transfer an FPU instruction when the single buffer pipeline at the FPU is full and unable to accept the FPU instruction.
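By way of illustration only, the single-level buffering and the "buffer full" control described above can be sketched as follows. The class and method names (`SingleEntryDispatchBuffer`, `try_dispatch`, `take`) are hypothetical and do not appear in the cited patent; the sketch assumes only that the FPU holds at most one pending starting address and that the CPU must not transfer a further instruction while that slot is occupied.

```python
class SingleEntryDispatchBuffer:
    """Hypothetical model of a one-deep FPU dispatch buffer."""

    def __init__(self):
        # Starting address of the one pending macroinstruction, or None.
        self._pending = None

    @property
    def full(self):
        # True while a pending starting address has not yet been taken.
        return self._pending is not None

    def try_dispatch(self, start_address):
        """CPU side: offer a starting address; refused while the buffer is full."""
        if self.full:
            return False
        self._pending = start_address
        return True

    def take(self):
        """FPU side: remove and return the pending starting address (None if empty)."""
        address, self._pending = self._pending, None
        return address


buffer = SingleEntryDispatchBuffer()
first_accepted = buffer.try_dispatch(0x100)   # slot is empty, so this is accepted
second_accepted = buffer.try_dispatch(0x200)  # slot is occupied, so this is refused
taken = buffer.take()                         # FPU starts the buffered sequence
retry_accepted = buffer.try_dispatch(0x200)   # slot is free again
```

In this sketch the CPU would simply retry the refused dispatch on a later cycle; how the cited system actually stalls or signals the CPU is not modelled here.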
U.S. Pat. No. 5,070,475 also discloses control signals which provide for the transfer of data in either direction between the CPU data bus and the FPU data bus. Moreover, other control signals are provided for handling floating point faults which may occur during the calculations being executed by the FPU.
U.S. Pat. No. 4,509,116 to Lackey et al. discloses an interconnection arrangement between a CPU and an FPU (called a "special instruction processor"). The CPU retrieves all of the microinstructions from the memory in series and decodes the instruction. An image of the instruction is passed to the FPU. When an instruction is received which requires processing by the FPU, then the CPU retrieves the data words comprising the operand from the memory and passes them to the FPU. After receiving the instruction, the FPU also decodes the instruction and proceeds to receive the data words comprising the operand of the instruction. The FPU then processes the operand in a conventional manner and prepares to transmit back to the CPU the results of the processing, i.e. the processed data and any condition codes. When the CPU is signalled by the FPU that it has finished processing, it signals the FPU to transmit the data. The CPU is then able to transmit the processed data back into storage in the memory.
U.S. Pat. No. 4,683,547 to DeGroot teaches a data processing system which includes a multiple floating point arithmetic unit with a putaway and a bypass bus. The FPU includes a new instruction for handling multiple multiply and divide instructions. These instructions include passing the results of each multiply/divide on a bypass bus to the input of an adder along with the inputs from an accumulate bypass bus which is the output from the adder for an automatic add operation on an accumulate multiply or accumulate divide operation. This allows two floating point results to be produced in each cycle, one of which can be accumulated without any intervening control by the CPU.
U.S. Pat. No. 4,654,785 to Nishiyama et al. discloses an information processing system having a plurality of arithmetic units such as a general instruction arithmetic unit or CPU and a floating point instruction arithmetic unit or FPU. The information processing system includes means provided for each of the arithmetic units which generates a condition code for use in branch judgement of a conditional branch instruction. Within each arithmetic unit, branch judgement means are provided which judge the success or failure of a branch of the conditional branch instruction by using the condition code generated by the code generating means. A judgement unit decision circuit is also provided which is responsive to the operation state of each arithmetic unit for generating an instruction signal indicating which of the branch judgement means is to be operated and for supplying the instruction signal to the branch judgement means, whereby branch control is carried out by using as a valid result either one of the branch judgement results obtained in the respective arithmetic units.
An article entitled "Repeating Microcode Words for Fast Controlled Repeat Cycle Functions", IBM Technical Disclosure Bulletin, vol 32, no 5B, October 1989, pp 403-404, teaches a repeat cycle enabling function in a microprogram controlled processor. In the disclosure, a microword control latch is set as each of the looping microcontrol words is being executed. This latch controls the gating of the micro-control words into the control register. If the latch is ON, the control word clocked into the control register at the beginning of the next cycle will be from the output of the current control register. If the latch is OFF, the control word clocked into the control register at the beginning of the next cycle will be the output of the control storage.
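By way of illustration only, the latch-controlled gating described in that article can be sketched as follows. The function names and the per-word repeat counts used to decide when the latch is ON are assumptions made for this sketch; the article itself is described here only as setting the latch while a looping microcontrol word is being executed.

```python
def next_control_word(latch_on, control_register, control_storage, address):
    """Select the word clocked into the control register at the next cycle."""
    if latch_on:
        # Latch ON: recirculate the output of the current control register.
        return control_register
    # Latch OFF: take the output of the control storage.
    return control_storage[address]


def run(control_storage, repeat_counts):
    """Return the control word executed in each cycle.

    repeat_counts[i] is the number of extra cycles for which word i is
    repeated (a hypothetical way of deciding when the latch is ON).
    """
    trace = []
    address = 0
    control_register = control_storage[address]
    remaining = repeat_counts[address]
    while True:
        trace.append(control_register)
        latch_on = remaining > 0
        if latch_on:
            remaining -= 1
            control_register = next_control_word(
                True, control_register, control_storage, address)
        else:
            address += 1
            if address >= len(control_storage):
                break
            control_register = next_control_word(
                False, control_register, control_storage, address)
            remaining = repeat_counts[address]
    return trace


# The second word is held for two extra cycles without a control storage fetch.
trace = run(["LOAD", "SHIFT", "STORE"], [0, 2, 0])
```

With these inputs the `SHIFT` word occupies three consecutive cycles, and control storage is read only when the latch is OFF.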
An article entitled "Zero-cycle Branches in simple RISC Designs", IBM Technical Disclosure Bulletin, vol 33, no 10B, March 1991, pp 253-259, teaches a method of reducing the pipeline delay in a RISC system by providing a branch execution unit which executes the branches without interfering with or using standard fixed point instruction resources. The branch execution unit attempts to make branches all but invisible to the fixed-point and floating-point execution units. Software support is needed in order to allow the operation of the branch execution unit.
In high end machines, a number of methods are known in which the number of cycles required to carry out a branch instruction is reduced to either zero cycles or one cycle. In general, these allow the next branch cycle to be processed while previous instructions are being executed. Provided that the previous instructions affect neither the fulfillment of the branch condition nor the generation of the address to which the microprogram jumps, the branch condition is calculated and the address of the next instruction to be processed is placed into the instruction buffer. Such implementations require more computing power and extra circuitry to control the parallel data flows. In addition, it may not be possible to provide downward compatibility with existing microcode sequences.
A general object of the present invention is to produce a more efficient method for the execution of branch or loop instructions.