1. Field of the Invention
The present invention relates to a branching control system, and more particularly to a branching control system for a data processing unit with pipeline control. In this system, the amount of processing needed to cancel already-started instructions when a branch is successful is reduced by generating the branching address and fetching the instruction at the branch-to address while the condition test of the branching instruction is still being executed.
2. Description of the Related Art
In recent large-scale computer systems, it has become common to process data at high speed using a pipeline system. The characteristics of a pipeline allow high-speed processing of ordinary instructions that flow sequentially. With branching instructions, however, the subsequent instructions already fetched into the pipeline may have to be invalidated once the branching conditions are finally determined, which produces lost cycles. In database processing and online data processing in particular, which are receiving increasing attention, the ratio of branching instructions to non-branching instructions is high, and so is the number of lost cycles generated in a pipeline system. A pipeline processing system that reduces the number of lost cycles is therefore greatly desired.
The following explains how prior art systems operate when a data processing system that executes instructions in a pipeline executes a branch on index high (hereinafter referred to as BXH) instruction or a branch on index low or equal (hereinafter referred to as BXLE) instruction. The BXH and BXLE instructions are of the register-storage (RS) type shown in FIG. 1, where OP is the operation code area, R1, R3 and B2 are register designation areas, and D2 is a displacement. The contents of general purpose register R1 are the first operand, the contents of general purpose register R3 are the third operand, and the value obtained by adding the displacement D2 to the contents of general purpose register B2 is the second operand.
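The RS-type fields described above can be illustrated with a small decoder. This is a minimal Python sketch, not the circuit of this invention: the bit widths (8-bit OP, 4-bit R1, R3 and B2, 12-bit D2) are an assumption taken from the System/370 RS format, 32-bit address wraparound is also assumed, and the names `RSInstruction`, `decode_rs` and `second_operand_address` are hypothetical.

```python
from typing import NamedTuple, Sequence


class RSInstruction(NamedTuple):
    op: int   # operation code area (OP), assumed 8 bits
    r1: int   # register holding the first operand (the index)
    r3: int   # register holding the third operand (the increment)
    b2: int   # base register for the second operand
    d2: int   # displacement, assumed 12 bits


def decode_rs(word: int) -> RSInstruction:
    """Split a 32-bit RS-type instruction word into its fields (assumed layout)."""
    return RSInstruction(
        op=(word >> 24) & 0xFF,
        r1=(word >> 20) & 0xF,
        r3=(word >> 16) & 0xF,
        b2=(word >> 12) & 0xF,
        d2=word & 0xFFF,
    )


def second_operand_address(ins: RSInstruction, gr: Sequence[int]) -> int:
    """Second operand = contents of GR[B2] plus D2 (32-bit wraparound assumed)."""
    return (gr[ins.b2] + ins.d2) & 0xFFFFFFFF
```

For example, `decode_rs(0x86245010)` yields OP 0x86, R1 = 2, R3 = 4, B2 = 5 and D2 = 0x010; with 0x1000 in register 5, the second operand address is 0x1010.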
The BXH instruction adds the increment given by the third operand to the first operand (hereinafter referred to as the index) and branches to the address given by the second operand when the result is larger than a comparison number designated by the R3 field. The first operand is updated with the incremented value even when the branch is not taken. As mentioned above, the increment is the content of the general purpose register designated by R3 (the third operand). The comparison number is held in the odd-numbered register of the even/odd register pair selected by R3: when R3 designates an even-numbered register, the comparison number is the content of general purpose register R3+1, and when R3 designates an odd-numbered register, it is the content of general purpose register R3 itself.
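The register-pair rule for the comparison number amounts to setting the low bit of the R3 register number. A minimal Python sketch, assuming 16 general purpose registers numbered 0 to 15 (a 4-bit register field); `comparand_register` is a hypothetical helper name:

```python
def comparand_register(r3: int) -> int:
    """Register holding the BXH/BXLE comparison number for a given R3 field.

    The comparison number is in the odd register of the even/odd pair that R3
    selects: R3 + 1 when R3 is even, R3 itself when R3 is odd.
    """
    if not 0 <= r3 <= 15:
        raise ValueError("register number must be in 0..15")
    return r3 | 1  # setting the low bit maps even n to n + 1 and leaves odd n unchanged
```

Using a bitwise OR rather than an even/odd branch keeps the rule to a single operation, which mirrors how little logic a hardware register selector needs for it.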
The register holding the comparison number may also be the same register as the first operand (R1). In that case, the initial value of that register, before the increment is added, is used as the comparison number.
The BXLE instruction is the same as the BXH instruction except that the branching condition is inverted: the branch is taken when the index is smaller than or equal to the comparison number.
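Combining the BXH, BXLE and same-register rules, the architectural effect of both instructions can be sketched as follows. This is an illustrative Python model, not the hardware described here: it assumes 32-bit two's-complement arithmetic with overflow ignored, a register file held as a list of 16 signed integers, and hypothetical names (`branch_on_index`, `to_signed32`). Reading the comparison number before writing the updated index is what produces the behaviour described for the case where R1 is also the comparison register.

```python
from typing import List, Optional

MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def branch_on_index(gr: List[int], r1: int, r3: int, target: int,
                    low_or_equal: bool) -> Optional[int]:
    """Model BXH (low_or_equal=False) or BXLE (low_or_equal=True).

    The index in gr[r1] is updated in place; the branch-to address is returned
    if the branch is taken, otherwise None.
    """
    increment = gr[r3]          # third operand
    comparand = gr[r3 | 1]      # odd register of the pair, read BEFORE any update
    index = to_signed32(gr[r1] + increment)
    gr[r1] = index              # the index is updated even when the branch is not taken
    taken = index <= comparand if low_or_equal else index > comparand
    return target if taken else None
```

For example, with the index in register 2 starting at 0, an increment of 1 in register 4 and a limit of 3 in register 5, four successive BXLE executions are taken three times and fall through on the fourth, the usual counted-loop idiom.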
The pipeline operations of the prior art system when executing the BXH and BXLE instructions are explained with reference to FIG. 2. In FIG. 2, P1, P2, P3, P4, P5 and P6 are pipeline stages whose general operations are as follows:
Stage P1: decodes the instruction and reads operands from the general purpose registers;
Stage P2: calculates the operand address and generates the buffer memory access request;
Stage P3: converts the logical address to a real address using the address conversion buffer;
Stage P4: accesses the buffer memory for the operand;
Stage P5: performs the arithmetic operation; and
Stage P6: writes the operation results to the general purpose registers.
Execution of a branching instruction in a pipeline is generally performed so that one instruction may occupy one or more flows, namely the original flow and the branch flow.
The construction of the prior art system that performs the processing shown in FIG. 2 differs from that of the present invention shown in FIG. 6, but for the purposes of this explanation the two systems are substantially the same in their essential parts. Here, the pertinent portions are briefly explained, and the differences are explained later. Stages P1 to P6 in FIG. 6 are the same as the stages described above, and stages I1, I2 and I3 in FIG. 6 are instruction prefetch stages for the pipeline, which operate as follows:
Stage I1: sends the generated branching address to the address conversion buffer;
Stage I2: converts the logical address to a real address using the address conversion buffer;
Stage I3: accesses the buffer memory using the real address; and
Stage P1: reads the instruction from the buffer memory, decodes it and reads operands from the registers.
The address conversion buffer may be considered either as part of the cache memory 16 or 20, or in a preceding stage (not shown).
The address of the instruction to be executed is loaded into the instruction address register (hereinafter referred to as IAR) 11, and the instruction is read from the cache memory 16 using the effective address register (hereinafter referred to as EAR) 15 and loaded into the instruction word register (hereinafter referred to as IWR) 17. The IWR 17 is generally a shift register with plural stages, in which a plurality of instructions to be processed by the pipeline are sequentially prefetched and stored.
The prefetched instructions are output by the selector 18 and sent to the hardware pipeline formed by stages P1 to P6. The operation code area (OP area) of the instruction is shifted sequentially through registers 28, 29, ... 32 as it advances to each new stage. The operand address for the instruction is calculated by the adder 4, the corresponding operand is read from the cache memory 20 and supplied to the arithmetic circuit 9, and the calculated result is loaded into one of the general purpose registers 22 through the result register 10. During this period, when the instruction sent to the pipeline is a branching instruction, the branching determination circuit 33 shown in FIG. 6 determines whether branching should occur.
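The role of the IWR 17 and selector 18 can be pictured as a bounded first-in, first-out queue. The sketch below is a toy Python model under stated assumptions: the default depth of 4, the use of strings for instructions, and a selector that always takes the oldest entry are illustrative choices, and `InstructionWordRegister` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Optional


class InstructionWordRegister:
    """Toy model of IWR 17: a fixed-depth FIFO of prefetched instructions."""

    def __init__(self, depth: int = 4) -> None:
        self.depth = depth                 # number of stages (assumed)
        self.slots: Deque[str] = deque()   # oldest instruction on the left

    def prefetch(self, instruction: str) -> bool:
        """Store one instruction read from cache memory 16; refuse when full."""
        if len(self.slots) >= self.depth:
            return False
        self.slots.append(instruction)
        return True

    def select(self) -> Optional[str]:
        """Selector 18: pass the oldest prefetched instruction to stage P1, or None if empty."""
        return self.slots.popleft() if self.slots else None
```

A `deque` is used because both ends are accessed in constant time, which matches a shift register that is filled at one end and drained at the other.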
Reference numerals 1 to 12 in FIGS. 2 to 4 denote the same units as the identically numbered elements in FIG. 6.
In the prior art, when a branching instruction is executed using the stages described above, as shown in FIG. 2, the content of the general purpose register designated by R1 of the instruction is loaded into the base register (hereinafter referred to as BR) 1, and the content of the general purpose register designated by R3 is loaded into the index register (hereinafter referred to as XR) 2, in stage P2 of the first flow of the branching instruction. The first operand and the increment are added by the 3-input adder 4 (the address generating circuit comprises BR 1, XR 2, DR 3, described later, and the 3-input adder 4). The result passes through the P3 cycle operand address register (hereinafter referred to as P3OAR) 5 in stage P3 and the P4 cycle operand address register (hereinafter referred to as P4OAR) 6 in stage P4, and is loaded into the operand register (hereinafter referred to as 2R) 8 in stage P5. Simultaneously, the comparison number from the general purpose register designated by R3+1 or R3 is loaded into the operand register (hereinafter referred to as 1R) 7, and the comparison is performed in the adder 9. As a result, a signal indicating the relationship between the values of 1R and 2R (that is, 1R=2R, 1R>2R or 1R<2R) is output, and at the same time the value of 2R is loaded into the result register (hereinafter referred to as RR) 10. Whether to branch is determined from this signal. In the prior art system, therefore, the comparison is carried out in stage P5.
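Because 1R holds the comparison number and 2R holds the updated index, the three-way signal from stage P5 is enough to decide either instruction. A minimal Python sketch of that decision; the enum `Relation` and the functions `compare` and `branch_taken` are hypothetical names:

```python
from enum import Enum


class Relation(Enum):
    EQ = "1R=2R"
    GT = "1R>2R"
    LT = "1R<2R"


def compare(r1_value: int, r2_value: int) -> Relation:
    """Comparison in stage P5: 1R = comparison number, 2R = updated index."""
    if r1_value == r2_value:
        return Relation.EQ
    return Relation.GT if r1_value > r2_value else Relation.LT


def branch_taken(rel: Relation, is_bxle: bool) -> bool:
    """BXH: index > comparison number, i.e. 1R < 2R.
    BXLE: index <= comparison number, i.e. 1R >= 2R."""
    if is_bxle:
        return rel in (Relation.EQ, Relation.GT)
    return rel is Relation.LT
```

The two instructions use complementary subsets of the same signal, which is why the BXLE condition is simply the inverse of the BXH condition.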
Next, in stage P2 of the second flow, the value in the general purpose register designated by B2 of the instruction is loaded into BR 1 and the value of D2 is loaded into the displacement register (hereinafter referred to as DR) 3. The branching address is generated by the same 3-input adder 4 used in the first flow and is loaded into the IAR 11 in stage P3, while a read request for the instruction at the branching address is issued as soon as the branch-to address is generated in stage P2. This starts the instruction fetch pipeline [I1, I2, I3, P1], which in this case fetches the branch-to instruction.
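Since the same 3-input adder 4 serves both flows, the two additions can be modelled as one function called with different inputs. In this Python sketch the unused adder input in each flow (DR 3 in the first flow, XR 2 in the second) is assumed to be zero and the adder width is assumed to be 32 bits with the result treated as unsigned; neither detail is stated in the text, and `three_input_add` and `two_flow_additions` are hypothetical names.

```python
from typing import List, Tuple

MASK32 = 0xFFFFFFFF


def three_input_add(br: int, xr: int, dr: int) -> int:
    """3-input adder 4: BR + XR + DR, truncated to an assumed 32-bit width."""
    return (br + xr + dr) & MASK32


def two_flow_additions(gr: List[int], r1: int, r3: int,
                       b2: int, d2: int) -> Tuple[int, int]:
    """Return (index + increment bound for 2R, branch-to address bound for IAR 11)."""
    # Flow 1, stage P2: BR 1 <- GR[R1], XR 2 <- GR[R3]; DR 3 assumed to be 0
    index_plus_increment = three_input_add(gr[r1], gr[r3], 0)
    # Flow 2, stage P2: BR 1 <- GR[B2], DR 3 <- D2; XR 2 assumed to be 0
    branch_address = three_input_add(gr[b2], 0, d2)
    return index_plus_increment, branch_address
```

For example, with 10 in register 1, 5 in register 3 and 0x2000 in register 12, `two_flow_additions(gr, 1, 3, 12, 0x40)` gives the pair (15, 0x2040).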
The branch-to instruction advances through the fetch stages I1, I2 and I3 in sequence and is output to stage P1. When branching is successful, the branch-to instruction enters the instruction execution pipeline.
The updated index, that is, the first operand in general purpose register R1 plus the increment, is written from RR 10 into the general purpose register designated by R1 at the end of stage P6 of the second flow.
As is apparent from FIG. 2, in the prior art system the instructions that follow the branching instruction, which are the ones executed when branching is not successful, continue to be fetched into the pipeline while the branch-to instruction is being prefetched. The branch-to instruction becomes available in stage P1 only when the third of these following flows moves from stage P1 to stage P2, so as many as three flows must be aborted whenever the branch is successful. When the branching conditions are determined, the third following flow is in stage P1. If the branch is not successful, the fourth flow is brought into stage P1; if the branch is successful, the three flows already started are cancelled and the branch-to instruction at the branching address enters stage P1.
As is evident from the above explanation, when the BXH and BXLE instructions are executed in this prior art system, five cycles are required when the branch is successful, while only two cycles are required when it is not. The number of cycles thus differs greatly between the two cases, and the processing of three flows must be cancelled whenever the branch is successful.
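The cycle counts above can be reproduced with simple arithmetic on the stage numbers. The Python sketch below counts cycles from the moment flow 1 of the branching instruction is in P1. It assumes that stage I1 of the target fetch runs in the same cycle as the P2 stage that generates the branch-to address, and that the branch-to instruction cannot enter P1 before the cycle after the comparison in P5 resolves; these assumptions and the names `BranchTiming` and `prior_art_timing` are illustrative, chosen because they reproduce the five-cycle, two-cycle and three-flow figures stated above.

```python
from typing import NamedTuple


class BranchTiming(NamedTuple):
    taken_cycles: int       # from branch flow 1 in P1 to the branch-to instruction in P1
    not_taken_cycles: int   # cycles occupied by the branching instruction's own flows
    cancelled_flows: int    # following instructions aborted when the branch is taken


def prior_art_timing(branch_flows: int = 2, addr_stage: int = 2,
                     fetch_stages: int = 3, resolve_stage: int = 5) -> BranchTiming:
    """Cycle accounting for BXH/BXLE in the prior-art pipeline of FIG. 2.

    Flow k (1-based) of the branch is in stage Pn at cycle (k - 1) + (n - 1).
    """
    addr_cycle = (branch_flows - 1) + (addr_stage - 1)  # P2 of the last flow: cycle 2
    fetch_done = addr_cycle + fetch_stages              # I1, I2, I3 done: P1 at cycle 5
    resolve_cycle = resolve_stage - 1                   # P5 of flow 1: cycle 4
    target_p1 = max(fetch_done, resolve_cycle + 1)      # cannot enter P1 before the decision
    # Not-taken-path instructions enter P1 in cycles branch_flows .. target_p1 - 1
    cancelled = target_p1 - branch_flows
    return BranchTiming(target_p1, branch_flows, cancelled)
```

With the default parameters, `prior_art_timing()` returns `BranchTiming(taken_cycles=5, not_taken_cycles=2, cancelled_flows=3)`. In this model, shortening the fetch path alone (for example `fetch_stages=2`) still gives five cycles, because the comparison in P5 then becomes the limiting factor.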
The difference in the number of cycles between successful and unsuccessful branching complicates the processing of subsequent instructions: cancelling many flows is complicated and requires time-consuming cancellation processing (for example, cancellation requests to the buffer memory and main memory). These disadvantages cause significant inefficiency in practical use.