The invention relates to a digital data processor including a central processing unit (CPU).
Sorting pieces of data into a predetermined order is one of the basic processes executed in a digital data processor such as a digital computer. The sorting requires an operation of comparing the magnitudes of two pieces of data and ordering or classifying the two pieces of data based on the comparison.
As discussed in "CMOS Gate Array Implementation of the SPARC Architecture" by M. Namjoo et al. (COMPCON, 1988 Proceedings, pp. 10 to 13), a digital data processor includes: an instruction register for reading instructions; a plurality of general registers for storing data; an operation means (ALU, or arithmetic and logic unit) for not only performing operations on the data supplied from the general registers, but also writing the operated data back thereto; and a central processing unit (CPU) having a set of instructions that allow a basic instruction to be executed within a single machine cycle. A digital data processor may further include a main storage unit and a bus for interfacing between the main storage unit and the CPU.
To sort two pieces of data by magnitude with the digital data processor thus constructed, a series of instructions such as shown in FIG. 4 is required. Each of steps 1 to 6 in FIG. 4 corresponds to a single instruction executed by the CPU. Each step will be described.
(i) In step 1 the CPU executes a "comparison instruction" that performs a series of operations of reading the data of the general register A and the data of the general register B, causing the ALU to compare these data, and updating a condition code. In this step, if the data of the register B is larger than the data of the register A, the sign bit of the condition code is set to 1, while if the data of the register A is equal to or larger than the data of the register B, the sign bit is set to 0.
(ii) In step 2 a "conditional branch instruction" is executed. The conditional branch instruction selects step 3 or step 5 as the step to be executed next, based on whether the sign bit of the condition code is 1 or 0. If the sign bit is 1, i.e., the data of the register B is larger than the data of the register A, the CPU executes step 4 after step 3. A bit other than the sign bit may be used, provided that the bit used for the conditional judgment indicates whether the data of the register A is larger than the data of the register B or vice versa.
(iii) In steps 3 and 4 a "move instruction" is executed. The move instructions move the data of the registers A and B to the registers C and D, respectively. If the sign bit is 0, i.e., the data of the register B is equal to or smaller than the data of the register A, the CPU executes step 6 after step 5. In steps 5 and 6, a move instruction is executed to move the data of the registers A and B to the registers D and C, respectively. Accordingly, in either case, the larger of the data of the registers A and B is stored in the register D and the smaller is stored in the register C.
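The sequence of steps 1 to 6 can be illustrated by the following minimal sketch in Python. It is an illustration only and is not taken from FIG. 4: the function name, the dictionary standing in for the general registers, and the list recording the executed instructions are all hypothetical, and the sign bit is assumed to be 1 only when the data of the register B is strictly larger than the data of the register A, as steps 2 and 3 imply.

```python
def sort_pair_with_branch(a, b):
    """Model the conventional compare / branch / move sequence.

    Returns (C, D, executed), where C holds the smaller value, D the larger
    value, and executed lists the instructions the CPU would execute.
    All names here are illustrative assumptions, not part of FIG. 4.
    """
    regs = {"A": a, "B": b, "C": None, "D": None}
    executed = []

    # Step 1: comparison instruction updates the condition code.
    sign_bit = 1 if regs["B"] > regs["A"] else 0
    executed.append("compare")

    # Step 2: conditional branch instruction selects step 3 or step 5.
    executed.append("branch")

    if sign_bit == 1:
        # Steps 3 and 4: move A to C and B to D.
        regs["C"] = regs["A"]
        regs["D"] = regs["B"]
    else:
        # Steps 5 and 6: move A to D and B to C.
        regs["D"] = regs["A"]
        regs["C"] = regs["B"]
    executed.extend(["move", "move"])

    return regs["C"], regs["D"], executed
```

In this sketch either path executes exactly four instructions, one of which is the conditional branch instruction, and the register D always receives the larger value.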
However, in sorting two pieces of data by magnitude following the above steps, the processing time is relatively long, which is a disadvantage.
More specifically, the processing time corresponds to the time for the CPU to execute four instructions, one of which is a conditional branch instruction. In a digital data processor that executes one instruction within one machine cycle, the conditional branch instruction requires a longer processing time than a single ordinary instruction, so that the sequence takes longer than the execution of four ordinary instructions. The reason is that when pipeline processing is applied to increase the operation speed, the execution of the conditional branch instruction causes pipeline disturbance by temporarily emptying the pipeline.
The pipeline disturbance will be described in more detail. FIG. 5 is a diagram showing a timing of pipeline processing performed by a digital computer. In FIG. 5, execution of a single instruction requires four cycles: a fetching cycle; a decoding cycle; an operation cycle; and an operation result writing cycle. However, instructions are executed with a delay of one machine cycle each, thereby allowing one instruction to be executed substantially within each machine cycle.
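The undisturbed timing described above can be sketched as follows. The sketch is illustrative only: the stage names and the function names are assumptions chosen for this example, and instructions and cycles are numbered from 1.

```python
# Four pipeline stages, in the order described for FIG. 5 (names are assumed).
STAGES = ("fetch", "decode", "operate", "write")


def ideal_schedule(n_instructions):
    """Return {instruction number: {stage: cycle}} for a stall-free pipeline.

    Instruction i is fetched in cycle i and advances one stage per cycle.
    """
    return {
        i: {stage: i + offset for offset, stage in enumerate(STAGES)}
        for i in range(1, n_instructions + 1)
    }


def total_cycles(n_instructions):
    """Cycles until the last instruction has written its result."""
    return n_instructions + len(STAGES) - 1
```

Under these assumptions instruction 1 occupies cycles 1 to 4 and instruction 2 occupies cycles 2 to 5, so that after the pipeline has filled, one instruction completes in each machine cycle.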
However, the branching address specified by the conditional branch instruction for loading the next instruction varies depending on the operation result of the preceding instruction. If instruction 2 in FIG. 5 is a conditional branch instruction, the next instruction cannot be fetched until the completion of cycle 3, which follows the fetching cycle of instruction 2 and during which the operation cycle of instruction 1 is executed. In other words, it is in cycle 4 that the next instruction 3 is permitted to be fetched. Thus, no instruction is fetched in cycle 3, in which instruction 3 would otherwise have been fetched. This implies that the conditional branch instruction requires a two-cycle execution time according to the operation timing shown in FIG. 5. As a result, the conditional branch instruction is "penalized" by a processing time longer than that required for execution of a single ordinary instruction.
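The penalty can be made concrete with the following minimal sketch, which assumes, in line with the timing described for FIG. 5, that every ordinary instruction occupies one fetch slot and that a conditional branch instruction delays the next fetch by one additional cycle. The function names and the instruction labels are hypothetical.

```python
def fetch_cycles(instructions, branch_penalty=1):
    """Return the cycle in which each instruction is fetched.

    instructions: sequence of labels; the label "branch" marks a
    conditional branch instruction. branch_penalty is the assumed number
    of extra cycles before the instruction after a branch can be fetched.
    """
    cycles = []
    cycle = 1
    for kind in instructions:
        cycles.append(cycle)
        cycle += 1 + (branch_penalty if kind == "branch" else 0)
    return cycles


def issue_slots(instructions, branch_penalty=1):
    """Number of fetch slots consumed by the whole sequence."""
    return sum(
        1 + (branch_penalty if kind == "branch" else 0)
        for kind in instructions
    )
```

For example, `fetch_cycles(["op", "branch", "op"])` gives the fetch cycles 1, 2 and 4, matching the description that instruction 3 is fetched in cycle 4, and `issue_slots(["compare", "branch", "move", "move"])` gives 5 slots for the four-instruction sorting sequence, against 4 slots for four ordinary instructions.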
To eliminate the pipeline disturbance, a delayed conditional branch instruction has been discussed in "Reduced Instruction Set Computer" by D. A. Patterson (Communications of the ACM, Vol. 28, No. 1, 1985, pp. 8 to 21). However, this technique is not a solution to the time-consuming processing in that it disadvantageously requires an additional jump instruction.
Further, a processor disclosed in Japanese Patent Unexamined Publication No. 221036/1987 (Japanese Patent Application No. 9220/1987) attempts to suppress the pipeline disturbance by predicting the branching destination of a conditional branch instruction and reading in advance the series of instructions that follows the execution of the conditional branch instruction. However, the presence of unpredictable conditional branch instructions and the occurrence of prediction errors lead to overhead, and the technique requires additional hardware.
A reduced instruction set computer (RISC) proposed in Japanese Patent Unexamined Publication No. 49843/1988 (Japanese Patent Application No. 119167/1987) has an object of improving the processing speed by increasing the number of register file ports and allowing concurrent execution of a plurality of operation units. However, such an operation arrangement provides no particular advantage in the sorting of two pieces of data by magnitude, because its concurrently processable data are so independent of each other that comparison of the two pieces of data in one operation unit does not provide data to control the processing of another operation unit. Thus, concurrent sorting cannot be achieved by such an operation arrangement.