This invention relates to digital computers and digital data processors, and particularly to digital computers and data processors capable of executing two or more instructions simultaneously.
Traditional computers which receive a sequence of instructions and execute the sequence one instruction at a time are known. The instructions executed by these computers operate on single-valued objects, hence the name "scalar" for these computers.
The operational speed of traditional scalar computers has been pushed to its limits by advances in circuit technology, computer mechanisms, and computer architecture. However, with each new generation of competing machines, new acceleration mechanisms must be discovered for traditional scalar machines.
A recent mechanism for accelerating the computational speed of uniprocessors is found in reduced instruction set architecture that employs a limited set of very simple instructions. Another acceleration mechanism is complex instruction set architecture which is based on a minimal set of complex multi-operand instructions. Application of either of these approaches to an existing scalar computer would require fundamental alteration of the instruction set and architecture of the machine. Such a far-reaching transformation is fraught with expense, down-time, and initial reduction in the machine's reliability and availability.
In the co-pending patent applications, a scalable compound instruction set machine (SCISM) architecture is described in which instruction level parallelism is achieved by statically analyzing a sequence of scalar instructions one at a time prior to instruction execution in order to generate compound instructions formed by adjacent grouping of existing instructions in the sequence which are capable of parallel execution. Relatedly, when used herein, the term "compounding" refers to the grouping of instructions contained in a sequence of instructions, the grouping being for the purpose of concurrent or parallel execution of the grouped instructions. At a minimum, compounding is satisfied by "pairing" of two instructions for simultaneous execution. Preferably, compounded instructions are unaltered from the forms they have when presented for scalar execution.
Parallel or simultaneous execution of scalar instructions poses certain hazards which must be accommodated in a SCISM machine. Such hazards are also called "interlocks". More particularly, a data dependency hazard, also called a "write-read hazard" or "write-read interlock", can exist when two instructions of a serial sequence of instructions are executed simultaneously or in parallel. The hazard arises when the second instruction must read the result of the first instruction in order to execute. See, for example, FIG. 1 where a first instruction 10 precedes a second instruction 12 in an instruction sequence. Both instructions require access to a set of general purpose registers (GPR) 14 where operands for instruction execution are stored. The first instruction 10 requires two operands which are stored in register locations 15 and 16, respectively. Assume that execution of the first instruction requires that its result be written back into the GPR at register location 16. The second instruction 12 also requires two operands for execution, the operands being stored at register locations 17 and 16. In order for the second instruction 12 to produce reliable results, its execution must be delayed until the results of executing the first instruction 10 have been written to register location 16.
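The write-read hazard of FIG. 1 can be illustrated with a minimal sketch. The `Instr` record and the `has_write_read_hazard` function are hypothetical names introduced only for illustration. The reference numerals 15, 16, and 17 of FIG. 1 are reused as register indices for convenience, and the destination of the second instruction, which the text does not state, is assumed to be location 17.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, several sources."""
    op: str
    dest: int     # register location written by the instruction
    srcs: tuple   # register locations read by the instruction

def has_write_read_hazard(first: Instr, second: Instr) -> bool:
    """True when `second` reads the location that `first` writes."""
    return first.dest in second.srcs

# Instruction 10 reads locations 15 and 16 and writes its result to 16.
instr_10 = Instr("ADD", dest=16, srcs=(15, 16))
# Instruction 12 reads locations 17 and 16 (destination 17 is an assumption).
instr_12 = Instr("ADD", dest=17, srcs=(17, 16))
# A hypothetical instruction that does not touch location 16.
instr_other = Instr("ADD", dest=18, srcs=(17, 18))

print(has_write_read_hazard(instr_10, instr_12))     # hazard on location 16
print(has_write_read_hazard(instr_10, instr_other))  # no shared location
```

In this sketch the first pair shares location 16 and is therefore interlocked, while the second pair is independent and could execute in parallel without special hardware.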
A mechanism, illustrated in FIG. 2, has been proposed for collapsing the data dependency illustrated in FIG. 1. FIG. 2 shows that two instructions, for example, instructions 10 and 12 in FIG. 1, may be compounded into a unit that is to be considered as a single execution unit. If the first and second instructions 10 and 12 were compounded, the compounding would result in their being issued and executed in parallel. In the structure of FIG. 2, the compounded instructions are executed simultaneously, with the first instruction being executed by a two-to-one ALU 19, and the second instruction by a three-to-one ALU 21. The ALU 21 is designed to collapse write-read interlocks that might occur between the two instructions. In the example of FIG. 1, the ALUs 19 and 21 execute the instructions 10 and 12 in parallel, with the ALU 19 operating on the operands in register locations 15 and 16, and the ALU 21 operating on the operands in register locations 15, 16, and 17. The operation of the ALU 21 implicitly combines the operands in register locations 15 and 16 as required by the first instruction to obtain a result necessary for execution of the second instruction.
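The collapsing of the interlock by the structure of FIG. 2 can be sketched as follows. The sketch assumes, purely for illustration, that both compounded instructions are 32-bit additions; the function names `alu_2to1` and `alu_3to1` and the register contents are hypothetical, and the actual functions supported by the ALU 21 are described in the co-pending application rather than here.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit datapath

def alu_2to1(a: int, b: int) -> int:
    """Models ALU 19: executes the first instruction on two operands."""
    return (a + b) & MASK

def alu_3to1(a: int, b: int, c: int) -> int:
    """Models ALU 21: folds the first instruction's operation into the second."""
    return (a + b + c) & MASK

# Hypothetical register contents, indexed by the numerals of FIG. 1.
regs = {15: 5, 16: 7, 17: 100}

# Parallel execution: neither ALU waits for the other.
first_result = alu_2to1(regs[15], regs[16])
second_result = alu_3to1(regs[15], regs[16], regs[17])

# Serial execution for comparison: the second instruction reads location 16
# only after the first instruction has written it.
serial_first = (regs[15] + regs[16]) & MASK
serial_second = (serial_first + regs[17]) & MASK

print(first_result == serial_first, second_result == serial_second)
```

Because the three-to-one operation consumes the original operands of the first instruction directly, it reaches the same result as serial execution without waiting for register location 16 to be written.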
Since the ALU 21 is designed to collapse write-read interlocks that might occur between two instructions being executed in parallel, the ALU is designed to execute the functions that arise for all instruction sequences whose interlocks must be collapsed. The operation of the ALU 21 is not the subject of this patent application, but is explained in detail in co-pending U.S. patent application 07/504,910.
The interlock collapsing hardware of FIG. 2 computes correct results for a compounded instruction when the individual instructions both specify valid ALU operations and contain a write-read hazard. However, such an apparatus will produce erroneous results when two invalid operations are specified. This constitutes a major difficulty which must be resolved in order to achieve compliance between the SCISM architecture and the architecture of the scalar machine which executes instructions sequentially.
The ALU 19 generates results for the execution of a first instruction, performing this execution as a "normal" two-to-one ALU operation. The determination of condition codes, CCs, and detection of overflow, OF, from the operation can be obtained using existing techniques.
Since the ALU 19 sets the CCs and OF resulting from the execution of the first instruction, the ALU 21 should determine these conditions only as if it were executing the second instruction by itself. In particular, the detection of overflow, OF, in the ALU 21 should be performed in parallel with the computation of its result, and made as if the result of the ALU 19 were available and only the second instruction were being executed. To make this possible, certain information relating to the execution of the second instruction alone must be ascertained during the execution of this instruction by the ALU 21.
Therefore, there is a significant need for OF detection in a data dependency collapsing hardware apparatus such as in an apparatus which executes a pair of compounded instructions simultaneously. The detection of overflow must be specific to the execution of only the second instruction, yet must be gleaned from the execution of the three-to-one operation in the dependency collapsing hardware apparatus.
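The requirement that overflow be specific to the second instruction can be made concrete with a serial reference model. This is only a sketch of the behavior that parallel detection must reproduce, not the detection logic itself; it assumes 32-bit two's-complement additions for both instructions, and all function names are hypothetical.

```python
BITS = 32  # assumed operand width
MIN, MAX = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1

def wrap(x: int) -> int:
    """Reduce x to a signed BITS-bit two's-complement value."""
    x &= (1 << BITS) - 1
    return x - (1 << BITS) if x > MAX else x

def add_with_of(x: int, y: int):
    """Signed add returning (wrapped result, overflow flag)."""
    s = x + y
    return wrap(s), not (MIN <= s <= MAX)

def reference_flags(a: int, b: int, c: int):
    """Serial model: (OF of first instruction, OF of second instruction)."""
    r1, of1 = add_with_of(a, b)   # first instruction, as ALU 19 reports it
    _, of2 = add_with_of(r1, c)   # second instruction executed by itself
    return of1, of2

def naive_three_operand_of(a: int, b: int, c: int) -> bool:
    """Overflow of the three-operand sum taken as a whole (not what is needed)."""
    return not (MIN <= a + b + c <= MAX)

# The whole sum overflows, yet the second instruction alone does not.
print(reference_flags(MAX, 1, 0), naive_three_operand_of(MAX, 1, 0))
# The whole sum fits, yet the second instruction alone overflows.
print(reference_flags(MAX, 1, -1), naive_three_operand_of(MAX, 1, -1))
```

The two cases show that testing the three-operand sum as a whole can both over-report and under-report the overflow of the second instruction, which is why the determination must be made as if the wrapped result of the first instruction were the second instruction's operand.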