This invention relates to digital computers and digital data processors and particularly to digital computers and data processors capable of processing two or more instructions in parallel.
The performance of traditional computers, which execute instructions singly in a sequential manner, has in the past improved significantly, largely because of improvements in circuit technology. Machines which execute instructions one at a time are sometimes referred to as "scalar" computers or processors. As circuit technology is pushed to its limits, computer designers have had to investigate other means of obtaining significant performance improvements.
Recently, so-called "superscalar" computers have been proposed which attempt to increase performance by selectively executing more than one instruction at a time from a single instruction stream. Superscalar machines typically decide at instruction execution time whether a given number of instructions may be executed in parallel. Such a decision is based on the operation codes (OP codes) of the instructions and on data dependencies which may exist between adjacent instructions. The OP codes determine the particular hardware components each instruction will use. In general, two or more instructions cannot use the same hardware component at the same time, nor can an instruction be executed in parallel with another instruction on whose results it depends (a "data dependency" or "data interlock"). These hardware and data dependencies prevent the parallel execution of some instruction combinations. In these cases, the instructions are instead executed one at a time, in a non-parallel manner. This, of course, reduces the performance of a superscalar machine.
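The execution-time decision described above can be sketched as a pairwise check on two adjacent instructions. This is a minimal illustration under stated assumptions, not any particular machine's logic: the OP codes, the `UNIT_OF` table mapping each OP code to a single hardware unit, and the helper names `instr` and `can_execute_in_parallel` are all hypothetical, and only two conflicts are modeled (both instructions needing the same unit, and a read-after-write data interlock).

```python
from dataclasses import dataclass

# Hypothetical OP codes and the single hardware unit each one occupies.
UNIT_OF = {"ADD": "ALU", "SUB": "ALU", "LOAD": "MEM", "MUL": "MUL"}


@dataclass(frozen=True)
class Instr:
    op: str
    dest: frozenset  # registers the instruction writes
    srcs: frozenset  # registers the instruction reads


def instr(op: str, dest: int, *srcs: int) -> Instr:
    """Hypothetical convenience constructor for a one-destination instruction."""
    return Instr(op, frozenset({dest}), frozenset(srcs))


def can_execute_in_parallel(first: Instr, second: Instr) -> bool:
    """Decide, at execution time, whether two adjacent instructions may issue together."""
    # Hardware conflict: both OP codes need the same unit at the same time.
    if UNIT_OF[first.op] == UNIT_OF[second.op]:
        return False
    # Data interlock: the second reads a register that the first writes.
    if first.dest & second.srcs:
        return False
    return True


a = instr("ADD", 1, 1, 2)   # R1 <- R1 + R2
b = instr("LOAD", 3, 4)     # R3 <- storage addressed by R4
c = instr("SUB", 5, 5, 6)   # R5 <- R5 - R6
d = instr("LOAD", 7, 1)     # R7 <- storage addressed by R1

print(can_execute_in_parallel(a, b))  # True: different units, no interlock
print(can_execute_in_parallel(a, c))  # False: both need the ALU
print(can_execute_in_parallel(a, d))  # False: d reads R1, which a writes
```

In a superscalar machine this check runs every time the pair is fetched for execution, which is the cost the following paragraph describes.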
Superscalar computers provide some improvement in performance but also have disadvantages which it would be desirable to minimize. For example, deciding at instruction execution time which instructions can be executed in parallel takes a significant amount of time, which cannot readily be masked by overlapping the decision with other normal machine operations. This disadvantage becomes more pronounced as the complexity of the instruction set architecture increases. Another disadvantage is that the decision making must be repeated if the same instructions are executed a second or further time.
The cross-referenced applications all concern a digital computer or data processor called a scalable compound instruction set machine (SCISM), in which the decision to execute instructions in parallel is made prior to execution time. In SCISM architecture, this decision is made early in the overall instruction handling process. For example, the decision can be made ahead of the instruction buffer in machines which have instruction buffers or instruction stacks, or ahead of the instruction cache in machines which flow the instructions through a cache unit.
Because the decision to execute in parallel is made prior to a point where instructions are stored, the results of the decision making can be preserved with the instructions and reused in the event that the same instructions are used a second or further time.
Preferably, the recording of the parallel execution decision making is in the form of tags which accompany the individual instructions in an instruction stream. These tags tell whether the instructions can be executed in parallel or whether they need to be executed one at a time. This instruction tagging process is sometimes referred to herein as "compounding". It serves, in effect, to combine at least two individual instructions into a single compound instruction for parallel processing purposes.
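Compounding can be sketched as a one-time pass that attaches a tag to each instruction, followed by an execution-time step that reads only the stored tags. This is a minimal illustration under assumptions, not the SCISM implementation: the one-bit tag encoding (1 means "this instruction begins a compound instruction with its successor"), the greedy left-to-right pairing of at most two instructions, the OP codes, the `UNIT` table, and the function names `compound`, `issue_groups`, and `different_units` are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def compound(stream: Sequence[T], can_pair: Callable[[T, T], bool]) -> List[Tuple[T, int]]:
    """Tag each instruction once, ahead of the instruction buffer or cache.

    Tag 1: the instruction starts a compound instruction with its successor.
    Tag 0: the instruction is the second member of a pair or executes alone.
    """
    tagged: List[Tuple[T, int]] = []
    i = 0
    while i < len(stream):
        if i + 1 < len(stream) and can_pair(stream[i], stream[i + 1]):
            tagged.append((stream[i], 1))
            tagged.append((stream[i + 1], 0))
            i += 2
        else:
            tagged.append((stream[i], 0))
            i += 1
    return tagged


def issue_groups(tagged: Sequence[Tuple[T, int]]) -> List[Tuple[T, ...]]:
    """At execution time, read only the stored tags; no pairing analysis is repeated."""
    groups: List[Tuple[T, ...]] = []
    i = 0
    while i < len(tagged):
        instr, tag = tagged[i]
        if tag == 1:
            groups.append((instr, tagged[i + 1][0]))
            i += 2
        else:
            groups.append((instr,))
            i += 1
    return groups


# Hypothetical pairing rule: two instructions pair if they use different units.
UNIT = {"ADD": "ALU", "SUB": "ALU", "LOAD": "MEM", "MUL": "MUL"}


def different_units(x: str, y: str) -> bool:
    return UNIT[x] != UNIT[y]


tagged = compound(["ADD", "LOAD", "SUB", "MUL", "ADD"], different_units)
print([tag for _, tag in tagged])  # [1, 0, 1, 0, 0]
print(issue_groups(tagged))        # [('ADD', 'LOAD'), ('SUB', 'MUL'), ('ADD',)]
```

Because `tagged` can be stored with the instructions (for example, in the instruction cache), a second execution of the same instructions calls only `issue_groups`, so the pairing decision is not repeated.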
The exemplary embodiment of a SCISM is underpinned by the architecture and instructions of the System/370 product family available from the IBM Corporation, Armonk, N.Y., the assignee of this application. Preferably, the SCISM compounds instructions while they are in object form. As is known, System/370 architecture typically employs microcoded instructions to implement and control the execution of object-level instructions. Consequently, every System/370 instruction executed in a SCISM, whether singly or in parallel, is controlled by one or more microinstructions. Microinstruction execution of object instructions is a widely used concept for which many implementations are known. The challenge in executing scalar instructions in parallel is to provide microinstruction sequences which reflect such parallelism.
One approach known to the inventor is implemented in a machine which executes up to two instructions in parallel. This approach provides a unique microcoded routine for every possible pair of instructions, in addition to a routine for each individual instruction. While conceptually simple, this approach requires significant additional microcode storage to support the parallel instruction routines. Each pair of instructions which can be executed in parallel becomes, in effect, a new instruction with its own microcode. The storage and management overhead of such an approach can become substantial, adding many unique microcode routines to a standard set of microinstructions. Furthermore, the number of combinations grows geometrically with the number of instructions which are executed in parallel.
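The growth described above can be made concrete with a simple count. This is a minimal sketch under loudly stated assumptions: it treats every ordered group of one to `max_parallel` instructions as needing its own routine, which is an upper bound (in practice only compoundable combinations would need routines), and the instruction-set size of 200 used below is a hypothetical figure, not one taken from this specification. The function name `routines_per_combination` is likewise hypothetical.

```python
def routines_per_combination(n_instructions: int, max_parallel: int) -> int:
    """Upper bound on microcode routines when every ordered group of
    1..max_parallel instructions gets its own routine.

    Assumption: all combinations are compoundable, so groups of size k
    contribute n_instructions ** k routines.
    """
    return sum(n_instructions ** k for k in range(1, max_parallel + 1))


# Hypothetical instruction set of 200 instructions.
for k in (1, 2, 3):
    print(k, routines_per_combination(200, k))
# 1 200
# 2 40200
# 3 8040200
```

Under these assumptions, each additional instruction executed in parallel multiplies the dominant term by the size of the instruction set, whereas routines for individual instructions alone would remain at 200.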
Consequently, there is a need in computers or processors which can execute two or more instructions simultaneously to provide machine-level instructions for all possible combinations without adding substantially to the overhead required for storing and retrieving the microinstructions.