Conventional architectures are scalar, represented by such systems as RISC, IBM System/360 and System/370. In addition, there are such devices as have been described in Wulf et al., U.S. Pat. No. 4,819,155 and Oota, U.S. Pat. No. 4,852,040. See also the article by W. A. Wulf published in Computer Architecture News, Mar., 1988, entitled "The WM Computer Architecture". The Wulf apparatus is for vector processing rather than scalar processing, but teaches that two operands are combined in an ALU to produce a result in a first execution cycle, following which the result and a third operand are provided to a second ALU which produces a result in a second execution cycle. This reference hints at pipelining, similar to that of known superscalar machines, as one way to improve performance.
Pipelining is a standard technique used by computer designers to improve the performance of computer systems. In pipelining, an instruction is partitioned into several steps or stages, and unique hardware is allocated to each stage to implement the function assigned to that stage. If m is the cycle time of the corresponding implementation not employing pipelining, then the best n-stage pipeline implementation will have a cycle time of m/n. Another known technique is superscalar, which permits instructions, grouped strictly on a first-in-first-out basis, to be simultaneously issued. The superscalar machine was not designed for a scalable compound instruction set, where related instructions, not necessarily originally written together, may be issued as a plural set unit instruction for execution in parallel.
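By way of illustration only, the ideal cycle time relation stated above may be sketched as follows. The function names, and the assumption that a pipeline of n stages needs n + k - 1 cycles to complete k instructions with no stalls, are introduced here for illustration and are not taken from the referenced art.

```python
from fractions import Fraction


def ideal_pipeline_cycle_time(m, n):
    """Best-case cycle time of an n-stage pipeline, given the cycle
    time m of the corresponding non-pipelined implementation."""
    return Fraction(m) / n


def total_time_unpipelined(m, k):
    """Time to complete k instructions one after another, without pipelining."""
    return Fraction(m) * k


def total_time_pipelined(m, n, k):
    """Time to complete k instructions in an ideal n-stage pipeline.

    Assumption (illustrative): no stalls, so the first instruction takes
    n cycles and each later instruction completes one cycle after the last.
    """
    return (n + k - 1) * ideal_pipeline_cycle_time(m, n)
```

Under these assumptions, with m = 10, n = 5 and k = 100, the non-pipelined implementation requires 1000 time units while the ideal pipeline requires 208.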
The invention does not consider the parallel execution of instructions per se as novel, even though parallel execution of base instructions is achieved by the inventions; rather, it concerns the execution in parallel of interlocked instructions. The System/370 sold by International Business Machines can be made to execute in parallel certain interlocked instructions, and can perform, with limitations, the requirements of a scalable compound instruction set machine as first disclosed in the referenced applications. Suggestions are made in other applications, for example U.S. Ser. No. 07/642,011, as to other ALUs which may be used for a scalable compound instruction set machine. These existing processors have not been publicly used as such, and there has been no publication of the possibility of such a use, but the possibility has been described in some aspects in applications filed after the priority claimed herein.
Further, by way of background, the first collapsing ALU was described in application Ser. No. 07/504,910, filed Apr. 4, 1990, entitled "Data Dependency Collapsing Hardware Apparatuses", the inventors being Stamatis Vassiliadis et al.; and in application Ser. No. 07/619,868, filed Nov. 28, 1990, entitled "Overflow Determination for Three-Operand ALUs in a Scalable Compound Instruction Set Machine", the inventors being Stamatis Vassiliadis et al., from which this application claims priority.
It is known to implement a three-to-one adder. It consists of a three-to-two carry save adder (CSA) followed by a two-to-one carry look ahead adder (CLA), as shown in FIG. 2. S. Vassiliadis and M. Putrino recognized that the critical path in ALUs is usually limited by the determination of a result equal to zero. In "Condition code predictor for fixed-point arithmetic units," Int. J. Electronics, vol. 66, no. 6, pp. 887-890, 1989, they proposed a method for predicting that the result is equal to zero for a two-to-one two's complement adder; however, as recognized by the author and one of the joint inventors here, that method does not apply to a three-to-one ALU.
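By way of illustration only, the known three-to-one adder organization described above (a three-to-two CSA followed by a two-to-one adder) may be modeled in software as follows. This is a minimal behavioral sketch and not a reproduction of FIG. 2. The function names and the default width of 32 bits are assumptions made for illustration, and the ordinary integer addition in the second stage merely stands in for the two-to-one CLA.

```python
def carry_save_add(a, b, c, width=32):
    """Three-to-two carry save adder: reduce three operands to a sum word
    and a carry word whose total equals a + b + c modulo 2**width."""
    mask = (1 << width) - 1
    # Bitwise sum of the three inputs, with no carry propagation.
    sum_word = (a ^ b ^ c) & mask
    # A carry arises wherever at least two inputs are 1; it is weighted
    # one position higher, hence the shift left by one.
    carry_word = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return sum_word, carry_word


def three_to_one_add(a, b, c, width=32):
    """Three-to-one addition: CSA stage followed by a two-to-one addition."""
    mask = (1 << width) - 1
    sum_word, carry_word = carry_save_add(a & mask, b & mask, c & mask, width)
    # Stand-in for the two-to-one CLA stage (illustrative assumption).
    return (sum_word + carry_word) & mask
```

In this sketch only the second stage propagates carries; the CSA stage treats each bit position independently, which is why the three-operand addition adds little delay over a two-operand addition.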
A discussion of one known form of the two-to-one CLA can be found in S. Vassiliadis, "Recursive Equations for Hardware Binary Adders," Int. J. Electronics, vol. 67, no. 2, pp. 201-213, 1989, which discusses hardwired binary adders. This journal article may be referenced for definitions of the known quantities G.sub.n.sup.x and T.sub.n, which represent the pseudo-generate and transmit, respectively, at bit position n in the Boolean expressions which we use to describe the stages of the CLA employed in a described preferred embodiment of our inventions. For ease in understanding of our inventions, they have been precisely detailed in Boolean expressions and the booksets described in the description of our preferred embodiments. In the discussion which follows, only the generation of true logic values of a variable is presented in stage by stage delay. These assumptions, however, are not intended to and do not limit the applicability of the discussion and the devices presented, since such a bookset is common in currently available technologies and extendable to other technologies having similar characteristics or equivalent functional power within their bookset.
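By way of illustration only, the roles played by generate and transmit signals in a two-to-one CLA may be sketched as follows. This sketch uses the conventional textbook generate (the AND of the operand bits) and transmit (the OR of the operand bits), not the pseudo-generate formulation of the cited article, and it numbers bit 0 as the least significant bit. The function name, the bit numbering and the default width of 8 bits are assumptions made for illustration.

```python
def cla_add(a, b, carry_in=0, width=8):
    """Two-to-one addition with carries formed by expanded lookahead
    equations. Returns (sum modulo 2**width, carry out)."""
    # Conventional generate and transmit per bit position (bit 0 = LSB).
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    t = [((a >> i) | (b >> i)) & 1 for i in range(width)]

    carries = [carry_in & 1]  # carries[i] is the carry into bit i
    for i in range(width):
        # Carry out of bit i, written directly in terms of the inputs:
        # g[i] | t[i]g[i-1] | ... | t[i]...t[1]g[0] | t[i]...t[0]carry_in
        carry = 0
        prefix = 1  # running AND of t[i], t[i-1], ..., t[j+1]
        for j in range(i, -1, -1):
            carry |= prefix & g[j]
            prefix &= t[j]
        carry |= prefix & carries[0]
        carries.append(carry)

    result = 0
    for i in range(width):
        half_sum = ((a >> i) ^ (b >> i)) & 1
        result |= (half_sum ^ carries[i]) << i
    return result, carries[width]
```

Each carry in this sketch depends only on the generate and transmit signals and the carry-in, rather than on the carry of the preceding bit position, which is the property that distinguishes lookahead from ripple carry.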
The SCISM architecture is applicable not only to 370 architectures, but also to other architectures, including RISC, where it is desirable to enhance the performance of applications which have already been developed and which would desirably operate faster if there were parallel issuance and execution of specific plural instructions for an ALU. Such a system enables new hardware to execute old instructions at a more rapid rate, reducing the necessity of reprogramming old programs for a new machine having a new architecture.