The present invention relates to computers and, more particularly, to arithmetic logic units for computers. A major objective of the present invention is to provide for faster zero-sum determinations.
Much of modern progress is associated with advances in computer performance. Recent computers typically use one or more microprocessors to execute desired operations. Each microprocessor design is characterized by the set of instructions it can recognize and execute. The instruction sets of early microprocessors included a relatively small number of simple instructions. Accordingly, many instructions could be required to implement even simple operations such as addition and multiplication. Succeeding generations of microprocessors accommodated more instructions and more complex instructions, thus reducing program length as well as programming time.
To provide for synchronous operation, instructions progress according to fixed-period instruction cycles. Simple instructions can be performed in a single instruction cycle, while more complex instructions may require multiple instruction cycles. Most instructions can be completed before the end of a cycle; the remainder of the cycle is, in a sense, wasted.
This wasted cycle time can be minimized at the microprocessor design stage by selecting a short instruction cycle. However, a shorter instruction cycle increases the number of instructions that must be performed in multiple cycles. There is overhead involved in managing multi-cycle instructions. This overhead, in addition to that associated generally with larger instruction sets, results in increased microprocessor complexity and size. The weight of industry opinion is that these increases in size and complexity more than offset the advantages of adding more multi-cycle instructions to the instruction sets of microprocessors.
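The tension between idle cycle time and the number of multi-cycle instructions can be illustrated with a short sketch. The instruction names and latencies below are hypothetical values chosen only for illustration; they do not describe any particular microprocessor.

```python
# Hypothetical instruction latencies in nanoseconds (illustrative assumptions only).
LATENCIES_NS = {"move": 20, "add": 35, "shift": 30, "multiply": 90}


def cycles_needed(latency_ns, cycle_ns):
    """Whole instruction cycles occupied by an instruction (integer ceiling)."""
    return (latency_ns + cycle_ns - 1) // cycle_ns


def summarize(cycle_ns):
    """Return (number of multi-cycle instructions, total idle ns) for a cycle length."""
    multi_cycle = 0
    idle_ns = 0
    for latency_ns in LATENCIES_NS.values():
        n = cycles_needed(latency_ns, cycle_ns)
        if n > 1:
            multi_cycle += 1
        # Time allotted to the instruction minus the time it actually needs.
        idle_ns += n * cycle_ns - latency_ns
    return multi_cycle, idle_ns


for cycle_ns in (100, 40, 25):
    print(cycle_ns, summarize(cycle_ns))
```

Under these assumed latencies, shortening the cycle from 100 ns to 25 ns reduces the total idle time but raises the number of instructions that need multiple cycles from zero to three.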
Increasingly, processors are designed as "reduced instruction-set computers" (RISC). In the RISC approach, a relatively small set of instructions, preferably single-cycle, is used. This approach takes better advantage of integrated circuit real estate and generally improves processor throughput. A disadvantage is that the number of instructions required to implement an operation is increased. However, compilers have been developed that can generate suitable instructions from a high-level programming language. This relieves the programmer of the burden of generating the long program code required by the small instruction set.
Preferably, all or most instructions are executed within a single instruction cycle. This minimizes the circuitry required to manage instructions of varying length. A disadvantage is that the instruction cycle must be matched to the longest single-cycle instruction. Instructions that could be executed in less time still consume an entire cycle. Overall processor throughput is thus closely tied to the time required to perform the longest single-cycle instruction.
In some cases, a microprocessor architect can choose between executing an operation using a single instruction to save cycles or using multiple instructions so that a shorter instruction cycle can be used. An important example is a multiplication coupled with a zero detection of the product. There are many cases where a program branches in the event of a zero detection. For example, where a product of a multiplication is to be used as a divisor in a subsequent division, e.g., a=b/(c·d), zero detection of the product can be used to avoid a division by zero.
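The branch on a zero product can be sketched at the software level as follows. The function name and the choice of returning None for a zero divisor are illustrative assumptions, not part of the description above.

```python
def guarded_quotient(b, c, d):
    """Compute b / (c * d), branching on a zero product instead of dividing by it."""
    product = c * d
    if product == 0:   # zero detection on the product
        return None    # assumed convention: no quotient is available
    return b / product


print(guarded_quotient(12, 2, 3))
print(guarded_quotient(1, 0, 5))
```

Here the zero test immediately follows the multiplication, which is the pairing that makes a combined multiply-and-detect operation attractive.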
Because of the frequency of its use, multiplication plus zero detection can define a useful single instruction. Multiplication is a relatively long instruction, but zero detection can be achieved in a relatively short time. For example, the bits of a number can be NORed together so that a high output indicates a zero product while a low output indicates a non-zero product. Even though the additional time required for the zero detection is short, it can have a large impact on throughput if the instruction cycle is lengthened to permit its execution within a single multiplication cycle. In that case, the time required for zero detection is added to all instructions whether or not they involve a zero detection.
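The NOR-based zero detection mentioned above can be modeled bit by bit. This is a minimal software sketch, not a circuit description; the 32-bit default width and the function name are assumptions made for the example.

```python
def nor_zero_detect(value, width=32):
    """Return 1 if the low `width` bits of value are all 0, else 0 (NOR of the bits)."""
    bits = value & ((1 << width) - 1)   # keep only the bits held by the result register
    any_set = 0
    for i in range(width):
        any_set |= (bits >> i) & 1      # OR the bits together
    return any_set ^ 1                  # invert the OR to obtain the NOR


print(nor_zero_detect(6 * 7))
print(nor_zero_detect(0 * 7))
```

A high (1) output corresponds to a zero product and a low (0) output to a non-zero product, matching the convention stated above.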
The alternative is to perform the multiplication and the zero detection as separate instructions. However, this is wasteful because entire cycles must be devoted to the zero detections, which should only consume a fraction of a cycle. What is needed is an approach that avoids the tradeoffs between the one-cycle and the two-cycle implementations of multiplication with zero detection.
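The tradeoff between the two implementations can be made concrete with a rough timing model. All figures below (instruction counts, a 100 ns cycle, a 5 ns zero detection) are hypothetical, and the model ignores every effect other than cycle length and instruction count.

```python
def fused_time_ns(n_instructions, cycle_ns, detect_ns):
    """One-cycle approach: every cycle is lengthened by the zero-detection time."""
    return n_instructions * (cycle_ns + detect_ns)


def split_time_ns(n_instructions, n_zero_checks, cycle_ns):
    """Two-cycle approach: each zero detection costs one extra full cycle.

    n_instructions counts the instructions other than the separate zero checks.
    """
    return (n_instructions + n_zero_checks) * cycle_ns


# Assumed workload: 1000 instructions, 100 of which are followed by a zero check.
print(fused_time_ns(1000, 100, 5))
print(split_time_ns(1000, 100, 100))
print(1000 * 100)  # time if zero detection cost nothing at all
```

In this model both implementations take longer than the unpenalized program; which one is worse depends on how often zero detections occur relative to the cycle lengthening.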