1. Field of the Invention
This invention relates to the field of microprocessors and more specifically to an improved method and circuit for generating condition codes in an arithmetic logic unit of a microprocessor.
2. Description of the Relevant Art
Microprocessors determine the speed and power of personal computers, and a growing number of more powerful machines, by handling most of the data processing in the machine. Microprocessors typically include at least three functional groups: the input/output unit (I/O unit), the control unit, and the arithmetic-logic unit (ALU). The I/O unit interfaces between external circuitry and the ALU and the control unit. I/O units frequently include signal buffers for increasing the current capacity of a signal before the signal is sent to external components. The control unit controls the operation of the microprocessor by fetching instructions from the I/O unit and translating the instructions into a form that can be understood by the ALU. In addition, the control unit keeps track of which step of the program is being executed. The ALU handles the mathematical computations and logical operations that are performed by the microprocessor. The ALU executes the decoded instructions received from the control unit to modify data contained in registers within the microprocessor.
The computing power of a microprocessor can be increased by increasing the clock speed. Higher clock speeds, however, place design constraints on the circuit boards and interconnect circuits that interact with the microprocessor. Alternatively, increased computing power can be accomplished by maximizing the number of instructions that can be executed at a given speed and by minimizing the number of cycles during which the microprocessor is in a wait state. A wait state refers to a microprocessor cycle during which the microprocessor executes no instructions. Wait states can be required, for example, when the microprocessor must wait for data from an external device.
One source of undesired delay or wait states involves the generation of condition codes by the ALU. In a typical microprocessor, the ALU generates one or more condition codes each time it performs a mathematical or logical operation upon a set of operands. The condition codes indicate characteristics or attributes of the result produced by the mathematical or logical operation executed by the ALU. Condition codes are used to control branching, to monitor the accuracy of data, and to control a variety of other functions. One widely known example of a condition code is referred to as the zero flag. The zero flag is a special-purpose one bit register that is set when a mathematical or logical operation executed by the ALU produces a result equal to zero.
Many mathematical operations performed by the ALU proceed from the least significant bits of the operands to the most significant bits. For example, a typical ADD operation executed by a ripple-carry adder will add the least significant bit of a first operand to the least significant bit of a second operand to produce a least significant bit of the result and a first carry bit. The first carry bit will then be added to the next most significant bit of the first operand and the next most significant bit of the second operand to produce an intermediate result bit and an intermediate carry bit. This sequence continues until, eventually, the most significant bits of the operands have been added together with the carry bit from the preceding sequence to produce the most significant bit of the result (and a carry out bit).
Reference to FIG. 1 will help clarify this example. In FIG. 1, a first operand 12 is added to a second operand 14 to produce the result 16 with a ripple-carry adder. First operand 12, second operand 14, and result 16 include n bits. Result 16 is produced by adding a least significant bit 12a of first operand 12 to a least significant bit 14a of second operand 14 to produce a least significant bit 16a of result 16 and a first carry bit 18. Next, first carry bit 18, intermediate significant bit 12b of first operand 12, and intermediate significant bit 14b of second operand 14 are added to produce intermediate significant result bit 16b and second carry bit 20. This procedure is repeated until finally, (n-1)th carry bit 24 is added to most significant bit 12n of first operand 12 and most significant bit 14n of second operand 14 to produce most significant bit 16n of result 16. As will be appreciated, least significant bit 16a of result 16 is generated prior to intermediate significant result bit 16b which, in turn, is generated prior to most significant result bit 16n. Stated similarly, a finite but measurable period of time elapses between generation of least significant result bit 16a and most significant result bit 16n.
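The bit-serial sequence of FIG. 1 can be illustrated with a minimal software sketch. The sketch models only the logical behavior of a ripple-carry adder, not any particular circuit, and the function names ripple_carry_add, to_bits, and from_bits are hypothetical names chosen for illustration. Operands are represented as lists of bits with the least significant bit first.

```python
def ripple_carry_add(a_bits, b_bits):
    """Add two equal-length bit lists, least significant bit first.

    Returns (result_bits, carry_out). Each loop iteration models one
    full-adder stage, which cannot finish until the carry from the
    preceding stage is known.
    """
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same number of bits")
    result_bits = []
    carry = 0  # no carry enters the least significant stage
    for a, b in zip(a_bits, b_bits):
        result_bits.append(a ^ b ^ carry)       # result bit of this stage
        carry = (a & b) | (carry & (a ^ b))     # carry passed to the next stage
    return result_bits, carry


def to_bits(value, n):
    """Illustrative helper: n-bit list of a non-negative integer, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Illustrative helper: integer value of a bit list given LSB first."""
    return sum(bit << i for i, bit in enumerate(bits))
```

Because each stage consumes the carry produced by the stage before it, the loop cannot be reordered, which mirrors the delay between result bit 16a and result bit 16n.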
FIG. 2 shows a block diagram of an ALU execution unit 26 and an ALU condition code generation circuit 28. Execution unit 26 receives first operand 12 and second operand 14 together with instruction and control signals routed to the execution unit 26 over control bus 25. In response to the control signals and instruction signals contained on bus 25, execution unit 26 performs operations on first operand 12 and second operand 14 to produce result 16 as an output. The operations that execution unit 26 may perform on first operand 12 and second operand 14 include numerous instructions, such as the simple add instruction shown in FIG. 1. These operations frequently operate upon low order bit "a" of the respective operands 12 and 14 before the instruction operates on high order bit "n". As will be appreciated by those skilled in the art, the delay time between the generation of least significant result bit 16a and most significant result bit 16n in a conventional ripple-carry adder increases as the number of bits within first operand 12 and second operand 14 increases.
To address the delay time associated with ripple-carry adders, execution unit 26 can be designed with a fast adder circuit, such as a carry lookahead adder (CLA). The CLA reduces the delay time required to produce result 16 by reducing the number of logic levels needed to generate the most significant carry bit. Whereas the ripple-carry adder requires approximately 2N levels of logic to generate result 16, a practical CLA requires only approximately log₂ N logic levels where N is the number of bits. A description of ripple-carry adders and CLAs can be found in J. Hennessy & D. Patterson, Computer Architecture: A Quantitative Approach, pp. A-2 to A-3 and A-31 to A-39 (Morgan Kaufmann 1990) [hereinafter "Hennessy"].
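The difference between the two approximations can be made concrete with a short calculation. The following is a rough sketch only: it evaluates the approximate expressions 2N and log₂ N stated above, rounds the logarithm up to a whole level as an assumption, and does not model any actual adder circuit. The function names are hypothetical.

```python
import math


def ripple_carry_levels(n_bits):
    """Approximate logic levels for an N-bit ripple-carry adder (about 2N)."""
    return 2 * n_bits


def lookahead_levels(n_bits):
    """Approximate logic levels for an N-bit CLA (about log2 N).

    Rounding up to a whole level is an assumption made for this sketch.
    """
    if n_bits < 1:
        raise ValueError("n_bits must be at least 1")
    return math.ceil(math.log2(n_bits))


# Compare the two estimates for several common operand widths.
for n in (8, 16, 32, 64):
    print(n, ripple_carry_levels(n), lookahead_levels(n))
```

For 32-bit operands, these estimates come to approximately 64 levels for the ripple-carry adder and approximately 5 levels for the CLA.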
Regardless of the time required for execution unit 26 to produce result 16, conventional microprocessor-based systems do not begin to generate condition codes, including the zero flag, until result 16 has been calculated. Referring again to FIG. 2, condition code generation circuit 28 receives result 16 from execution unit 26 and produces a plurality of condition codes in response thereto. As stated previously, condition codes indicate characteristics of the result 16 after execution of the instruction by execution unit 26. The condition codes produced by condition code generation circuit 28 shown in FIG. 2 include those commonly found in x86-type microprocessors. These flags include the parity flag (PF), the zero flag (ZF), the carry flag (CF), the auxiliary carry flag (AF), the sign flag (SF), and the overflow flag (OF). The zero flag is generally set after each mathematical or logical operation whenever result 16 is equal to zero. The zero flag is frequently used to control the branching of computer programs. If, for example, it is desired to branch to a particular location or subroutine when first operand 12 is equal to second operand 14, the condition can be detected by subtracting first operand 12 from second operand 14 and checking the zero flag. If first operand 12 is equal to second operand 14, result 16 will be equal to zero and the zero flag will be set. Conditional branching based upon the value of the zero flag is well known in the art of microprocessor programming.
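The conventional arrangement, in which the condition codes are derived only after the result is available, can be sketched in software as follows. This is an illustrative model that assumes the commonly documented x86 flag definitions for a subtraction and an 8-bit default operand width. The function name flags_after_sub and the dictionary layout are hypothetical and are not taken from any circuit described herein.

```python
def flags_after_sub(minuend, subtrahend, width=8):
    """Model x86-style condition codes for (minuend - subtrahend).

    The result is computed first and the flags are derived from it
    afterwards, as in the conventional arrangement of FIG. 2.
    """
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    a = minuend & mask
    b = subtrahend & mask
    result = (a - b) & mask  # result must exist before any flag is formed
    return {
        "result": result,
        "ZF": int(result == 0),                             # result is zero
        "SF": int(bool(result & sign)),                     # most significant bit
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),  # even parity, low byte
        "CF": int(a < b),                                   # unsigned borrow
        "AF": ((a ^ b ^ result) >> 4) & 1,                  # borrow into low nibble
        "OF": int(bool((a ^ b) & (a ^ result) & sign)),     # signed overflow
    }


def operands_equal(first, second, width=8):
    """Detect equality by subtracting and checking the zero flag."""
    return flags_after_sub(second, first, width)["ZF"] == 1
```

In this model, a branch-if-equal decision reduces to testing the value returned by operands_equal, which in turn cannot be known until the subtraction has completed.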
The generation of the zero flag is within the critical speed path of the program execution. Execution of subsequent program instructions is often dependent upon and, therefore, must await the determination of the zero flag. When program execution must wait for generation circuit 28 to generate a zero flag before subsequent execution can occur, the time required to generate the condition codes has a direct impact on the system performance. It would therefore be desirable to reduce or eliminate the delay between the generation of result 16 and the generation of the zero flag.