The present invention relates to instruction execution in a processor and, more specifically, to an arithmetic logic unit (ALU) that performs three-way operations in a single cycle to increase the efficiency of the processor.
Modern computer systems typically contain several integrated circuits (ICs), including a processor that may be used to process information in the computer system. The data processed by a processor may include computer instructions that are executed by the processor as well as data that the processor manipulates using those instructions. The computer instructions and data are typically stored in a main memory in the computer system.
Processors typically run programs or processes by breaking them down into instructions and executing the instructions in a series of small steps. In some cases, to increase the number of instructions processed by the processor per unit time (and therefore the speed of the processor), the processor may be pipelined. Pipelining refers to providing separate stages in a processor, where each stage performs one or more of the small steps necessary to execute an instruction, so that several instructions overlap in execution. In some cases, the pipeline (in addition to other circuitry) may be placed in a portion of the processor referred to as the processor core. Some processors may have multiple processor cores, and in some cases, each processor core may have multiple pipelines. Where a processor core has multiple pipelines, groups of instructions may be issued to the multiple pipelines in parallel and executed by each of the pipelines in parallel.
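As an idealized illustration of the overlap that pipelining provides, the following Python sketch assumes a k-stage pipeline with no stalls or hazards, in which a new instruction enters every cycle. Under that assumption, n instructions finish in k + n - 1 cycles rather than the n × k cycles needed when each instruction must leave the last stage before the next one enters. The function names are hypothetical and chosen for illustration only.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Ideal pipeline (assumed): no stalls, one new instruction enters every cycle."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles to fill the pipeline;
    # every later instruction completes one cycle after its predecessor.
    return n_stages + n_instructions - 1


def schedule(n_instructions: int, n_stages: int) -> list[list[str]]:
    """For each cycle, list which instruction occupies each stage ('-' if idle)."""
    table = []
    for cycle in range(pipelined_cycles(n_instructions, n_stages)):
        row = []
        for stage in range(n_stages):
            instr = cycle - stage  # instruction i is in stage s during cycle i + s
            row.append(f"I{instr}" if 0 <= instr < n_instructions else "-")
        table.append(row)
    return table


if __name__ == "__main__":
    for cycle, row in enumerate(schedule(4, 3)):
        print(cycle, row)
    print(unpipelined_cycles(100, 5), pipelined_cycles(100, 5))  # prints: 500 104
```

Each row of `schedule(4, 3)` shows several instructions in flight at once, which is the overlap described above; real pipelines give back some of this gain to stalls, branch mispredictions, and data dependencies.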
An arithmetic logic unit (“ALU”) is one of the fundamental building blocks of a processor. The ALU is a circuit that performs a set of arithmetic and logic operations. These operations can be performed on one or more operands, e.g., binary words, received by the ALU. The operands, binary words, or values are strings of zeros and ones that may be n bits long; for example, the operands may be 8, 16, 32, or 64 bits long. The ALU may add one operand to another, or subtract one operand from another, to obtain a result. The ALU may also execute multiplication and division operations.
Arithmetic operations are performed by an arithmetic circuit in the ALU. Typically, the arithmetic circuit includes an adder, which may include a number of full adder circuits configured in a cascade. An adder circuit is a plurality of logic gates and electronic components arranged to perform an arithmetic operation. A number of adder circuits have been developed by configuring a plurality of logic gates, including, for example, parallel prefix adders (Ladner-Fisher adder, Kogge-Stone adder, Brent-Kung adder, Han-Carlson adder), ripple carry adders, carry look-ahead adders, block carry look-ahead adders, conditional sum adders, carry select adders, carry skip adders, and carry save adders. The operation performed by the adder can be selected by controlling the inputs of the adder. These control signals or inputs can instruct the arithmetic circuit to perform a specified operation (e.g., addition, subtraction, increment, or decrement). The ALU may also include a multiplier circuit for executing operations such as multiplication and division. Again, the control signals or inputs can instruct the arithmetic circuit to perform a specified operation (e.g., multiplication or division).
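The cascade of full adders and the control inputs described above can be sketched at the bit level in Python. This is a minimal model of a ripple carry adder, one of the adder types listed above, and not the circuit of any particular ALU; the operation names `"add"`, `"sub"`, `"inc"`, and `"dec"` are hypothetical stand-ins for the control signals, and the default 8-bit width is an arbitrary assumption. Subtraction and decrement reuse the same adder by steering its inputs: subtraction adds the bitwise complement of the second operand with a carry-in of 1 (two's complement), and decrement adds an all-ones word.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One-bit full adder built from XOR/AND/OR gates; returns (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out


def ripple_carry_add(x: int, y: int, carry_in: int, width: int) -> tuple[int, int]:
    """Cascade `width` full adders; the carry ripples upward from bit 0."""
    result = 0
    carry = carry_in
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry  # final carry is the carry-out of the most significant bit


def arithmetic_unit(op: str, a: int, b: int = 0, width: int = 8) -> int:
    """Select an operation by steering the adder's inputs (op names are hypothetical)."""
    mask = (1 << width) - 1
    if op == "add":    # a + b
        y, cin = b & mask, 0
    elif op == "sub":  # a - b = a + ~b + 1 (two's complement)
        y, cin = ~b & mask, 1
    elif op == "inc":  # a + 1: zero second input, carry-in of 1
        y, cin = 0, 1
    elif op == "dec":  # a - 1 = a + all-ones
        y, cin = mask, 0
    else:
        raise ValueError(f"unsupported operation: {op}")
    result, _ = ripple_carry_add(a & mask, y, cin, width)
    return result


if __name__ == "__main__":
    print(arithmetic_unit("add", 200, 100))  # 300 wraps to 44 in 8 bits
    print(arithmetic_unit("sub", 5, 7))      # -2 as an unsigned 8-bit value: 254
```

Because each carry must ripple through every full adder before the most significant bit settles, the delay of this structure grows linearly with the operand width; parallel prefix adders such as Kogge-Stone spend extra gates to reduce the carry delay to logarithmic in the width.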
The ALU may also subject one or more operands to logic functions such as AND, OR, XOR (i.e., exclusive OR), and NOT. Other logic functions such as NAND (i.e., not AND), NOR (i.e., not OR), and XNOR (i.e., not XOR) can also be performed by a logic circuit in the ALU. The logic function performed by the logic circuit may be selected by one or more control inputs. These control signals or inputs may be used by, and common to, both the logic and arithmetic circuits.
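One common way to realize control-selected logic functions is to compute every function in parallel and let the control input drive a multiplexer that picks one result. The Python sketch below models that arrangement on n-bit words; the operation names and the 8-bit default width are assumptions made for illustration, not a description of any specific logic circuit.

```python
def logic_unit(op: str, a: int, b: int = 0, width: int = 8) -> int:
    """Return the logic function of a and b selected by the control input `op`.

    All functions are evaluated, mirroring gates that operate in parallel;
    the dictionary lookup plays the role of the control-driven multiplexer.
    NOT is unary and ignores b.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    outputs = {
        "AND": a & b,
        "OR": a | b,
        "XOR": a ^ b,
        "NOT": ~a & mask,          # mask keeps the result within `width` bits
        "NAND": ~(a & b) & mask,
        "NOR": ~(a | b) & mask,
        "XNOR": ~(a ^ b) & mask,
    }
    try:
        return outputs[op]
    except KeyError:
        raise ValueError(f"unsupported logic function: {op}") from None


if __name__ == "__main__":
    for op in ("AND", "OR", "XOR", "NAND", "NOR", "XNOR"):
        print(op, format(logic_unit(op, 0b1100, 0b1010), "08b"))
```

Masking after each inversion matters in Python, whose integers behave as infinitely wide two's complement values; in hardware the fixed word width provides this truncation automatically.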
In processors, the speed at which the ALU performs operations is usually limited by the arithmetic circuit, and in particular by its adder circuitry, which in turn may limit the speed of the processor. One way to boost the single-thread execution performance of a processor core is called fusion. Fusion refers to the ability of the processor to fuse pairs of instructions and execute them together as if they were a single instruction, thus doubling the instruction execution bandwidth for the fused pairs, which in turn increases application performance. It may therefore be beneficial to design an ALU that can fuse instructions in order to execute two instructions on three operands in a single cycle of the ALU. It may also be beneficial if the design and architecture of such a three-way ALU (e.g., one capable of performing arithmetic operations and logic functions on three operands in a single cycle) did not require a significant increase in the semiconductor chip area occupied by its logic circuits, and did not add delay that lowers the maximum frequency at which the ALU can operate.
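Adding three operands is one case where fusing two instructions can pay off. The sketch below assumes the dependent pair `t = a + b; r = t + c` is fused into a single three-operand addition and implements it with a carry save adder (a 3:2 compressor), one of the adder types listed earlier. This is a standard textbook technique, not necessarily the circuit of the present invention; the function names and the 32-bit default width are hypothetical. The carry save stage treats every bit position as an independent full adder, so it adds only a constant one-full-adder delay regardless of width, leaving a single carry-propagate addition on the critical path instead of two in series.

```python
def carry_save(a: int, b: int, c: int, width: int) -> tuple[int, int]:
    """3:2 compression: reduce three operands to a sum word and a carry word.

    Each bit position is an independent full adder, so no carry propagates
    between positions and the delay does not depend on the width.
    """
    mask = (1 << width) - 1
    partial_sum = (a ^ b ^ c) & mask          # per-bit sum output
    majority = (a & b) | (a & c) | (b & c)    # per-bit carry output
    carry_word = (majority << 1) & mask       # each carry weighs one bit higher
    return partial_sum, carry_word


def fused_add3(a: int, b: int, c: int, width: int = 32) -> int:
    """Fused a + b + c: one carry-save stage, then one carry-propagate add."""
    mask = (1 << width) - 1
    s, k = carry_save(a & mask, b & mask, c & mask, width)
    # Python's + stands in for the single carry-propagate adder here.
    return (s + k) & mask


def unfused_add3(a: int, b: int, c: int, width: int = 32) -> int:
    """Reference: two dependent additions, i.e., two carry propagations in series."""
    mask = (1 << width) - 1
    t = (a + b) & mask
    return (t + c) & mask


if __name__ == "__main__":
    print(fused_add3(0xFFFFFFFF, 1, 1))  # wraps modulo 2**32 to 1
```

The identity behind the sketch is a + b + c = (a XOR b XOR c) + 2 × majority(a, b, c), which holds bit by bit because each full adder's outputs satisfy sum + 2 × carry = a_i + b_i + c_i.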