This invention relates to arithmetic logic unit (“ALU”) circuitry.
ALU circuitry is used in microprocessors. Microprocessors process instructions in several stages. Typically, a microprocessor fetches (i.e., retrieves) an instruction, decodes the instruction, reads the operands upon which the instruction will be performed, executes an operation on the operands, and writes back the results of the operation to a suitable output such as a Random Access Memory via a Register RAM Write Port, a register bank or any other suitable location. The ALU circuitry typically forms a portion of, and is used in, the execute stage of the microprocessor.
Generally, microprocessors are “pipelined.” “Pipelining” refers to the fact that each of the processes of the microprocessor's stages may be occurring substantially simultaneously on different instructions. Thus, as the current instruction is being executed, a second instruction in the pipeline is being decoded and a third is being fetched from program memory.
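The overlap described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not taken from the description: an ideal five-stage pipeline with no stalls, one instruction entering per clock cycle, and hypothetical names (`STAGES`, `pipeline_schedule`).

```python
# Illustrative stage names, following the fetch/decode/read/execute/writeback order.
STAGES = ["fetch", "decode", "read", "execute", "writeback"]

def pipeline_schedule(instructions):
    """Return one dict per clock cycle mapping stage name -> instruction in that stage.

    Assumes an ideal pipeline: no stalls, one new instruction fetched per cycle.
    """
    n_cycles = len(instructions) + len(STAGES) - 1
    schedule = []
    for cycle in range(n_cycles):
        occupancy = {}
        for depth, stage in enumerate(STAGES):
            idx = cycle - depth  # instruction idx entered the fetch stage at cycle idx
            if 0 <= idx < len(instructions):
                occupancy[stage] = instructions[idx]
        schedule.append(occupancy)
    return schedule

if __name__ == "__main__":
    for cycle, occupancy in enumerate(pipeline_schedule(["i0", "i1", "i2", "i3"])):
        print(cycle, occupancy)
```

In this model, during the cycle in which the first instruction is in the execute stage, the following instructions occupy the read, decode and fetch stages.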
Pipelined processors often have a delay between reading registers (in the Read stage) and writing registers (in the Writeback stage). This delay may be substantially overcome with respect to the processing steps of the microprocessor by “forwarding” the results of the execute stage for further use by the ALU before, or simultaneously with, the writing of the results to the Writeback registers. Forwarding ensures that the result of the previous instruction can be used by the next instruction. In one type of microprocessor, forwarding multiplexors may be implemented to make the forwarded result available to the microprocessor if needed.
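The effect of forwarding can be illustrated with a small behavioral model. The sketch below is not a description of any particular microprocessor. It assumes a writeback delay of exactly one instruction, add-only instructions written as `(dest, src_a, src_b)`, and hypothetical names (`forward_mux`, `run`, `use_forwarding`).

```python
def forward_mux(src_reg, regfile, pending_dest, pending_result, use_forwarding=True):
    """Select an operand: the forwarded result if src_reg is still pending, else the register value."""
    if use_forwarding and pending_dest is not None and src_reg == pending_dest:
        return pending_result
    return regfile[src_reg]

def run(program, regfile, use_forwarding=True):
    """Execute add instructions (dest, src_a, src_b) with a one-instruction writeback delay."""
    regfile = dict(regfile)  # work on a copy
    pending_dest, pending_result = None, None
    for dest, src_a, src_b in program:
        # Operands are read before the previous result reaches the register file.
        a = forward_mux(src_a, regfile, pending_dest, pending_result, use_forwarding)
        b = forward_mux(src_b, regfile, pending_dest, pending_result, use_forwarding)
        # The previous instruction's result is written back only now.
        if pending_dest is not None:
            regfile[pending_dest] = pending_result
        pending_dest, pending_result = dest, a + b
    if pending_dest is not None:
        regfile[pending_dest] = pending_result  # drain the last pending result
    return regfile

if __name__ == "__main__":
    regs = {"r1": 1, "r2": 2, "r3": 0, "r4": 0}
    prog = [("r3", "r1", "r2"), ("r4", "r3", "r1")]  # second instruction depends on the first
    print(run(prog, regs, use_forwarding=True))
    print(run(prog, regs, use_forwarding=False))
```

With forwarding enabled, the second instruction sees the new value of `r3`. With forwarding disabled, it reads the stale value still held in the register file.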
FIG. 1 shows a conventional ALU 100. ALU 100 typically includes registers 110 and 120, which typically provide operands A and B. ALU 100 may perform one or more suitable calculations on the operands. The operators that perform these calculations are depicted as ALU sub-units 130. The results obtained from these ALU sub-units may be fed into multiplexor (MUX) 140. Thereafter, the selected result of MUX 140 may be registered as the result in register 150. This result may then be transmitted to the Register RAM Write Port and/or may be forwarded to Fwd A MUX 160 and Fwd B MUX 170 for use by subsequent instructions or as subsequent instructions.
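The data flow of such a conventional arrangement can be sketched as follows. The particular operations, the 16-bit width, and the names `SUB_UNITS` and `conventional_alu` are assumptions chosen for illustration; the description above does not specify which calculations the sub-units perform.

```python
MASK = 0xFFFF  # assumed 16-bit datapath, for illustration only

# Each entry stands in for one separate ALU sub-unit (hypothetical operation set).
SUB_UNITS = {
    "add": lambda a, b: (a + b) & MASK,
    "sub": lambda a, b: (a - b) & MASK,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}

def conventional_alu(a, b, select):
    """Model of the sub-units-plus-multiplexor structure: compute everything, then select."""
    # Every sub-unit produces a result, whether or not that result is used.
    results = {name: unit(a, b) for name, unit in SUB_UNITS.items()}
    # The multiplexor passes exactly one of those results on to the result register.
    return results[select]

if __name__ == "__main__":
    print(conventional_alu(5, 3, "add"))
    print(conventional_alu(5, 3, "xor"))
```

The model makes the cost structure visible: each operation has its own logic, and every result must be routed to the multiplexor even though only one is selected.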
One drawback of the circuit in FIG. 1 is that the ALU sub-units are each formed as separate individual units and, therefore, require substantial die space, additional routing resources (i.e., interconnect, wiring, etc.), and individualized logic. Furthermore, signals may take a relatively long time to propagate through the individual units and MUX 140, and may incur excess routing delays.
Routing delays are a significant source of the signal propagation delays found in Programmable Logic Devices (PLDs). Therefore, reducing routing delays would greatly benefit an ALU implemented in a PLD. Furthermore, with respect to PLDs that are formed primarily from four-input Look-Up-Tables (LUTs), it would also be beneficial if the ALU could be implemented substantially using four-input LUTs.
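For readers unfamiliar with LUT-based logic, the following sketch models a four-input LUT as a 16-entry truth table. It is a generic illustration, not the circuitry of any particular ALU: the choice of packing a 2-bit operation code and one bit of each operand into a single LUT, and the names `make_lut4`, `lut4`, `logic_bit` and `logic_op`, are assumptions made for this example.

```python
def make_lut4(func):
    """Build the 16-entry truth table for a function of four 1-bit inputs."""
    return [func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1) for i in range(16)]

def lut4(table, i3, i2, i1, i0):
    """Look up the output bit for the given four input bits."""
    return table[(i3 << 3) | (i2 << 2) | (i1 << 1) | i0]

def logic_bit(op1, op0, a, b):
    """One result bit of four bitwise operations, chosen by a 2-bit code (hypothetical encoding)."""
    op = (op1 << 1) | op0
    return (a & b, a | b, a ^ b, 1 - (a | b))[op]  # 0: AND, 1: OR, 2: XOR, 3: NOR

LOGIC_LUT = make_lut4(logic_bit)

def logic_op(op, a, b, width=8):
    """Apply the same LUT to every bit position, as one LUT per bit would in hardware."""
    out = 0
    for bit in range(width):
        out |= lut4(LOGIC_LUT, (op >> 1) & 1, op & 1, (a >> bit) & 1, (b >> bit) & 1) << bit
    return out

if __name__ == "__main__":
    print(bin(logic_op(2, 0b1100, 0b1010)))
```

Because any function of four 1-bit inputs fits in one such table, logic that would otherwise need several gates plus a selecting multiplexor can sometimes be absorbed into a single LUT per bit.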
Therefore, it would be desirable to provide ALU circuitry formed from a unified circuit that provides the functionality of multiple ALU units but minimizes the resources required by the ALU.
It would be further desirable to provide ALU circuitry that performs the various functions of ALU circuitry in a shorter time than conventional ALU circuitry.
It would also be desirable to provide ALU circuitry that is configured to provide substantial advantage when implemented in a four-input LUT-based PLD.