1. Field of the Invention
The present invention relates to the field of computer systems. More specifically, the present invention relates to the area of arithmetic processing devices.
2. Description of the Related Art
Conventional high-performance execution units commonly used in computer systems often contain at least one floating-point unit (“FPU”) and one integer unit (“IU”) for performing arithmetic calculations. A typical execution unit, whether of a CISC or a RISC architecture, contains two separate data paths: one for floating-point data and another for integer data. Traditionally, the FPU handles the floating-point data path, while the IU controls the integer data path.
The IU commonly also handles fixed-point data. A fixed-point data format contains a fraction portion but has no exponent portion. Thus, the fixed-point data format can be considered a branch of the integer data format.
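To illustrate why a fixed-point value can be treated as a kind of integer, the following is a minimal Python sketch. The 8-bit fraction width and the names FRAC_BITS, to_fixed, from_fixed, and fixed_mul are assumptions chosen for illustration; they do not describe any particular IU.

```python
# Assumed format for illustration: 8 fraction bits and no exponent field.
FRAC_BITS = 8


def to_fixed(x: float) -> int:
    """Encode a real number as an integer with an implied binary point."""
    return round(x * (1 << FRAC_BITS))


def from_fixed(q: int) -> float:
    """Decode the integer back to a real number."""
    return q / (1 << FRAC_BITS)


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values using only integer operations.

    The raw product carries 2 * FRAC_BITS fraction bits, so one right
    shift restores the original position of the binary point.
    """
    return (a * b) >> FRAC_BITS


product = fixed_mul(to_fixed(1.5), to_fixed(2.25))
print(from_fixed(product))  # prints 3.375
```

Because the position of the binary point is fixed by convention rather than stored in the data, the multiplication itself is an ordinary integer multiplication.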
The FPU typically contains a floating-point multiply-add (“FP Madd”) circuit for performing a floating-point multiplication followed by an addition. Similarly, the IU contains an integer multiply-accumulate (“Int Macc”) circuit for performing an integer multiplication followed by an accumulation.
FIG. 1 illustrates a conventional computer system 100, which includes a processing unit 101, a system bus 108, and a memory unit 103. The memory unit 103 further includes system main memory 102, read-only memory (“ROM”) 104, and storage device 106. The processing unit 101 typically contains an execution unit 116, an instruction decoder 112, a cache memory 110, and a register file 114. The execution unit 116 usually includes a floating-point unit 120 and an integer execution unit 140, where FPU 120 further contains a FP Madd circuit and IU 140 contains an Int Macc circuit. The FP Madd circuit is used to perform floating-point multiplications and additions, while the Int Macc circuit performs integer multiplications and accumulations.
FIG. 2 illustrates a conventional FPU 200 of a pipeline design. FPU 200 contains a FP Madd circuit 201, a set of working registers 208, a selector 206, a register file 202, and a memory device 204. The register file 202 and the memory device 204 are used to store floating-point data, and the working registers 208 are used to store operands, which will be used for the next arithmetic calculations. The selector 206 is generally used to select operands to be stored in the working registers 208 from either the register file 202 or the memory device 204.
The FP Madd circuit 201 typically contains a multiply array 210, a first adder 212, a shifter 214, a second adder 216, and a result register 218. The multiply array 210 performs a floating-point multiplication between a first and second operand. The output of multiply array 210 commonly contains carry and sum portions. After the multiplication, either adder 212 or adder 216 performs a floating-point addition between a third operand and the result of the multiplication. The shifter 214 may be used to perform an operand alignment or a normalization.
An operand alignment typically takes place before the FP addition, where the multiply result and the third operand are aligned so that the operands can be properly added. A normalization is typically performed after the FP addition, where the most significant bit (MSB) of the result from the addition needs to be shifted to the MSB of the mantissa. It should be noted that the alignment and normalization operations are typically mutually exclusive.
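The two shift operations described above can be sketched in Python on integer mantissas. This is an illustrative model only: the 8-bit mantissa width, the (mantissa, exponent) representation with value = mantissa * 2**exponent, truncation of shifted-out bits, and the function names align and normalize are all assumptions, not details of the shifter 214.

```python
MANT_BITS = 8  # assumed mantissa width, chosen only for illustration


def align(m_a: int, e_a: int, m_b: int, e_b: int):
    """Alignment before addition: shift the mantissa with the smaller
    exponent to the right so that both operands share one exponent.
    Bits shifted out are simply dropped in this sketch."""
    if e_a >= e_b:
        return m_a, m_b >> (e_a - e_b), e_a
    return m_a >> (e_b - e_a), m_b, e_b


def normalize(m: int, e: int):
    """Normalization after addition: shift until the leading 1 of the
    result sits in the MSB of the mantissa, adjusting the exponent."""
    if m == 0:
        return 0, 0
    while m >= (1 << MANT_BITS):       # carry out of the mantissa
        m >>= 1
        e += 1
    while m < (1 << (MANT_BITS - 1)):  # leading zeros below the MSB
        m <<= 1
        e -= 1
    return m, e


# 0b11000000 * 2**3 and 0b10000000 * 2**1 aligned to exponent 3.
print(align(0b11000000, 3, 0b10000000, 1))  # prints (192, 32, 3)

# A result with five leading zeros is shifted left by five places.
print(normalize(0b00000110, 0))             # prints (192, -5)
```

In this model, alignment shifts an operand right before the add, while normalization shifts the sum left after the add, which is why a single shifter placed between two adders can serve either purpose.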
Referring back to FIG. 2, a FP multiply is performed in the multiply array 210. If an operand alignment is required, the adder 212 is bypassed. The operand alignment is then performed in the shifter 214 and the FP addition is subsequently performed in the adder 216. Likewise, if a normalization is required, the FP addition is performed in the adder 212. The normalization is then performed in the shifter 214 and the adder 216 is subsequently bypassed. If neither operand alignment nor normalization is required, the FP addition can be performed in either the adder 212 or the adder 216.
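The routing just described can be summarized as a small decision function. The following Python sketch is a behavioral illustration only; the function name fp_madd_path, the stage labels, and the choice to reject a request for both alignment and normalization (which the text calls typically mutually exclusive) are assumptions.

```python
from typing import List


def fp_madd_path(needs_alignment: bool, needs_normalization: bool) -> List[str]:
    """Return the stages of FP Madd circuit 201 that an operation uses."""
    if needs_alignment and needs_normalization:
        # Assumption: treated as an error here because the two cases
        # are described as typically mutually exclusive.
        raise ValueError("alignment and normalization requested together")
    if needs_alignment:
        # Adder 212 is bypassed; align first, then add in adder 216.
        return ["multiply array 210", "shifter 214 (align)", "adder 216 (add)"]
    if needs_normalization:
        # Add in adder 212, normalize, then bypass adder 216.
        return ["multiply array 210", "adder 212 (add)", "shifter 214 (normalize)"]
    # No shift needed: either adder could be used; adder 212 is picked
    # arbitrarily in this sketch.
    return ["multiply array 210", "adder 212 (add)"]


for flags in [(True, False), (False, True), (False, False)]:
    print(flags, "->", " -> ".join(fp_madd_path(*flags)))
```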
Moreover, a FP multiply accumulation is typically an arithmetic operation where the result of the first FP Madd is used as the third operand for the second FP Madd. For example, the data stored in the result register 218 is bypassed to working register C 208 as a third operand for the next FP Madd operation. It should be noted that other circuits, such as a rounding circuit, an exponent circuit, or an adjustment circuit for minor shifts, such as 1 or 2 bit adjustment, may be included in the FP Madd circuit 201.
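The chaining of results into the third operand can be illustrated with a short Python sketch. It models only the arithmetic, not rounding, exponent handling, or timing; the names fp_madd and fp_multiply_accumulate and the initial value of 0.0 are assumptions for illustration.

```python
def fp_madd(a: float, b: float, c: float) -> float:
    """One FP Madd operation: a multiplication followed by an addition."""
    return a * b + c


def fp_multiply_accumulate(pairs, initial: float = 0.0) -> float:
    """Accumulate a sum of products by feeding each result back as the
    third operand of the next FP Madd operation."""
    c = initial                      # plays the role of working register C
    for a, b in pairs:
        result = fp_madd(a, b, c)    # value held in the result register
        c = result                   # bypassed back as the next third operand
    return c


print(fp_multiply_accumulate([(1.5, 2.0), (0.5, 4.0)]))  # prints 5.0
```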
FIG. 3 illustrates a conventional IU 300 within a pipeline design. The IU 300 typically contains an Int Macc circuit 301, a set of working registers 308, a selector 306, a register file 302, and a memory device 304. The register file 302 and the memory device 304 are used to store integer data, and the working registers 308 are used to store integer operands, which will be used for the next integer arithmetic calculations. The selector 306 is used to select operands to be stored in the working registers 308 from either the register file 302 or the memory device 304. The Int Macc circuit 301 further contains a multiply array 310, an accumulator 312, and a result register 318. The multiply array 310 performs an integer multiplication, while the accumulator 312 performs an integer accumulation.
Referring back to FIG. 1, the execution unit 116 contains at least one FP Madd circuit and one Int Macc circuit. FP Madd and Int Macc circuits both contain a multiply array and adder circuits, and both are capable of performing a multiplication followed by a summation, where the summation can be either an addition or an accumulation. Moreover, the layout of a multiply array or adder circuit traditionally requires a large portion of the silicon area within a chip. For example, a typical 64-bit multiply array circuit could take 10 percent of the silicon area of a chip. Duplicated multiply arrays and adder circuits within the FP Madd circuit and the Int Macc circuit not only cost silicon area, but also slow down overall performance. Therefore, it is desirable to have a multiply-add device that is capable of handling both floating-point and integer data. As will be seen, one embodiment of the present invention provides a multiply-add device that is capable of performing both floating-point and integer multiply-add functions using one set of multiply array and adder circuits.
The present invention provides a device used in computer systems for performing floating-point multiply-add and integer multiply-accumulate operations.
In one embodiment, the device comprises a multiplier and at least one adder for performing a floating-point multiplication followed by an addition when operands are in the floating-point data format. The device is also configured to perform an integer multiplication followed by an accumulation when operands are in the integer data format. The device is further configured to perform a floating-point multiply-add or an integer multiply-accumulate in response to control signals.
In another embodiment, the device comprises a multiply array and at least two adders. The multiply array and a first adder are used to perform a floating-point multiplication followed by an addition when operands are in the floating-point data format, while the multiply array and a second adder are used to perform an integer multiplication followed by an accumulation when operands are in the integer data format. The device is further configured to perform a floating-point multiply-add or an integer multiply-accumulate in response to control signals.
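As a software analogy only, and not a description of the circuit itself, the sharing of one multiply array between two adders under a control signal might be sketched in Python as follows. The class name SharedMultiplyAdd, the Mode enumeration standing in for the control signals, and the method names are all hypothetical.

```python
from enum import Enum


class Mode(Enum):
    """Stand-in for the control signals that select the operation."""
    FP_MADD = "fp_madd"
    INT_MACC = "int_macc"


class SharedMultiplyAdd:
    """Behavioral model: one multiply step shared by two add steps."""

    def __init__(self) -> None:
        self.accumulator = 0  # running total for integer accumulation

    def _multiply_array(self, a, b):
        # The single multiplier used in both modes.
        return a * b

    def _first_adder(self, product: float, c: float) -> float:
        # Floating-point addition of the third operand.
        return product + c

    def _second_adder(self, product: int) -> int:
        # Integer accumulation into the running total.
        self.accumulator += product
        return self.accumulator

    def execute(self, mode: Mode, a, b, c: float = 0.0):
        product = self._multiply_array(a, b)
        if mode is Mode.FP_MADD:
            return self._first_adder(product, c)
        if mode is Mode.INT_MACC:
            return self._second_adder(product)
        raise ValueError("unknown mode")


device = SharedMultiplyAdd()
print(device.execute(Mode.FP_MADD, 1.5, 2.0, 0.25))  # prints 3.25
print(device.execute(Mode.INT_MACC, 3, 4))           # prints 12
print(device.execute(Mode.INT_MACC, 2, 5))           # prints 22
```

The point of the sketch is that only the add step differs between the two modes, while the multiply step is written once.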
In another embodiment, the device contains an adder that is capable of performing both a floating-point addition and an integer accumulation. The adder is further configured to be extra wide to reduce operand misalignment. Moreover, the device stalls processing in response to a condition of operand misalignment.