In computing, an arithmetic logic unit (ALU) is a digital circuit that performs arithmetic and logical operations and is typically part of the central processing unit (CPU) of a computing system. The ALU typically includes hardware such as adders, shifters, and multiplexers (muxes) for performing arithmetic calculations on binary numbers.
Multiplying large binary numbers can be challenging to implement in hardware. Multiplication typically proceeds by creating a partial product for each binary digit of the multiplier, and then adding all the partial products together to obtain the final product. Forming each partial product is fairly simple: the multiplicand is copied when the multiplier digit is 1, and a set of zeros is used when the multiplier digit is 0. Adding up the partial products, once they have all been obtained, is the more challenging aspect of binary multiplication. To reduce the number of partial products created in an N by N binary multiplication, Booth encoding is often employed. Booth encoding is a well-known method used in some hardware implementations of multiplication. An N by N multiplication without Booth encoding creates N partial products to be added together to find the product. With Booth encoding, the number of terms is cut approximately in half: if N is even, Booth encoding results in (N/2)+1 partial products; if N is odd, it results in (N+1)/2 partial products. Employing Booth encoding may therefore reduce the number of arithmetic hardware components used to perform the multiplication.
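As a rough illustration of the partial-product counts given above, the following Python sketch recodes an unsigned multiplier into radix-4 (modified) Booth digits and sums the resulting partial products. The radix-4 variant, the unsigned-operand convention, and the function names `booth_radix4_digits` and `booth_multiply` are assumptions made for this example; the text does not state which Booth variant is meant.

```python
def booth_radix4_digits(multiplier: int, n_bits: int) -> list[int]:
    """Recode an unsigned n_bits-wide multiplier into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2} and selects one partial product.
    """
    if not 0 <= multiplier < (1 << n_bits):
        raise ValueError("multiplier does not fit in n_bits")
    n_digits = n_bits // 2 + 1      # (N/2)+1 for even N, (N+1)/2 for odd N
    padded = multiplier << 1        # append the implicit bit y[-1] = 0
    digits = []
    for i in range(n_digits):
        window = (padded >> (2 * i)) & 0b111   # bits y[2i+1], y[2i], y[2i-1]
        y_hi = (window >> 2) & 1
        y_mid = (window >> 1) & 1
        y_lo = window & 1
        digits.append(-2 * y_hi + y_mid + y_lo)
    return digits


def booth_multiply(multiplicand: int, multiplier: int, n_bits: int) -> int:
    """Sum the Booth partial products; digit i carries weight 4**i."""
    digits = booth_radix4_digits(multiplier, n_bits)
    partial_products = [(d * multiplicand) << (2 * i)
                        for i, d in enumerate(digits)]
    return sum(partial_products)
```

Under these assumptions, `len(booth_radix4_digits(m, 64))` is 33 for any 64-bit `m`, and a 53-bit operand yields 27 digits, in line with the (N/2)+1 and (N+1)/2 counts.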
A common case is multiply hardware that is shared by integer and floating point applications. Integer multiplies can be 32 by 32 bits or 64 by 64 bits. The most common floating point formats are single precision (24 by 24 bit mantissa multiplies) and double precision (53 by 53 bit mantissa multiplies). To handle all these cases in a single ALU, a 64 by 64 bit multiply needs to be provided in hardware. Because N is even here (i.e., 64), Booth encoding creates (64/2)+1, or 33, partial products.
To add up the partial products, a carry save adder tree, often called a Wallace tree, is used first. This tree quickly combines the partial products until only two terms remain to be added. These two terms are then added with a carry look-ahead adder to obtain the result of the multiply. The carry save adder may use full adders (also called 3 to 2 compressors), 4 to 2 compressors, or 5 to 3 compressors. Because 4 to 2 compressors are more efficient than 3 to 2 compressors, they are often preferred. If there were only 32 partial products instead of 33, the first level of 4 to 2 compressors would compress the 32 partial products to 16. The second level would compress the 16 to 8. The third level would compress the 8 to 4. Finally, the fourth and last level would compress the 4 to 2, and these two would be added together using, for example, a carry look-ahead adder to obtain the product.
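The reduction described above can be sketched in Python on arbitrary-precision, non-negative integers. This is a behavioral model only: the 4 to 2 compressor is modeled as two chained 3 to 2 compressors, leftover terms simply pass through to the next level, and the names `compress_3_to_2`, `compress_4_to_2`, and `reduce_tree` are hypothetical. The final `sum_word + carry_word` addition stands in for the carry look-ahead adder.

```python
def compress_3_to_2(a: int, b: int, c: int) -> tuple[int, int]:
    """Full-adder row: three terms in, a sum word and a carry word out."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def compress_4_to_2(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """Modeled here as two chained 3 to 2 compressors (an assumption)."""
    sum_word, carry_word = compress_3_to_2(a, b, c)
    return compress_3_to_2(sum_word, carry_word, d)


def reduce_tree(terms: list[int]) -> tuple[int, int, int]:
    """Reduce terms to two words; return (sum_word, carry_word, levels)."""
    terms = list(terms)
    levels = 0
    while len(terms) > 2:
        if len(terms) == 3:
            # Too few terms for a 4 to 2 compressor; fall back to 3 to 2.
            terms = list(compress_3_to_2(*terms))
        else:
            full = len(terms) - len(terms) % 4
            next_terms = []
            for i in range(0, full, 4):
                next_terms.extend(compress_4_to_2(*terms[i:i + 4]))
            terms = next_terms + terms[full:]   # leftovers pass through
        levels += 1
    while len(terms) < 2:
        terms.append(0)
    return terms[0], terms[1], levels
```

With 32 input terms this model takes four levels (32, 16, 8, 4, 2). With 33 terms the same simple strategy needs a fifth level, because one term is always left over.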
However, in the 64 by 64 bit case, there is still the 33rd term to account for. To solve this problem, a combination of 4 to 2 compressors and 3 to 2 compressors is used. For example, the IBM POWER6™ computer uses 4 to 2 compressors for the first level, 3 to 2 compressors for the second and third levels, and 4 to 2 compressors for the fourth and fifth levels. Such an implementation uses five levels of hardware.
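To see why this mixed schedule ends with two terms after five levels, the term count can be traced level by level. The sketch below is a counting model only: it assumes each level uses as many compressors of the stated kind as fit and passes any leftover terms through unchanged. The per-level counts it produces are derived from that assumption, not taken from IBM documentation, and the names `terms_after_level`, `trace_schedule`, and `MIXED_SCHEDULE` are hypothetical.

```python
def terms_after_level(count: int, kind: str) -> int:
    """Number of terms left after one level of '4:2' or '3:2' compressors."""
    inputs, outputs = {"4:2": (4, 2), "3:2": (3, 2)}[kind]
    groups, leftover = divmod(count, inputs)
    return groups * outputs + leftover   # leftover terms pass through


def trace_schedule(start: int, schedule: list[str]) -> list[int]:
    """Return the term count before the first level and after each level."""
    counts = [start]
    for kind in schedule:
        counts.append(terms_after_level(counts[-1], kind))
    return counts


# Level kinds as described in the text: 4:2, 3:2, 3:2, 4:2, 4:2.
MIXED_SCHEDULE = ["4:2", "3:2", "3:2", "4:2", "4:2"]
```

Under this model, `trace_schedule(33, MIXED_SCHEDULE)` gives `[33, 17, 12, 8, 4, 2]`: the two 3 to 2 levels bring the count to a multiple of four so that the last two 4 to 2 levels finish cleanly.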
Today, for floating point operations, most computing systems provide functionality to perform both a multiply and an addition to the product. That is, the major floating point units today provide the floating point multiply-add function, in which (A*B)+C is computed. While the Booth encoding and carry save addition described above are performed for the A*B part of the operation, the C input is shifted to align its binary point with the binary point of the product A*B. When both the alignment and the carry save addition are finished, the two terms from the carry save adder and the aligned C term are combined with 3 to 2 compressors, and the result goes to the carry look-ahead adder to complete the multiply-add operation.
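The multiply-add data flow can be sketched with a simplified unsigned fixed-point model in Python. Several simplifications are assumed here: operands are non-negative integers with `frac_bits` fractional bits, the alignment shift is fixed rather than derived from floating point exponents, and the partial products are plain (non-Booth) shifted copies accumulated in carry save form. The names `carry_save_product` and `multiply_add` are hypothetical.

```python
def compress_3_to_2(a: int, b: int, c: int) -> tuple[int, int]:
    """Full-adder row: three terms in, a sum word and a carry word out."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def carry_save_product(a: int, b: int) -> tuple[int, int]:
    """Return (sum_word, carry_word) with sum_word + carry_word == a * b."""
    sum_word, carry_word = 0, 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            # Fold one shifted copy of a into the carry save pair.
            sum_word, carry_word = compress_3_to_2(
                sum_word, carry_word, a << shift)
        shift += 1
    return sum_word, carry_word


def multiply_add(a: int, b: int, c: int, frac_bits: int) -> int:
    """Compute a*b + c; the result has 2 * frac_bits fractional bits."""
    sum_word, carry_word = carry_save_product(a, b)
    aligned_c = c << frac_bits     # align binary point with the product
    sum_word, carry_word = compress_3_to_2(sum_word, carry_word, aligned_c)
    return sum_word + carry_word   # stands in for the carry look-ahead adder
```

For example, with `frac_bits=8`, the values 1.5, 2.25, and 0.5 are encoded as 384, 576, and 128, and `multiply_add(384, 576, 128, 8)` returns 253952, which is 3.875 scaled by 2**16.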
For floating point operations, the entire 64 by 64 array is not used, since the largest inputs for double precision floating point operations are 53 by 53 bits. The portion that is used may be placed anywhere within the larger array. In particular, it may be placed so that the 33rd term in the Booth encoding is known to be zero and therefore is not created or used in the carry save adder. In this case, only 32 terms need to be provided for, and the carry save adder may contain only four levels of 4 to 2 compressors.
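One way to see how placement can force the 33rd term to zero is sketched below in Python. It assumes unsigned radix-4 Booth recoding, in which the 33rd digit of a 64-bit operand depends only on bit 63 because the bits above it are zero. The text does not specify the placement used, so the offsets here are purely illustrative, and the names `top_booth_digit` and `place_mantissa` are hypothetical.

```python
def top_booth_digit(operand_64: int) -> int:
    """33rd radix-4 Booth digit (index 32) of an unsigned 64-bit operand."""
    if not 0 <= operand_64 < (1 << 64):
        raise ValueError("operand does not fit in 64 bits")
    padded = operand_64 << 1            # append the implicit bit y[-1] = 0
    window = (padded >> 64) & 0b111     # bits y[65], y[64], y[63]
    return -2 * ((window >> 2) & 1) + ((window >> 1) & 1) + (window & 1)


def place_mantissa(mantissa_53: int, offset: int) -> int:
    """Place a 53-bit mantissa at bit position offset in a 64-bit operand."""
    if not 0 <= mantissa_53 < (1 << 53):
        raise ValueError("mantissa does not fit in 53 bits")
    if not 0 <= offset <= 11:
        raise ValueError("offset must keep the mantissa within 64 bits")
    return mantissa_53 << offset
```

In this model, any offset from 0 to 10 leaves bit 63 clear, so `top_booth_digit` returns 0 for every mantissa and only 32 partial products are needed. An offset of 11 lets the mantissa's top bit reach bit 63, and the 33rd digit can then be nonzero.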