Matrix multiplication has numerous applications in applied mathematics, physics, engineering, and other fields. In particular, matrix multiplication is an important primitive of machine learning. In computing systems, a matrix product is produced by a binary operation that takes two matrices and yields a third matrix, in which each element is a sum of products of the elements of a row of the first matrix and a column of the second. In hardware, this operation can be accelerated by using hundreds, thousands, or even more multiply-accumulators. For example, the multiply-accumulate operation over N pairs of operands can be represented as A(1)×B(1) + A(2)×B(2) + . . . + A(N)×B(N).
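As an illustration of how a matrix product reduces to multiply-accumulate operations, the following minimal Python sketch computes A(1)×B(1) + . . . + A(N)×B(N) one step at a time and builds each output element of a matrix product from it. The function names `multiply_accumulate` and `matmul` are hypothetical and chosen only for this example; real accelerators perform many such steps in parallel rather than in a loop.

```python
from collections.abc import Sequence


def multiply_accumulate(a: Sequence[int], b: Sequence[int]) -> int:
    """Compute A(1)*B(1) + A(2)*B(2) + ... + A(N)*B(N), one MAC step per pair."""
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same length N")
    acc = 0
    for x, y in zip(a, b):
        acc += x * y  # one multiply-accumulate step
    return acc


def matmul(x: Sequence[Sequence[int]], y: Sequence[Sequence[int]]) -> list[list[int]]:
    """Each output element is the MAC of a row of x with a column of y."""
    columns = list(zip(*y))  # transpose y so its columns can be iterated
    return [[multiply_accumulate(row, col) for col in columns] for row in x]


print(multiply_accumulate([1, 2, 3], [4, 5, 6]))   # 1*4 + 2*5 + 3*6
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

For instance, the top-left element of the 2×2 product above is 1×5 + 2×7 = 19, a multiply-accumulate with N = 2.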
Booth's algorithm is a widely used computer arithmetic algorithm for multiplication. Conventionally, each individual multiply of a pair of operands (e.g., A(1)×B(1)) is computed first: its partial products (denoted “pp”) are generated and then summed to obtain the product of that pair. For example, with Radix-8 Booth encoding, an 8-bit by 8-bit multiply has 3 partial products, because the multiplier is recoded into 3 digits. Any negative partial products of a single multiply are typically formed as 1's complements and are not sign extended. In this example the partial products are 10 bits each, namely pp0[9:0], pp1[12:3], and pp2[15:6]. A 16-bit fixup vector (fixup[15:0]) is also computed; it supplies the “+1” needed to convert each 1's complement to a 2's complement and corrects for the fact that the partial products were not sign extended. The product of this single multiply is then obtained by adding the fixup vector to the shifted partial products, and it has at least 16 bits of precision: pp0[9:0]<<0 + pp1[12:3]<<3 + pp2[15:6]<<6 + fixup[15:0].
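The single-multiply flow above can be modeled in software. The Python sketch below assumes signed 8-bit two's complement operands, recodes the multiplier into three Radix-8 Booth digits in the range [-4, 4] (with an implicit 0 below the least significant bit), forms 10-bit partial products in which negative digits are only 1's complemented, and builds a 16-bit fixup vector. The function names are hypothetical, and the fixup is computed arithmetically here for clarity; a hardware design may instead derive it from the partial-product sign bits and constants.

```python
MASK10 = (1 << 10) - 1
MASK16 = (1 << 16) - 1


def booth_radix8_digits(b: int) -> list[int]:
    """Recode a signed 8-bit multiplier into three radix-8 Booth digits in [-4, 4]."""
    if not -128 <= b <= 127:
        raise ValueError("multiplier must be a signed 8-bit value")
    bits = b & 0x1FF  # 9-bit two's complement: bit 8 repeats the sign bit 7
    digits = []
    prev = 0  # implicit bit b[-1] = 0
    for i in range(3):
        b0 = (bits >> (3 * i)) & 1
        b1 = (bits >> (3 * i + 1)) & 1
        b2 = (bits >> (3 * i + 2)) & 1
        digits.append(-4 * b2 + 2 * b1 + b0 + prev)
        prev = b2  # top bit of this group overlaps into the next group
    return digits


def booth_radix8_multiply(a: int, b: int) -> tuple[list[int], int, int]:
    """Return (10-bit partial products, 16-bit fixup, signed 16-bit product of a*b)."""
    if not -128 <= a <= 127:
        raise ValueError("multiplicand must be a signed 8-bit value")
    pps, fixup = [], 0
    for i, d in enumerate(booth_radix8_digits(b)):
        shift = 3 * i  # pp_i occupies product bits [shift + 9 : shift]
        neg = d < 0
        mag = (abs(d) * a) & MASK10             # |d| * A as a 10-bit pattern
        pp = (~mag & MASK10) if neg else mag    # negative digit: 1's complement only
        pps.append(pp)
        if neg:
            fixup += 1 << shift                 # "+1" that completes the 2's complement
        if pp >> 9:
            fixup -= 1 << (shift + 10)          # cancel the missing sign extension
    fixup &= MASK16
    total = sum(pp << (3 * i) for i, pp in enumerate(pps)) + fixup
    product = total & MASK16
    if product >> 15:
        product -= 1 << 16                      # reinterpret as signed 16-bit
    return pps, fixup, product


print(booth_radix8_digits(5))           # digits d0, d1, d2
print(booth_radix8_multiply(-3, 5)[2])  # 16-bit product
```

For example, `booth_radix8_digits(5)` yields `[-3, 1, 0]`, since -3 + 1×8 + 0×64 = 5. Because a negative partial product is only inverted, its 10-bit pattern equals the true value minus 1, and treating its set sign bit as an ordinary bit adds 2^10 at that position; the fixup vector adds the missing 1 and subtracts that 2^10 term, all modulo 2^16.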
The final result of the N multiplies is then obtained by summing the individual products of all N pairs. The adders used for this summation must have at least the precision of a single product, which is 16 bits in this example.
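The conventional flow, forming each 16-bit product first and then adding the products, can be sketched by modeling the adder as a fixed-width two's complement accumulator that wraps on overflow. The sketch assumes signed 8-bit operands, and the helper names and the `acc_width` parameter are hypothetical. It shows why 16 bits is only the lower bound: an accumulator of 16 + ⌈log2 N⌉ bits is sufficient (though not necessarily minimal) for N products of 16 bits, whereas a 16-bit accumulator can overflow once N > 1.

```python
def wrap_signed(value: int, width: int) -> int:
    """Model a width-bit two's complement adder output (wraps on overflow)."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def sum_of_products(a: list[int], b: list[int], acc_width: int) -> int:
    """Form each 16-bit product, then add it into an acc_width-bit accumulator."""
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same length N")
    acc = 0
    for x, y in zip(a, b):
        product = wrap_signed(x * y, 16)  # 8x8 product fits in 16 signed bits
        acc = wrap_signed(acc + product, acc_width)
    return acc


def safe_acc_width(n: int, product_width: int = 16) -> int:
    """A sufficient accumulator width: product_width + ceil(log2(n)) guard bits."""
    return product_width + (n - 1).bit_length()  # (n-1).bit_length() == ceil(log2 n) for n >= 1


a = b = [-128] * 4                                # worst case: each product is 16384
print(sum_of_products(a, b, 16))                  # 16-bit accumulator wraps around
print(sum_of_products(a, b, safe_acc_width(4)))   # 18-bit accumulator holds the full sum
```

With four products of 16384, the exact sum 65536 exceeds the 16-bit signed range and wraps to 0, while the 18-bit accumulator returns 65536.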