1. Field of the Invention
The invention relates to the field of computer systems. More specifically, the invention relates to software for performing multiplication operations.
2. Background information
Since numerous routines executed on processors require multiplication, processors typically are capable of executing an instruction to multiply together one or more operands. Unfortunately, certain routines require a higher-precision multiply (e.g., a 16 bit by 32 bit multiply) than is supported by the multiply instruction(s) of a processor. When no single multiply instruction can perform the higher-precision multiply, different combinations of instructions must be used.
Binary numbers are typically used either to represent unsigned numbers or signed numbers using 2""s complement signed representation. The multiplication operation is performed in a different fashion for signed and unsigned numbers.
The full result of a N bit by N bit multiplication requires 2*N bits to represent. A xe2x80x98full-multiplyxe2x80x99 instruction is one which multiplies two N bit numbers and yields the full 2*N bit result. Processors that lack such a multiply instruction use two instructions: 1) a xe2x80x98multiply-highxe2x80x99 instruction which produces the upper N bits of the result; and 2) a xe2x80x98multiply-lowxe2x80x99 instruction which produces the lower N bits. Additional instructions are used to combine the two halves if necessary.
The full result of a 2*N bit by N bit multiplication requires 3*N bits to represent, however in many cases not all the bits of the result are needed. A smaller range out of the 3*N bit result is often used, most typically 2*N bits. If the multiplicands and result represent integers, the lower 2*N bits will be used (in this case an error may occur if the result does not fall into the range which can be represented by 2*N bits). If the multiplicands and/or result represent fixed-point numbers, some other range of 2*N bits out of the 3*N bit result will be used, depending on the radix point locations. Table 1 illustrates the selection of 2*N bits out of a 3*N bit result. In Table 1, xe2x80x98Kxe2x80x99 and xe2x80x98Nxe2x88x92Kxe2x80x99 respectively represent the number of unused upper and lower bits, while the 2*N bits from the Nxe2x88x92K+1 bit to the 3*Nxe2x88x92K bit are used. K may range from 0 to N.
FIG. 1 is a data flow diagram illustrating a method of using two N bit by N bit multiply operations to perform an N bit by 2*N bit unsigned multiplication. Although the complete result of such a multiplication is 3*N bits long, typically only part of the 3*N bit result is used in further computations (e.g., 2*N bits are used out of the full 3*N bit result as illustrated in Table 1). In FIG. 1, rectangles are used to illustrate data and ovals are used to illustrate operations. FIG. 1 shows a value A represented in 2*N bits and a value B represented in N bits. The value A is divided into a most significant half (AHIGH) and a least significant half (ALOW).
In step 110, AHIGH is multiplied by B using unsigned multiplication to generate B*AHIGH. In step 120, ALOW is multiplied by B using unsigned multiplication to generate B*ALOW. Note that both steps 110 and 120 perform a full N bit multiplication which multiplies two N bit numbers to produce a 2*N bit result. On processors which do not have a full multiply but have multiply-high and multiply-low instructions, steps 110 and 120 will be performed by either: 1) using multiple instructions; or 2) using multiply-high/low instructions that perform 2*N bit by 2*N bit multiplications.
Aligning the least significant bit position of B*AHIGH with the (N+1) least significant bit position of B *ALOW and then performing an addition operation yields B*A. The shifting in steps 130 and 140 illustrate one way of properly aligning the values for the addition operation (step 150) that generates a value representing B*A. In step 130, B*AHIGH is logically shifted left K bits to generate K less than  less than (B*AHIGH). While in step 140, the B*ALOW is arithmetically shifted right (Nxe2x88x92K) bits to generate (B*ALOW) greater than  greater than (Nxe2x88x92K). It is worthwhile to note that the shifting of B*ALOW to the right allows for the use of a 2*N bit addition.
While FIG. 1 illustrates a method for performing unsigned multiplications, often times signed multiplications are required. However, the method of FIG. 1 cannot be used to perform signed multiplications
A method and apparatus for performing N bit by 2*N (or 2*Nxe2x88x921) bit signed multiplication using two N bit multiply instructions is described. According to one aspect of the invention, a method for performing signed multiplication of A times B (where B has N bits and A has N*2 bits) is described. In this method, Ahigh and Alow respectively represent the most and least significant halves of A. According to this method, Alow is logically shifted right by one bit to generate Alow  greater than  greater than 1. Then, Alow  greater than  greater than 1 is multiplied by B using signed multiplication to generate a first partial result. In addition, a second partial result is generated by performing signed multiplication of Ahigh times B. One or both of the first and second partial results is shifted to align the first and second partial results for addition, and then the addition is performed to generate a final result representing A multiplied by B.
According to another aspect of the invention, a method and apparatus for performing N bit by 2*N (or 2*Nxe2x88x921) bit signed multiplication using a particular packed data instruction is described. According to this method and apparatus, signed multiplication of at least a value A0 by a value B0 is performed. The value A0 and B0 are respectively represented in N and 2*N bits. In response to executing the particular instruction, the following is performed: 1) a first and second set of two data elements are read as part of a first set of two packed operands, wherein one data element in the first set of two data elements is zero, wherein one data element in the second set of two data elements represents B0, and wherein another data element in the second set of two data elements represents one of a most and least significant part of A0; 2) data elements in each of the first and second sets of two data elements are multiplied together to generate a pair of results; and 3) the pair of results are summed to generate a packed operand having a data element representing B0 multiplied by the one of the most and the least significant part of A0 found in the second set of two data elements. The packed operand is generated for use in generating B0 multiplied by A0.