The present invention relates to generating sequences of shift, add, and subtract instructions to perform multiplication by an integer value.
Many implementations of computer processors do not have a multiply instruction. For those processors that do have a multiply instruction, it is often very expensive because it can take many machine clock cycles to perform a multiply instruction. However, it is possible to perform multiplication by an integer value using just normal arithmetic and logical unit (ALU) instructions of the processor such as the common shift, add, and subtract processor instructions. An unknown value can be multiplied by a power of 2 by shifting the unknown value left by the exponent of the power of 2. The original unknown value or an intermediate result can then be added or subtracted to achieve multiplication by integers that are not powers of 2. Usually shift, add, and subtract instructions can be performed in one machine clock cycle per instruction, so sequences of these instructions to perform multiplication are preferable to a multiply instruction when a sufficiently short sequence can be found that will execute faster than the multiply instruction.
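By way of illustration only, the following Python sketch shows multiplication by two small constants using only shifts, an add, and a subtract. The function names are hypothetical and the constants 10 and 7 are chosen purely as examples; they do not come from the text above.

```python
def times_10(x: int) -> int:
    # 10*x = 8*x + 2*x: two shifts and one add
    return (x << 3) + (x << 1)


def times_7(x: int) -> int:
    # 7*x = 8*x - x: one shift and one subtract
    return (x << 3) - x
```

For example, times_10(12) evaluates to 120 and times_7(12) evaluates to 84, matching ordinary multiplication.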
Sequences of ALU instructions to perform multiplication by an integer value are commonly generated either by an analytical algorithm at compile time or by looking up a sequence in a previously generated table.
Two possible analytical approaches to generating integer multiplication sequences, namely the binary method and the power tree method, are discussed in D. Knuth, The Art of Computer Programming, Vol. 2: Seminumerical Algorithms, 2nd Ed., Addison-Wesley, Reading, Mass., 1981, pp. 441-462. These two methods are further discussed in R. Bernstein, Multiplication by Integer Constants, Software: Practice and Experience, vol. 16(7), pp. 641-652, John Wiley and Sons, Ltd. (1986), where they are combined into a hybrid method. This hybrid method can generate a sequence of shift, add, and subtract instructions for performing multiplication by any integer value. Furthermore, Bernstein's method generates very efficient sequences that are often the minimum number of instructions possible to perform the multiplication using just shift, add, and subtract instructions. However, a skilled assembly language programmer can sometimes find sequences of shift, add, and subtract instructions that are shorter than those generated using Bernstein's method.
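As a rough illustration of the binary method alone (not Bernstein's hybrid method, and not the power tree method), the following Python sketch scans the bits of the constant from the most significant end, emitting a one-bit shift for every bit and an add for every 1 bit. The step encoding and the names binary_method and run are assumptions made for this sketch; consecutive shifts are deliberately not merged.

```python
def binary_method(n: int) -> list:
    """Return (op, amount) steps that compute n * x, starting from t = x."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = []
    for bit in bin(n)[3:]:            # skip the '0b' prefix and the leading 1 bit
        steps.append(("shl", 1))      # t = t << 1
        if bit == "1":
            steps.append(("add", 0))  # t = t + x (amount is unused)
    return steps


def run(steps: list, x: int) -> int:
    """Evaluate a step list produced by binary_method on the value x."""
    t = x
    for op, amount in steps:
        if op == "shl":
            t <<= amount
        elif op == "add":
            t += x
        else:
            raise ValueError("unknown operation: " + op)
    return t
```

For the constant 7 this sketch produces four steps (shift, add, shift, add), whereas a shift by 3 followed by a subtract needs only two, which illustrates why a purely binary approach is not always the shortest.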
Another method for generating efficient sequences of instructions for performing multiplication by integer values is through the use of a lookup table. This involves generating a table that holds the optimal sequence of ALU instructions for multiplication by each integer. The advantage to using this method is that every possible combination of shift, add, and subtract instructions that combine to achieve multiplication by a particular integer can be tested until the most efficient sequence is produced. The amount of time taken to generate these sequences is not a factor because they are generated separately beforehand, rather than during a program's compilation.
The disadvantage to using the lookup table method is that sequences for every possible integer value must be stored in the lookup table, which can require a large amount of memory storage. For example, assume that each ALU instruction can be encoded such that the instruction opcode and its operands can be packed into 32 bits. For 32-bit integers, multiplication can usually be performed in 20 shift, add, and subtract instructions or fewer, so 20*32 bits should be reserved for each sequence. To encode all of this information in a table for all 32-bit integers, (2^32)*(20*32 bits) = (2^32)*(80 bytes), or approximately 343.6 gigabytes, of storage would be required. Since this amount of information is far too large to be incorporated into a compiler, the table size must be reduced by restricting the length of sequences, the number of sequences included in the table, or the amount of information for each instruction.
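The arithmetic behind this table-size estimate can be checked directly. The short Python sketch below uses the figures stated above (32 bits per instruction, 20 instructions per sequence, one sequence per 32-bit integer); the variable names are illustrative only.

```python
BITS_PER_INSTRUCTION = 32      # opcode and operands packed into 32 bits
MAX_INSTRUCTIONS = 20          # instructions reserved per sequence
NUM_CONSTANTS = 2 ** 32        # one sequence for every 32-bit integer

total_bits = NUM_CONSTANTS * MAX_INSTRUCTIONS * BITS_PER_INSTRUCTION
total_bytes = total_bits // 8  # 80 bytes per table entry
```

Here total_bytes is 343,597,383,680, that is, about 343.6 x 10^9 bytes.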
It is difficult to represent a sequence of instructions in a compact manner without losing some flexibility in the generated sequence of instructions. The lookup table method will likely impose a maximum length on sequences of instructions so that the table size is minimized, which constrains the possible sequences generated. Also, to minimize the size of the lookup table, a subset of all integers will usually be chosen to be represented in the table. The lookup table representation disclosed in U.S. Pat. No. 5,764,990 manages to pack representations of sequences for each integer into 64 bits in the lookup table. However, this lookup table implementation faces both the constraint that only a maximum of 8 instructions can be used in a sequence and that only numbers between -65536 and 65535 are generated so that the lookup table is not too large. This approach cannot always represent the most efficient sequence of shift, add, and subtract instructions for all integers, but is quite good for smaller numbers that have short generated sequences.
Also, it has been shown in T. Granlund, P. Montgomery, Division By Invariant Integers using Multiplication, Association for Computing Machinery, 0-89791-662-x/94/0006 (1994) that division by integer constants can be accomplished using integer multiplication followed by a shift right instruction. Thus, sequences of shift, add, and subtract instructions can be used to accomplish integer division as well as multiplication.
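As a minimal illustration of division by a constant through multiplication and a right shift, the following Python sketch divides an unsigned 32-bit value by 3 and by 5. The multipliers 0xAAAAAAAB and 0xCCCCCCCD and the shift amounts 33 and 34 are widely used values for these two divisors and are not taken from the text above; the function names are hypothetical. On real hardware the product would need to be held in 64 bits.

```python
MAX_U32 = 0xFFFFFFFF


def div3_u32(x: int) -> int:
    """Floor of x / 3 for 0 <= x < 2**32, using a multiply and a right shift."""
    if not 0 <= x <= MAX_U32:
        raise ValueError("x must fit in an unsigned 32-bit integer")
    return (x * 0xAAAAAAAB) >> 33   # 0xAAAAAAAB == (2**33 + 1) // 3


def div5_u32(x: int) -> int:
    """Floor of x / 5 for 0 <= x < 2**32, using a multiply and a right shift."""
    if not 0 <= x <= MAX_U32:
        raise ValueError("x must fit in an unsigned 32-bit integer")
    return (x * 0xCCCCCCCD) >> 34   # 0xCCCCCCCD == (2**34 + 1) // 5
```

The multiplication by the constant in each function is itself a candidate for replacement by a shift, add, and subtract sequence.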
Thus, it is desirable to provide a method, system and computer program product for generating an efficient sequence of ALU instructions for performing integer multiply operations that overcomes the foregoing and other disadvantages.
The present invention is an improvement on the analytical algorithm for generating ALU instruction sequences for performing integer multiplication described by Bernstein. The present invention analytically finds an optimal sequence of shift, add and subtract instructions for performing multiplication by any integer value, improving on the results of the Bernstein algorithm which in some cases produces longer instruction sequences than required for a particular integer multiplication.
The present invention has the advantage of generating instruction sequences having at most as many instructions as would be generated by the Bernstein algorithm and, in some cases, fewer, thus increasing the speed at which a compiled program can run and reducing the size of the program's code. By relying on a dependent chain of instructions, what Knuth calls a "star chain", the present invention helps reduce the number of temporary registers required during the multiply (and thus increases execution speed) because each instruction depends on the result of the preceding instruction. Further, the present invention helps reduce the actual number of ALU instructions needed in a program to perform the multiply and hence helps reduce program size.
The present invention also has the advantage of not being significantly more expensive computationally than Bernstein's method. It looks for the same instruction sequences as Bernstein's hybrid binary and power tree algorithm using the same method, and only searches for the additional performance opportunities when a sufficiently fast instruction sequence is not found with Bernstein's method.
Further, the present invention has an advantage over the table lookup method of generating instruction sequences for performing multiplication by an integer value by not having to rely on a lookup table. To generate the same optimal sequences of ALU instructions of the present invention, a relatively large amount of information would need to be stored in a lookup table for each instruction sequence, resulting in a massive storage requirement. Moreover, the present invention has no upper limit to the highest integer value it can handle, a limitation which a lookup table inherently possesses.
Accordingly, there is provided a computer-implemented method for multiplication by an integer comprising the steps of (a) recursively finding the factors of the integer using a power of 2, a power of 2 plus 1 and a power of 2 minus 1; (b) recursively finding the factors of the integer (or the factors found in (a)) plus and minus 1 using a power of 2, a power of 2 plus 1 and a power of 2 minus 1; (c) adding to and subtracting from the integer all powers of 2 less than the integer; (d) finding the factors of each resulting sum and difference of (c) using the power of 2 used in calculating the sum and difference of (c) plus and minus 1; (e) constructing one or more instruction sequences based upon the factors found in (a) and (b), and upon the factors found in (d) if a resulting sum or difference factored evenly in (d) and the resulting factor is a power of 2; (f) finding the lowest cost instruction sequence from the one or more instruction sequences; and (g) executing the lowest cost instruction sequence to effect the multiplication by the integer. The above method is also provided wherein the step of constructing one or more instruction sequences comprises generating one or more shift, add or subtract instructions based upon the found factors. The above method is further provided wherein steps (a)-(f) are performed in a compiler. And the above method is provided wherein the step of finding the lowest cost instruction sequence comprises calculating a cost for each instruction sequence based upon the processing time of each instruction sequence.
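Step (a) alone can be pictured with the following simplified Python sketch, which tries to write a positive integer as a product of factors that are each a power of 2, a power of 2 plus 1, or a power of 2 minus 1. It is an assumption-laden illustration only: it returns the first factorization it finds rather than the cheapest, it omits the plus and minus 1 fallback of step (b), and the names special_factors and factor_chain are hypothetical.

```python
def special_factors(limit: int) -> list:
    """Values of the form 2**k - 1 or 2**k + 1 that lie in [3, limit]."""
    out = []
    k = 1
    while (1 << k) - 1 <= limit:
        for f in ((1 << k) - 1, (1 << k) + 1):
            if 3 <= f <= limit and f not in out:
                out.append(f)
        k += 1
    return out


def factor_chain(n: int):
    """Return factors (each 2**k, 2**k + 1 or 2**k - 1) whose product is n,
    or None when no such factorization exists."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n == 1:
        return []
    if n % 2 == 0:
        shift = (n & -n).bit_length() - 1     # count of trailing zero bits
        rest = factor_chain(n >> shift)
        return None if rest is None else [1 << shift] + rest
    for f in special_factors(n):              # n is odd and at least 3 here
        if n % f == 0:
            rest = factor_chain(n // f)
            if rest is not None:
                return [f] + rest
    return None
```

Each factor found this way corresponds to a shift, or to a shift combined with an add or subtract. An integer such as 11, which has no factor of these forms, yields None in this sketch and would have to be handled by the other steps.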
There is also provided a method for generating instruction sequences for multiplication by an integer comprising the steps of (a) recursively finding the factors of the integer using a power of 2, a power of 2 plus 1 and a power of 2 minus 1; (b) recursively finding the factors of the integer (or the factors found in (a)) plus and minus 1 using a power of 2, a power of 2 plus 1 and a power of 2 minus 1; (c) adding to and subtracting from the integer all powers of 2 less than the integer; (d) finding the factors of each resulting sum and difference of (c) using the power of 2 used in calculating the sum and difference of (c) plus and minus 1; (e) constructing one or more instruction sequences based upon the factors found in (a) and (b), and upon the factors found in (d) if a resulting sum or difference factored evenly in (d) and the resulting factor is a power of 2; and (f) finding the lowest cost instruction sequence from the one or more instruction sequences. The above method is also provided wherein the step of constructing one or more instruction sequences comprises generating one or more shift, add or subtract instructions based upon the found factors. The above method is also provided wherein steps (a)-(f) are performed in a compiler. And the above method is provided wherein the step of finding the lowest cost instruction sequence comprises calculating a cost for each instruction sequence based upon the processing time of each instruction sequence.
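The selection of a lowest cost sequence based upon processing time might be sketched in Python as follows. The one-cycle-per-instruction cost table, the tuple encoding of instructions, and the two candidate sequences for multiplication by 15 are all assumptions made for illustration; a real compiler would use the cycle counts of its target processor.

```python
# Assumed cost in machine cycles for each instruction kind (hypothetical values).
ASSUMED_CYCLES = {"shl": 1, "add": 1, "sub": 1}


def sequence_cost(sequence: list) -> int:
    """Sum the assumed cycle cost of every (op, operand) pair in a sequence."""
    return sum(ASSUMED_CYCLES[op] for op, _ in sequence)


def lowest_cost(candidates: list) -> list:
    """Return the candidate sequence with the smallest total cost."""
    return min(candidates, key=sequence_cost)


# Two candidate sequences for t = 15 * x, each starting from t = x.
binary_15 = [("shl", 1), ("add", "x"), ("shl", 1), ("add", "x"),
             ("shl", 1), ("add", "x")]           # x, 2x, 3x, 6x, 7x, 14x, 15x
short_15 = [("shl", 4), ("sub", "x")]            # x, 16x, 15x
```

Under these assumed costs, lowest_cost([binary_15, short_15]) returns short_15, whose cost is 2 cycles against 6 for the longer sequence.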
Further, there is provided a computer-implemented method for multiplication by an integer comprising the steps of (a) adding to and subtracting from the integer all powers of 2 less than the integer; (b) finding the factors of each resulting sum and difference of (a) using the power of 2 used in calculating the sum and difference of (a) plus and minus 1; (c) if a resulting sum or difference factored evenly in (b) and the resulting factor is a power of 2, constructing one or more instruction sequences based upon the factors found in (b); and (d) executing the instruction sequence to effect the multiplication by the integer. The above method also is provided wherein the step of constructing one or more instruction sequences comprises generating one or more shift, add or subtract instructions based upon the found factors. The above method is also provided wherein steps (a)-(c) are performed in a compiler. And, the above method may further comprise the steps of determining one or more instruction sequences based upon the Bernstein algorithm and determining the lowest cost instruction sequence from among the one or more sequences determined using the Bernstein algorithm and the one or more instruction sequences constructed in step (c).
Also provided is a method for generating instruction sequences for multiplication by an integer comprising the steps of (a) adding to and subtracting from the integer all powers of 2 less than the integer; (b) finding the factors of each resulting sum and difference of (a) using the power of 2 used in calculating the sum and difference of (a) plus and minus 1; and (c) if a resulting sum or difference factored evenly in (b) and the resulting factor is a power of 2, constructing one or more instruction sequences based upon the factors found in (b). The step of constructing one or more instruction sequences may also comprise generating one or more shift, add or subtract instructions based upon the found factors. The above method may also be provided wherein the steps are performed in a compiler. And, the above method may further comprise the steps of determining one or more instruction sequences based upon the Bernstein algorithm and determining the lowest cost instruction sequence from among the one or more sequences determined using the Bernstein algorithm and the one or more instruction sequences constructed in step (c).
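One literal reading of steps (a) and (b) and the test in step (c) can be sketched in Python as shown below. For every power of 2 less than the integer n, the sketch forms the sum and the difference, tries to divide each by that power of 2 plus 1 and minus 1, and keeps the cases where the division is exact and the quotient is itself a power of 2. The function names, the tuple layout, and the decision to skip the trivial divisors 0 and 1 are assumptions of this sketch rather than details given above.

```python
def is_power_of_two(v: int) -> bool:
    return v > 0 and (v & (v - 1)) == 0


def candidate_decompositions(n: int) -> list:
    """Return tuples (k, sign, f, j) such that n == f * 2**j - sign * 2**k,
    where f is 2**k + 1 or 2**k - 1, and sign is +1 when 2**k was added
    to n and -1 when it was subtracted from n."""
    found = []
    k = 0
    while (1 << k) < n:                  # every power of 2 less than n
        p = 1 << k
        for sign in (1, -1):
            m = n + sign * p             # the sum (sign=+1) or difference (sign=-1)
            for f in (p + 1, p - 1):
                if f < 2:                # assumption: ignore divisors 0 and 1
                    continue
                if m % f == 0 and is_power_of_two(m // f):
                    j = (m // f).bit_length() - 1
                    found.append((k, sign, f, j))
        k += 1
    return found
```

For n = 44, for instance, the result includes (2, -1, 5, 3), reflecting 44 - 4 = 5 * 8, and (2, 1, 3, 4), reflecting 44 + 4 = 3 * 16.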
There is further provided a computer-implemented method for multiplication by an integer comprising the steps of, for each power of 2 less than the integer, (i) adding the power of 2 to the integer, (ii) finding the factors of the sum in (i) using the power of 2 in (i) plus or minus 1, (iii) subtracting the power of 2 from the integer, (iv) finding the factors of the difference in (iii) using the power of 2 in (iii) plus or minus 1, (v) ignoring those factors determined in (ii) and (iv) where the respective sum or difference does not factor evenly by the power of 2 plus or minus 1, and (vi) where a sum or difference factors evenly in (ii) or (iv) and the resulting factor for that sum or difference is a power of 2, generating an instruction sequence based upon those factors; and executing the instruction sequence. The above method may be further provided wherein the step of generating an instruction sequence comprises generating one or more shift, add or subtract instructions based upon the found factors. The above method may also be provided wherein all steps but the step of executing are performed in a compiler. And, the above method may further comprise the steps of determining one or more instruction sequences based upon the Bernstein algorithm and determining the lowest cost instruction sequence from among the one or more sequences determined using the Bernstein algorithm and the one or more instruction sequences determined in step (vi).
Also, there is provided a method for generating instruction sequences for multiplication by an integer comprising the steps of, for each power of 2 less than the integer, (i) adding the power of 2 to the integer, (ii) finding the factors of the sum in (i) using the power of 2 in (i) plus or minus 1, (iii) subtracting the power of 2 from the integer, (iv) finding the factors of the difference in (iii) using the power of 2 in (iii) plus or minus 1, (v) ignoring those factors determined in (ii) and (iv) where the respective sum or difference does not factor evenly by the power of 2 plus or minus 1, and (vi) where a sum or difference factors evenly in (ii) or (iv) and the resulting factor for that sum or difference is a power of 2, generating an instruction sequence based upon those factors. The above method may also be provided wherein the step of generating an instruction sequence comprises generating one or more shift, add or subtract instructions based upon the found factors. The steps above may also be performed in a compiler. And, the above method may further comprise the steps of determining one or more instruction sequences based upon the Bernstein algorithm and determining the lowest cost instruction sequence from among the one or more sequences determined using the Bernstein algorithm and the one or more instruction sequences determined in step (vi).
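Once a sum or difference has factored as described in step (vi), one hypothetical way to turn the factors into shift, add, and subtract operations is sketched below in Python. The sketch assumes the integer has the form n = (2^k + f_sign) * 2^j + p_sign * 2^k, with each sign being +1 or -1; the function name, the parameter names, and the particular ordering of operations are illustrative assumptions and are not asserted to be the sequence the described method would emit.

```python
def multiply_by_decomposition(x: int, k: int, f_sign: int, j: int, p_sign: int) -> int:
    """Return n * x for n = (2**k + f_sign) * 2**j + p_sign * 2**k.

    f_sign and p_sign must each be +1 or -1. Only shifts, adds and
    subtracts are applied to x.
    """
    if f_sign not in (1, -1) or p_sign not in (1, -1):
        raise ValueError("signs must be +1 or -1")
    t1 = x << k                               # 2**k * x
    t2 = t1 + x if f_sign == 1 else t1 - x    # (2**k +/- 1) * x
    t3 = t2 << j                              # (2**k +/- 1) * 2**j * x
    return t3 + t1 if p_sign == 1 else t3 - t1
```

For example, 44 = (4 + 1) * 8 + 4 corresponds to multiply_by_decomposition(x, 2, 1, 3, 1), and 44 = (4 - 1) * 16 - 4 corresponds to multiply_by_decomposition(x, 2, -1, 4, -1); both use four shift, add, or subtract operations.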
There is further provided a program storage device, tangibly embodying computer readable program code, for causing a computer to perform the method steps of any one of the above methods.
Additionally, there is provided a computer program product for multiplication by an integer, the computer program product comprising (a) computer readable code means for recursively finding the factors of the integer using a power of 2, a power of 2 plus 1 and a power of 2 minus 1; (b) computer readable code means for recursively finding the factors of the integer (or the factors found in (a)) plus and minus 1 using a power of 2, a power of 2 plus 1 and a power of 2 minus 1; (c) computer readable code means for adding to and subtracting from the integer all powers of 2 less than the integer; (d) computer readable code means for finding the factors of each resulting sum and difference of (c) using the power of 2 used in calculating the sum and difference of (c) plus and minus 1; (e) computer readable code means for constructing one or more instruction sequences based upon the factors found in (a) and (b), and upon the factors found in (d) if a resulting sum or difference factored evenly in (d) and the resulting factor is a power of 2; (f) computer readable code means for finding the lowest cost instruction sequence from the one or more instruction sequences; and (g) computer readable code means for executing the instruction sequence to effect the multiplication by the integer.
There is also provided a computer program product for generating instruction sequences for multiplication by an integer, the computer program product comprising (a) computer readable code means for recursively finding the factors of the integer using a power of 2, a power of 2 plus 1 and a power of 2 minus 1; (b) computer readable code means for recursively finding the factors of the integer (or the factors found in (a)) plus and minus 1 using a power of 2, a power of 2 plus 1 and a power of 2 minus 1; (c) computer readable code means for adding to and subtracting from the integer all powers of 2 less than the integer; (d) computer readable code means for finding the factors of each resulting sum and difference of (c) using the power of 2 used in calculating the sum and difference of (c) plus and minus 1; (e) computer readable code means for constructing one or more instruction sequences based upon the factors found in (a) and (b), and upon the factors found in (d) if a resulting sum or difference factored evenly in (d) and the resulting factor is a power of 2; and (f) computer readable code means for finding the lowest cost instruction sequence from the one or more instruction sequences.
Furthermore, there is provided a computer program product for multiplication by an integer, the computer program product comprising (a) computer readable code means for adding to and subtracting from the integer all powers of 2 less than the integer; (b) computer readable code means for finding the factors of each resulting sum and difference of (a) using the power of 2 used in calculating the sum and difference of (a) plus and minus 1; (c) computer readable code means for, if a resulting sum or difference factored evenly in (b) and the resulting factor is a power of 2, constructing one or more instruction sequences based upon the factors found in (b); and (d) computer readable code means for executing the instruction sequence to effect the multiplication by the integer.
Also, there is provided a computer program product for generating instruction sequences for multiplication by an integer, the computer program product comprising (a) computer readable code means for adding to and subtracting from the integer all powers of 2 less than the integer; (b) computer readable code means for finding the factors of each resulting sum and difference of (a) using the power of 2 used in calculating the sum and difference of (a) plus and minus 1; and (c) computer readable code means for, if a resulting sum or difference factored evenly in (b) and the resulting factor is a power of 2, constructing one or more instruction sequences based upon the factors found in (b).
There is also provided a computer program product for multiplication by an integer, the computer program product comprising, for each power of 2 less than the integer, (i) computer readable code means for adding the power of 2 to the integer, (ii) computer readable code means for finding the factors of the sum in (i) using the power of 2 in (i) plus or minus 1, (iii) computer readable code means for subtracting the power of 2 from the integer, (iv) computer readable code means for finding the factors of the difference in (iii) using the power of 2 in (iii) plus or minus 1, (v) computer readable code means for ignoring those factors determined in (ii) and (iv) where the respective sum or difference does not factor evenly by the power of 2 plus or minus 1, and (vi) computer readable code means for, where a sum or difference factors evenly in (ii) or (iv) and the resulting factor for that sum or difference is a power of 2, generating an instruction sequence based upon those factors; and computer readable code means for executing the instruction sequence.
And, there is provided a computer program product for generating instruction sequences for multiplication by an integer, the computer program product comprising, for each power of 2 less than the integer, (i) computer readable code means for adding the power of 2 to the integer, (ii) computer readable code means for finding the factors of the sum in (i) using the power of 2 in (i) plus or minus 1, (iii) computer readable code means for subtracting the power of 2 from the integer, (iv) computer readable code means for finding the factors of the difference in (iii) using the power of 2 in (iii) plus or minus 1, (v) computer readable code means for ignoring those factors determined in (ii) and (iv) where the respective sum or difference does not factor evenly by the power of 2 plus or minus 1, and (vi) computer readable code means for, where a sum or difference factors evenly in (ii) or (iv) and the resulting factor for that sum or difference is a power of 2, generating an instruction sequence based upon those factors.