The present invention relates to a data processing technique for performing integer multiplication and, in particular, to a method and apparatus for supporting binary multiplication by an existing data processing apparatus, with only minor modifications thereto being required.
Central processor units ("CPUs") are normally provided with the capability to perform a multiplication instruction because multiplication is such a frequently required operation in data processing. A representative example of such multiplication instructions is referred to below as "MULi", and it can be performed by the 32000/EP family of CPUs available from National Semiconductor Corporation ("NSC").
The MULi instruction multiplies two n-bit signed (2's complement) or unsigned integer operands, where n=8, 16 or 32, and returns an n-bit result. The MULi instruction does not activate a trap if the result is too big to be represented using an n-bit number.
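The truncating behavior described above can be illustrated with a minimal sketch. The function names `muli` and `to_signed` are hypothetical and chosen only for illustration; the sketch assumes that an n-bit result means the low-order n bits of the full product, which is why the same bit pattern serves both signed and unsigned operands.

```python
def muli(a: int, b: int, n: int = 32) -> int:
    """Return the low-order n bits of a*b as an unsigned bit pattern.

    No overflow check is made, mirroring the absence of a trap.
    """
    mask = (1 << n) - 1
    return ((a & mask) * (b & mask)) & mask


def to_signed(x: int, n: int) -> int:
    """Reinterpret an n-bit pattern as a 2's complement integer."""
    return x - (1 << n) if x & (1 << (n - 1)) else x


# 200 * 2 = 400 does not fit in 8 bits; the result silently wraps.
wrapped = muli(200, 2, 8)
# The same routine handles signed operands via their 2's complement patterns.
signed_product = to_signed(muli(-3, 5, 8), 8)
```

In this sketch, an oversized product simply wraps modulo 2^n rather than raising an error.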
Existing CPUs which support the MULi instruction use either a dedicated array multiplier or a shift-and-add algorithm. CPUs which employ a dedicated array multiplier achieve high performance by manipulating both operands with a matrix of n² elements (n being the number of bits) to execute the multiplication in a few cycles (usually one or two). A lower cost, and reduced performance, solution utilizes the shift-and-add algorithm, whose execution time is proportional to n (usually 1*n+k or 2*n+k cycles, where k is a small constant). The common MULi shift-and-add multiplication algorithm for binary numbers is described below with reference to FIGS. 1 and 2.
As shown in FIG. 1, storage register, or latch, 3 and shift register 5 are provided for storing operands A and B, respectively. Shift register 7 is provided for storing partial product P. AND gates 9 and arithmetic logic unit ("ALU") 11 complete the needed hardware. The single gate shown in FIG. 1 for AND gates 9 is a representation used for the sake of convenience. In actuality, an array of n AND gates is used, with n being the number of bits in operand A. Each of these gates receives on one of its two inputs a corresponding bit of operand A. On its other input, each gate receives the same control bit, namely the least significant bit ("LSB") of register 5. Thus, AND gates 9 together receive all the bits of operand A in latch 3 in parallel. AND gates 9 have n output lines which carry the n bits of latch 3 when the LSB of register 5 is a "1"; otherwise, each such output line carries a "0". ALU 11 has n inputs for the above-described n outputs of AND gates 9, and it also receives the partial product P from register 7 on a further set of n inputs.
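The gating performed by AND gates 9 can be modeled bit by bit, as in the following sketch. The function name `and_gates` and its parameters are hypothetical; the sketch only assumes what is stated above, namely that each of the n gates combines one bit of operand A with the shared control bit.

```python
def and_gates(a: int, control_bit: int, n: int) -> int:
    """Model an array of n two-input AND gates.

    Gate i receives bit i of operand A on one input and the shared
    control bit (the LSB of register 5) on the other.
    """
    control_bit &= 1
    out = 0
    for i in range(n):
        a_bit = (a >> i) & 1
        out |= (a_bit & control_bit) << i
    return out


passed = and_gates(0b1011, 1, 4)   # control bit 1: operand A passes through
blocked = and_gates(0b1011, 0, 4)  # control bit 0: all outputs are 0
```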
The operation of the hardware shown in FIG. 1 proceeds in accordance with the flow chart of FIG. 2. In step 21, register 7 is initialized to zero, and operands A and B are loaded into latch 3 and register 5, respectively. If step 23 determines that the LSB of register 5 is "1", AND gates 9 open to put the contents A of latch 3 at the input of ALU 11 and, per step 25, ALU 11 adds P of register 7 to A of latch 3. The result is placed back into register 7 per step 27, and thus constitutes the new value of P. If, however, the LSB of register 5 is "0", AND gates 9 close and a "0" is placed on their outputs to ALU 11. This value of "0" is added, per step 26, to P. Thus, steps 26 and 27 produce no change in the value of P in register 7.
After step 27 is performed, step 29 shifts the contents of registers 5 and 7 to the right, i.e. toward the LSB, by one bit. Also, since the LSB of register 7 is connected to the most significant bit ("MSB") of register 5, as part of this step a zero is shifted to the MSB of register 7, and the LSB of register 7 is shifted into the MSB of register 5. The LSB of register 5 is shifted out and dropped.
After the occurrence of n sequences of steps 23, 25 or 26, 27 and 29, as determined by step 31, the product of the multiplication is stored in registers 5 and 7. The content B of register 5 is taken as the value of the product. The data P in register 7 can be disregarded. The fact that only the data B of register 5 is relied upon by MULi is used to advantage in the invention described below.
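The sequence of steps 21 through 31 can be sketched in software as follows. This is an illustrative model only, not a description of any actual circuit: the function name `shift_and_add` and the variable names are hypothetical, and the sketch assumes that the ALU result is n bits wide, so that any carry out of the addition is discarded and a zero enters the MSB of register 7 on each shift, as described above.

```python
def shift_and_add(a: int, b: int, n: int) -> tuple[int, int]:
    """Simulate the FIG. 1 hardware; return (register 5, register 7)."""
    mask = (1 << n) - 1
    latch3 = a & mask   # operand A
    reg5 = b & mask     # operand B, gradually replaced by product bits
    reg7 = 0            # step 21: partial product P initialized to zero

    for _ in range(n):  # step 31: repeat n times
        # Step 23: the LSB of register 5 controls AND gates 9.
        gated = latch3 if reg5 & 1 else 0
        # Steps 25/26 and 27: add and store back (carry-out assumed dropped).
        reg7 = (reg7 + gated) & mask
        # Step 29: shift both registers right by one bit; the LSB of
        # register 7 enters the MSB of register 5, and the old LSB of
        # register 5 is dropped.
        reg5 = (reg5 >> 1) | ((reg7 & 1) << (n - 1))
        reg7 >>= 1      # a zero enters the MSB of register 7

    return reg5, reg7


low, high = shift_and_add(13, 11, 4)  # 13 * 11 = 143 = 0b1000_1111
```

For this 4-bit example, register 5 ends up holding 15, the low-order four bits of 143, which is the value MULi returns. Under the dropped-carry assumption, register 7 does not necessarily hold the true upper half of the product, which is consistent with that data being disregarded.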
The above-described implementation utilizes the following hardware: ALU 11, AND gates 9, shift registers 5 and 7, and latch 3. For most CPU configurations, this implementation requires the addition of two shift registers, while the ALU and latch are already available as part of the existing hardware.