This invention relates to electronic circuits and, more specifically, to electronic circuits for use in a microprocessor for performing fast shifting of binary numbers. Fast shifting is essential to the performance of a microprocessor because shifting is a significant part of multiply, divide, and floating point operations (specifically, scaling and normalization).
FIG. 1 is a block diagram of a typical prior art multiplication machine 10, as is often found in microprocessors. A source 11 initially receives 0 and thereafter stores the accumulation of partial products during the multiplication operation. Similarly, B source 12 receives the multiplicand, and Q shifter 15 receives the multiplier. In the embodiment described, 16-bit words are multiplied, although it is to be understood that numbers of any length (for example, 8-bit or 32-bit) can be multiplied.
The operation of the prior art multiplication machine 10 of FIG. 1 is described with reference to FIG. 2. For simplicity, two 4-bit numbers are shown in FIG. 2: the binary number 1011 is the multiplicand stored in B source 12, and the binary number 1101 is the multiplier stored in Q shifter 15. Multiplying two n-bit numbers requires the accumulation of n partial products. Thus, as shown in FIG. 2, A source 11 initially receives 0 as the initial partial product, B source 12 receives multiplicand 1011, and Q shifter 15 receives multiplier 1101. Under microcode control, ALU 13 adds the multiplicand stored in B source 12 to the partial product stored in A source 11. The result of this operation is applied to shifter 14, which shifts it one bit to the right: the least significant bit of the result is applied to the most significant bit of Q shifter 15, and the remaining bits of shifter 14 are applied to A source 11, with the most significant bit of A source 11 receiving a sign extension (i.e., the most significant bit of A source 11 is set equal to the value of the bit to its right). Thus, A source 11 stores the new partial product 0101, and Q shifter 15 stores the multiplier shifted one bit to the right, with the least significant bit of the partial product entering its most significant bit position.
This operation is repeated until the entire multiplication has been carried out. Prelogic 16 masks the multiplicand (i.e., provides all zeros in its place) when the least significant bit of the multiplier stored in Q shifter 15 during a partial product generation is a logical 0. This is because, when the multiplier bit by which the multiplicand is being multiplied is 0, zero must be added to the partial product.
Following the completion of this summation of partial products, Q shifter 15 stores the n least significant bits of the result of the multiplication operation, and A source 11 stores the n most significant bits of the result of the multiplication operation.
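The shift-and-add sequence of FIGS. 1 and 2 can be modeled in a few lines. The following Python sketch is illustrative only: the function name `shift_add_multiply` is hypothetical, the operands are treated as unsigned, and the carry-out of ALU 13 is shifted into the most significant bit of A source 11 (an assumption of this sketch; the text above describes a sign-extension fill, which applies to signed operands). With this convention, the first step reproduces the partial product 0101 of FIG. 2.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Hypothetical model of the A/B/Q shift-and-add multiplier of FIG. 1 (unsigned operands)."""
    mask = (1 << n) - 1
    a = 0                      # A source 11: accumulated partial product
    b = multiplicand & mask    # B source 12: multiplicand
    q = multiplier & mask      # Q shifter 15: multiplier
    for _ in range(n):         # n partial products for n-bit operands
        # Prelogic 16: mask the multiplicand to zero when the multiplier LSB is 0.
        addend = b if (q & 1) else 0
        total = a + addend     # ALU 13; the sum may need n+1 bits (carry-out)
        # Shifter 14: the LSB of the sum enters the MSB of Q as Q shifts right.
        q = (q >> 1) | ((total & 1) << (n - 1))
        # The remaining bits, including the carry-out (assumed fill), return to A.
        a = total >> 1
    # A holds the n most significant bits and Q the n least significant bits.
    return (a << n) | q
```

For the operands of FIG. 2, `shift_add_multiply(0b1011, 0b1101, 4)` returns 143 (binary 10001111), leaving A = 1000 and Q = 1111.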
As is well known to those of ordinary skill in the art, the operation depicted in FIGS. 1 and 2 can be slightly modified using the so-called modified Booth's algorithm in order to operate on 2 bits of the multiplier at a time. This halves the number of partial product generation steps, although the basic operations depicted in FIGS. 1 and 2 remain the same. When the modified Booth's algorithm is used, Q shifter 15 is capable of shifting two bits to the right simultaneously. The structure of FIG. 1 is also commonly used for division, as is well known to those of ordinary skill in the art. During division, Q shifter 15 shifts bits to the left, by one bit position if using the technique described above, or by two bit positions if using other algorithms.
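The factor-of-2 saving of the modified Booth's algorithm comes from recoding the multiplier into radix-4 digits, each in {-2, -1, 0, +1, +2}, so that an n-bit multiplier yields only n/2 partial products. The Python sketch below shows the standard radix-4 recoding for two's-complement operands. The function name is hypothetical, n is assumed to be even, and the partial products are summed directly rather than through the A/Q right-shift hardware of FIG. 1.

```python
def booth_radix4_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Multiply n-bit two's-complement integers using radix-4 (modified) Booth recoding.

    Hypothetical helper: n must be even; the multiplier is scanned two bits per step.
    """
    if n % 2:
        raise ValueError("n must be even")
    mask = (1 << n) - 1
    # Append the implicit 0 to the right of the multiplier's least significant bit.
    q = (multiplier & mask) << 1
    product = 0
    for i in range(n // 2):  # n/2 partial products instead of n
        triplet = (q >> (2 * i)) & 0b111  # bits q[2i+1], q[2i], q[2i-1]
        # Recoded digit = -2*q[2i+1] + q[2i] + q[2i-1], one of -2, -1, 0, +1, +2.
        digit = -2 * ((triplet >> 2) & 1) + ((triplet >> 1) & 1) + (triplet & 1)
        # Add digit x multiplicand, weighted by 4**i (a shift by 2i bits).
        product += (digit * multiplicand) << (2 * i)
    return product
```

Interpreting the FIG. 2 operands 1011 and 1101 as signed 4-bit values (-5 and -3), the multiplier recodes to the two digits +1 and -1, and `booth_radix4_multiply(-5, -3, 4)` returns 15 after two steps instead of four.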
TABLE 1
______________________________________
Scaling Operation
  Step 1: Shift MSB word to right, make sign extension, and store rightmost MSB.
  Step 2: Shift LSB word to right, and carry over stored rightmost MSB to leftmost LSB.
Normalization Operation
  Step 1: Shift LSB word to left, fill rightmost LSB with 0, and store leftmost LSB.
  Step 2: Shift MSB word to left, and carry over stored leftmost LSB to rightmost MSB.
______________________________________
FIG. 3 shows a block diagram of a portion of a microprocessor configured to perform scaling (adjusting the mantissa of a number so that its exponent equals the exponent of another number, allowing the two numbers to be added or subtracted) and normalization (adjusting the mantissa of a number so that its magnitude is between binary 0.1 and 1.0, i.e., between one-half and one). Table 1 depicts the two-step shifting operation of the structure of FIG. 3 during scaling and normalization of floating point numbers having a 24-bit mantissa represented by two 16-bit words. A source 11 stores the floating point number on which the scaling or normalization operation is to be performed.

For scaling, the exponents of the two numbers are subtracted, and the difference is the number of bit positions by which the smaller number must be scaled. With this difference available in an exponent difference register (not shown), the mantissa of the number in A source 11 is applied to shifter 14, which shifts the mantissa one bit to the right and applies the result back to A source 11. Simultaneously, the exponent difference counter (not shown) is decremented to indicate the number of shifts still required to properly scale the number now stored in A source 11. This operation continues until the value stored in the exponent difference register reaches zero, at which point the number stored in A source 11 has been properly scaled. When shifter 14 shifts to the right, multiplexer 31 provides the least significant bit of the word being shifted to temporary flip-flop 32, which in turn applies this bit to the most significant bit of shifter 14 during the next operation of shifter 14. Since, in floating point operation, the mantissa is contained in 16 bits of a first word and 8 bits of a second word, shifting a floating point mantissa by one bit requires two operations of shifter 14 in a 16-bit machine, as depicted in Table 1.
Similarly, for extended floating point operation, where the mantissa is stored in three words to provide a total of 40 bits, three operations of shifter 14 are required in a 16-bit machine to shift the mantissa by one bit.
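The word-by-word right shift used for scaling can be sketched as a behavioral model in Python. This is not the circuit itself, and it rests on stated assumptions: the mantissa is a list of 16-bit two's-complement words, most significant word first; every bit of every word is treated as mantissa (the placement of the 8 mantissa bits within the second word is not modeled); and the names `shift_right_one` and `scale` are hypothetical. Each one-bit shift costs one shifter operation per word, with the variable `carry` playing the role of temporary flip-flop 32.

```python
from typing import List, Tuple

WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1


def shift_right_one(words: List[int]) -> Tuple[List[int], int]:
    """Shift a multiword two's-complement mantissa right by one bit.

    Words are processed most significant first (Table 1, scaling column).
    Returns the shifted words and the number of shifter operations used.
    """
    out = []
    carry = 0  # models temporary flip-flop 32
    for idx, word in enumerate(words):
        if idx == 0:
            fill = (word >> (WORD_BITS - 1)) & 1  # step 1: sign extension of the MSB word
        else:
            fill = carry  # later steps: bit saved from the more significant word
        carry = word & 1  # multiplexer 31 selects the rightmost bit for the flip-flop
        out.append(((word >> 1) | (fill << (WORD_BITS - 1))) & WORD_MASK)
    return out, len(words)


def scale(words: List[int], exponent_difference: int) -> Tuple[List[int], int]:
    """Shift right once per unit of exponent difference, counting shifter operations."""
    operations = 0
    while exponent_difference > 0:  # models the exponent difference register
        words, ops = shift_right_one(words)
        operations += ops
        exponent_difference -= 1  # decremented once per one-bit shift
    return words, operations
```

For example, `scale([0x4000, 0x0000], 3)` returns `([0x0800, 0x0000], 6)`: three one-bit shifts at two shifter operations each. Passing a three-word extended mantissa raises the cost to three operations per bit.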
During normalization, a floating point number from A source 11 is normalized by shifting its mantissa to the left until its two most significant bits are different. During this operation, multiplexer 31 selects the most significant bit of the word being shifted for storage in temporary flip-flop 32, which then applies this bit to the least significant bit of shifter 14 during its next operation. A counter (not shown) is incremented to record the number of left bit shifts performed on the mantissa. Exclusive OR gate 33 compares the two most significant bits of shifter 14 and provides a logical 1 NORMALIZATION FLAG output signal when they are different, indicating that normalization is complete. The accumulated count of left bit shifts is then subtracted from the exponent, thereby providing the normalized result.
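Normalization can be modeled in the same behavioral style. In the Python sketch below, the mantissa is assumed to be a list of 16-bit two's-complement words, most significant word first, and all names are hypothetical. The words are shifted left least significant word first, as in the normalization column of Table 1, with the rightmost bit of the least significant word filled with 0; `normalization_flag` mimics exclusive OR gate 33. The early return for an all-zero mantissa is an addition of this sketch, since such a mantissa would never set the flag.

```python
from typing import List, Tuple

WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1


def shift_left_one(words: List[int]) -> Tuple[List[int], int]:
    """Shift a multiword mantissa left by one bit, least significant word first."""
    out = [0] * len(words)
    carry = 0  # the rightmost bit of the least significant word is filled with 0
    for idx in range(len(words) - 1, -1, -1):
        word = words[idx]
        saved = (word >> (WORD_BITS - 1)) & 1  # leftmost bit, held in flip-flop 32
        out[idx] = ((word << 1) | carry) & WORD_MASK
        carry = saved  # enters the rightmost bit of the next, more significant word
    return out, len(words)


def normalization_flag(words: List[int]) -> bool:
    """Model of exclusive OR gate 33: true when the two most significant bits differ."""
    msw = words[0]
    return bool(((msw >> (WORD_BITS - 1)) ^ (msw >> (WORD_BITS - 2))) & 1)


def normalize(words: List[int], exponent: int) -> Tuple[List[int], int, int]:
    """Return (normalized words, adjusted exponent, shifter operations used)."""
    if not any(words):
        return list(words), exponent, 0  # sketch-only guard: zero cannot be normalized
    shifts = 0  # models the left-shift counter
    operations = 0
    while not normalization_flag(words):
        words, ops = shift_left_one(words)
        operations += ops
        shifts += 1
    return words, exponent - shifts, operations
```

For example, `normalize([0x0000, 0x8000], 10)` moves the single set bit across the word boundary and up to bit 14 of the first word in 15 shifts, returning `([0x4000, 0x0000], -5, 30)`.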
TABLE 2
______________________________________
                            Floating Point      Extended Floating Point
                            Operation           Operation
Mantissa                    2 16-bit words      3 16-bit words
                            (24 bits)           (40 bits)
Microcycles per bit shift   2                   3
Bit shifts, worst case      23                  39
Clock cycles per microcycle 3                   3
Total clock cycles          2 × 23 × 3 = 138    3 × 39 × 3 = 351
______________________________________
Table 2 shows the time required for a typical prior art 16-bit microprocessor to perform the mantissa shifts needed in floating point and extended floating point operations. In floating point operation, the mantissa is represented by 24 bits stored in two 16-bit words. Since the bits of two words must be shifted, two microcycles are required per bit shift. A worst-case shift of a 24-bit mantissa requires 23 bit shifts. Assuming that a typical microcycle requires 3 clock cycles, a worst-case 23-bit shift therefore takes 2 × 23 × 3 = 138 clock cycles. This is a significant amount of time.
Extended floating point operation is even worse. In extended floating point operation, the mantissa is represented as 40 bits provided by three 16-bit words, and in the worst case 39 bit shifts may be required. With 3 microcycles per bit shift (one for each of the 3 words representing the mantissa) and 3 clock cycles per microcycle, a 39-bit shift requires 3 × 39 × 3 = 351 clock cycles. This substantial amount of time degrades the overall throughput of the microprocessor when extended floating point operations are required.
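The worst-case figures in Table 2 are the product of three factors. The Python sketch below reproduces them; the function and parameter names are hypothetical, and it assumes, as the text does, one microcycle per word for each bit shift, a worst case of one fewer bit shifts than the mantissa width, and 3 clock cycles per microcycle.

```python
def worst_case_shift_clocks(mantissa_bits: int, words: int, clocks_per_microcycle: int = 3) -> int:
    """Clock cycles for a worst-case mantissa shift on a word-at-a-time shifter."""
    microcycles_per_bit_shift = words          # one shifter operation per word
    worst_case_bit_shifts = mantissa_bits - 1  # e.g. 23 for a 24-bit mantissa
    return microcycles_per_bit_shift * worst_case_bit_shifts * clocks_per_microcycle
```

`worst_case_shift_clocks(24, 2)` gives 138 and `worst_case_shift_clocks(40, 3)` gives 351, matching the two columns of Table 2. Because both factors grow with mantissa width, the cost rises roughly with the square of the number of words.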
Another prior art technique is the use of a barrel shifter; however, this technique is mostly implemented in dedicated floating point coprocessors and is very costly in terms of integrated circuit chip size.
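For comparison, a barrel shifter performs a shift of any distance in a single pass. One common organization, sketched below in Python as a behavioral model, is the logarithmic barrel shifter: ceil(log2(width)) stages, where stage k conditionally shifts by 2^k positions according to bit k of the shift amount. Every stage needs one 2:1 multiplexer per bit, so the multiplexer count grows as width × log2(width), which illustrates the chip-size cost noted above. The function name and the multiplexer count are illustrative assumptions, not taken from any particular design.

```python
from typing import Tuple


def barrel_shift_right(value: int, amount: int, width: int) -> Tuple[int, int]:
    """Arithmetic right shift of a width-bit word through logarithmic mux stages.

    Returns the shifted value and the number of 2:1 multiplexers in the array.
    """
    if not 0 <= amount < width:
        raise ValueError("shift amount must be in [0, width)")
    mask = (1 << width) - 1
    value &= mask
    sign = (value >> (width - 1)) & 1
    stages = (width - 1).bit_length()  # ceil(log2(width)) stages
    muxes = 0
    for k in range(stages):
        distance = 1 << k
        if (amount >> k) & 1:  # stage k is enabled by bit k of the shift amount
            # Vacated high-order positions receive copies of the sign bit.
            fill = ((mask << (width - distance)) & mask) if sign else 0
            value = (value >> distance) | fill
        muxes += width  # each stage has one multiplexer per bit, used or not
    return value, muxes
```

For a 16-bit word, `barrel_shift_right(0x8000, 3, 16)` returns `(0xF000, 64)`: the shift completes in one pass, but the array needs 4 stages of 16 multiplexers each. A 40-bit extended mantissa would need 6 stages, or 240 multiplexers.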