1. Field of the Invention
The invention relates generally to high-speed multiplication apparatus and, in particular, to high-speed, parallel multiplication apparatus for use in a data processing system.
2. Brief Description of the Prior Art
In data processing systems, the multiplication of two numbers as a part of the execution of a program is a common occurrence. The speed, then, with which the data processing system performs this multiplication is critical to the performance of the machine.
Various schemes have been developed to increase the speed with which such multiplication functions are performed. However, as these schemes become more elaborate, the cost of implementing them increases. In data processing systems, and particularly in smaller, economical systems, the extra speed or performance gained by the use of elaborate techniques must be balanced against the cost of incorporating them in the system.
Typically in a data processing system, multiplication of two floating point numbers is required. The exponent portions of both numbers are added together and the sum is concatenated with the product of the fractional portions. The fractional portions of these numbers comprise numerous binary bits. For example, in what is commonly referred to as single-precision multiplication, the fractional portion of each of the two numbers to be multiplied comprises 24 binary bits. In double-precision multiplication, the length is substantially longer and typically comprises 56 binary bits in the fractional portion of each of the floating point numbers to be multiplied.
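The exponent-add and fraction-multiply steps described above may be illustrated by the following minimal sketch. It assumes a hypothetical representation in which a number is a pair (exponent, fraction) whose value is (fraction / 2**24) * 2**exponent; the function names are illustrative only, and sign handling, exponent bias, normalization and rounding are deliberately omitted.

```python
from fractions import Fraction

FRACTION_BITS = 24  # single-precision fraction width given in the text


def fp_value(exponent, fraction):
    """Exact value of a pair: (fraction / 2**24) * 2**exponent."""
    return Fraction(fraction, 1 << FRACTION_BITS) * Fraction(2) ** exponent


def fp_multiply(a, b):
    """Multiply two (exponent, fraction) pairs.

    Returns (exponent_sum, fraction_product); the fraction product is the
    full double-width (48-bit) product of the two 24-bit fractions.
    """
    (exp_a, frac_a), (exp_b, frac_b) = a, b
    return exp_a + exp_b, frac_a * frac_b


def product_value(exponent, fraction_product):
    """Exact value of a result pair; the product has 2 * 24 fractional bits."""
    scale = 1 << (2 * FRACTION_BITS)
    return Fraction(fraction_product, scale) * Fraction(2) ** exponent
```

In this sketch the result keeps the unnormalized double-width fraction, so that the value of the result can be compared exactly against the product of the two operand values.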
Due to the size of the fractional portion of these numbers, various schemes have been developed for use in high-performance systems to multiply rapidly the fractional portions of the floating point numbers. Most commonly, the fractional portions of the floating point numbers are divided into segments, typically, four bits in length, which may be multiplied to produce a number of summands which eventually will be combined to form the final product.
Thus, for example, in prior art high-performance systems, each four-bit field in one of the fractions (the multiplicand) is multiplied by each four-bit field of the other fraction (the multiplier) by means of a plurality of multiplier circuits. This process produces numerous summands (4-bit × 4-bit products), all of which must be recombined to form the product of the multiplicand and the multiplier.
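The field-by-field multiplication described above may be sketched as follows for a 24-bit fraction divided into six four-bit fields. The function names are hypothetical, and the sketch models only the arithmetic, not the multiplier circuits themselves: each summand is recorded together with the bit position at which it must be placed when the summands are recombined.

```python
FIELD_BITS = 4
FIELD_COUNT = 6  # a 24-bit fraction divided into four-bit fields


def split_fields(value):
    """Return the four-bit fields of a 24-bit value, least significant first."""
    mask = (1 << FIELD_BITS) - 1
    return [(value >> (FIELD_BITS * i)) & mask for i in range(FIELD_COUNT)]


def make_summands(multiplicand, multiplier):
    """Multiply every multiplicand field by every multiplier field.

    Each summand is a (shift, product) pair: an 8-bit product and the bit
    position of its least significant bit within the final product.
    """
    summands = []
    for i, mc_field in enumerate(split_fields(multiplicand)):
        for j, mp_field in enumerate(split_fields(multiplier)):
            summands.append((FIELD_BITS * (i + j), mc_field * mp_field))
    return summands


def recombine(summands):
    """Add all summands at their proper bit positions."""
    return sum(product << shift for shift, product in summands)
```

For two 24-bit operands this yields 36 summands, each of which fits in 8 bits, and their recombination equals the ordinary product of the operands.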
Each of these summands is comprised of a number of binary bits (8 bits for the example above) and each bit represents a particular power of the number 2. If all the summands are arranged according to these powers, it can be appreciated that, for the purposes of recombination, numerous summands will have a bit position representative of the same power of the number 2, and that this is so for each and every power of the number 2 in the product. For example, where all six four-bit fields of the multiplier are each multiplied by all six of the four-bit fields of the multiplicand in a single-precision processor, as many as 12 summands may each contain a bit position representative of the same power of the number 2. These of course must all be added together to form the product of the multiplicand and the multiplier.
In the prior art, several schemes have evolved for rapidly recombining or reducing the summands to form the multiplier/multiplicand product. One such technique is described in an article by C. S. Wallace entitled "A Suggestion for a Fast Multiplier" in volume EC-13 of IEEE Transactions on Electronic Computers dated February of 1964 at pages 14-17. In this technique, the bit positions of each of the summands representing the same power of the number 2 are arranged into a matrix wherein the columns are representative of the various powers of the number 2 and the rows are determined by the number of entries in each column (i.e. bit positions in the summands representing the same power of the number 2). For example, in a single-precision processor, the maximum number of entries in a column (and, hence the maximum number of rows in the matrix) is 12.
The rows of this matrix are reduced in number in successive stages by means of a pseudo-adder tree network. Specifically, groups of three rows in the matrix are added by a string of full adders which reduce each group of three rows to two rows. Similarly, the rows generated by the first set of full-adder strings are thereafter again reduced in number by the same method. This process is repeated until two rows are left. These two rows are then added by a conventional carry-save adder circuit involving substantial carry-ripple or carry-propagation delays.
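The three-rows-to-two-rows reduction described above may be illustrated by the following simplified sketch. It is an assumption-laden model rather than a reproduction of the adder tree in the cited article: each row is represented as an integer already aligned to its bit positions, a string of full adders is modeled with bitwise operations, and any rows left over in a stage are passed unchanged to the next stage. The function names are hypothetical.

```python
def full_adder_row(a, b, c):
    """Reduce three rows to two without propagating carries.

    Each bit position is handled by an independent full adder, so that
    a + b + c == sum_row + carry_row.
    """
    sum_row = a ^ b ^ c
    carry_row = ((a & b) | (a & c) | (b & c)) << 1
    return sum_row, carry_row


def reduce_to_two_rows(rows):
    """Apply the 3-to-2 reduction in stages until at most two rows remain.

    Returns the remaining rows and the number of stages used.
    """
    rows = list(rows)
    stages = 0
    while len(rows) > 2:
        next_rows = []
        full_groups = len(rows) // 3
        for g in range(full_groups):
            next_rows.extend(full_adder_row(*rows[3 * g:3 * g + 3]))
        next_rows.extend(rows[3 * full_groups:])  # leftover rows pass through
        rows = next_rows
        stages += 1
    return rows, stages
```

Under these assumptions, twelve rows are reduced in the sequence 12, 8, 6, 4, 3, 2, that is, in five stages, and only the final addition of the two remaining rows requires carries to propagate across the full width of the result.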
This technique provides a significant reduction in the time to combine the summands because the delays incurred in the reduction process above prior to the last addition of two rows are substantially less than those incurred by previous methods involving regular row-by-row addition using conventional carry-save adding techniques.
Another technique for reducing the summands to two numbers whose sum equals the multiplier/multiplicand product is described in an article by L. Dadda entitled "Some Schemes for Parallel Multipliers", in volume 34 of Alta Frequenza dated 1965 at pages 349-356. This technique utilizes counters to reduce the rows in the matrix by stages until two rows remain which can be added by conventional carry-save adders. Although this technique reduces the matrix in fewer steps than that of Wallace, several steps are still required to yield the final two numbers to be added to form the multiplicand/multiplier product.
These prior art techniques for high-performance data processing systems, however, require the utilization of a substantial number of components and consume a significant amount of space in the processor thereby adding substantially to the cost of the data processing system. In smaller data processing systems, such as minicomputers, this cost has in the past been prohibitive. Therefore, prior art minicomputer systems have for the most part utilized a technique wherein the entire multiplicand is multiplied individually by each bit of the multiplier. Specifically, the multiplicand is first multiplied by the first bit of the multiplier and thereafter is shifted in a shifter and added with the product of the second bit of the multiplier with the entire multiplicand. This result is then shifted and added with the product of the multiplicand and the third bit of the multiplier. Similarly, this process is repeated until each bit of the multiplier has been multiplied with the multiplicand and added to the previous result thereby yielding the final multiplier/multiplicand product.
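The bit-by-bit technique described above may be sketched as follows. The sketch assumes that the multiplier bits are taken least significant bit first and that each partial product is shifted left to its proper weight before being added; the text does not specify the bit order or shift direction, and the function name and the `width` parameter are hypothetical.

```python
def shift_add_multiply(multiplicand, multiplier, width=24):
    """Multiply by forming one partial product per multiplier bit.

    For each bit of the multiplier, the entire multiplicand is multiplied
    by that bit (giving 0 or the multiplicand), shifted to the weight of
    the bit, and added to the running result.
    """
    accumulator = 0
    for bit in range(width):
        bit_value = (multiplier >> bit) & 1
        accumulator += (multiplicand * bit_value) << bit
    return accumulator
```

One shift-and-add step is performed for every bit of the multiplier, which is why the time taken by this technique grows with the width of the fraction.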
In one prior art minicomputer system, a technique is used to examine the contents of the multiplier so that, when bits of the multiplier representing binary 0 occur, the shifter circuitry is immediately arranged to shift its contents thereby avoiding a multiplication step.
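The zero-skipping refinement described above may be sketched as follows. This is only an arithmetic model under stated assumptions, not the circuitry of the system referred to: the multiplier is examined least significant bit first, an addition is performed only for bits equal to binary 1, and for bits equal to binary 0 only the shift takes place. The function name and the step counter are hypothetical.

```python
def shift_add_skip_zeros(multiplicand, multiplier, width=24):
    """Shift-and-add multiplication that skips the add for zero bits.

    Returns the product and the number of additions actually performed.
    """
    accumulator = 0
    add_steps = 0
    shifted = multiplicand  # multiplicand aligned to the current bit weight
    for bit in range(width):
        if (multiplier >> bit) & 1:
            accumulator += shifted
            add_steps += 1
        shifted <<= 1  # a zero bit costs only this shift
    return accumulator, add_steps
```

In this model the number of additions equals the number of binary 1 bits in the multiplier, so that a multiplier containing many zeros is processed in fewer add steps.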
These prior art minicomputer multiplication techniques are, of course, substantially slower than the techniques used in the prior art high-performance data processing systems. However, these techniques are substantially less expensive to incorporate.