Multiplication involves the production of partial products which may be produced either directly or indirectly, the reduction of the partial products to two values, and the subsequent addition of these two values by a two-to-one addition to produce the product. Assuming a carry-look-ahead two-to-one addition scheme, the speed of the multiplication is primarily influenced by the delay associated with the production of partial products, the number of partial products that must be reduced, and the delay associated with the reduction of these partial products to two values. For a given number of partial products, the delay associated with their reduction to two values is dependent upon the reduction technique and the counter chosen to implement the reduction.
The most common technique for indirect partial product production and assembly is the Booth algorithm since it is simple to implement, results in small delays for the production and assembly of multiples, and reduces the number of partial products. This is especially true when the three-bit algorithm is used in the design. Direct partial product generation and assembly also have been used. In the direct scheme, partial products are formed via a two-way AND between all combinations of the bits of the multiplicand and the multiplier with the weight of the partial product being determined by the weight of the bits being ANDed.
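The direct scheme described above can be illustrated with a short sketch. The function names and the 4-bit example operands are hypothetical, chosen only for illustration; the sketch models each AND term as a (weight, bit) pair rather than as hardware.

```python
def direct_partial_products(multiplicand: int, multiplier: int, n_bits: int):
    """Return (weight, bit) pairs, one per AND of a multiplicand bit and a multiplier bit."""
    terms = []
    for i in range(n_bits):              # bit index into the multiplicand
        a_i = (multiplicand >> i) & 1
        for j in range(n_bits):          # bit index into the multiplier
            b_j = (multiplier >> j) & 1
            # Two-way AND; the term's weight is the sum of the two bit weights.
            terms.append((i + j, a_i & b_j))
    return terms


def sum_terms(terms):
    """Add up the weighted one-bit terms to recover the product."""
    return sum(bit << weight for weight, bit in terms)


terms = direct_partial_products(13, 11, n_bits=4)
print(len(terms), sum_terms(terms))  # 16 143
```

An n-bit by n-bit direct scheme therefore yields n squared one-bit terms, which is why the number of partial products to be reduced matters so much for speed.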
Two common techniques exist to reduce the partial products. The first scheme, comprising either Wallace trees or Dadda trees, reduces the partial products in a parallel fashion by employing counters. Common counters used include three-to-two, five-to-three, and seven-to-three counters. Other counters may also be employed. The second technique involves the use of only three-to-two counters configured in a regular array to reduce the partial products. Both techniques have been employed extensively in the past. Their employment has been based strictly on the methodology used in the design rather than on the advantages of one scheme versus the other. The parallel scheme has typically been employed in designs using a random logic methodology since fewer stages of logic are generally required to perform the reduction with this technique. This scheme, however, results in highly irregular structures that become difficult to place and wire. The second scheme requires more stages but is highly regular so that placement and wiring are relatively straightforward. As a result, this scheme is normally employed in designs using a custom or semi-custom design methodology. To avoid prohibitive wire crossing, the counters used in array configurations have been limited to three-to-two counters as shown in FIG. 1.
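As a rough behavioral sketch of the array technique, the following models a three-to-two counter applied across all columns at once and chains it so that each stage absorbs one new row. It is an assumption-laden model, not the circuit of FIG. 1: the names are hypothetical, rows are Python integers, and the chain starts from two zero rows, which adds two trivial stages that a real array would omit.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """Three-to-two counter applied bitwise: x + y + z == s + c."""
    s = x ^ y ^ z                             # per-column sum bits
    c = ((x & y) | (x & z) | (y & z)) << 1    # per-column carries, moved up one weight
    return s, c


def array_reduce(partial_products: list[int]) -> tuple[int, int]:
    """Reduce the rows to two values with a linear chain of three-to-two counters."""
    s, c = 0, 0
    for row in partial_products:              # each array stage absorbs one new row
        s, c = csa(s, c, row)
    return s, c


def multiply(a: int, b: int, n_bits: int) -> int:
    """Unsigned multiply: direct rows, array reduction, then a final addition."""
    # Rows of the direct scheme: the multiplicand gated by each multiplier bit.
    rows = [(a << j) if (b >> j) & 1 else 0 for j in range(n_bits)]
    s, c = array_reduce(rows)
    return s + c                              # stands in for the two-to-one adder


print(multiply(13, 11, n_bits=4))  # 143
```

The number of loop iterations grows linearly with the number of rows, which mirrors the larger stage count of the array scheme relative to a tree.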
In the past, multipliers have generally employed either the Booth algorithm for partial product generation in conjunction with a Wallace or Dadda tree for partial product reduction, or direct generation of partial products with array reduction of the resulting partial products using three-to-two CSAs. In this paper, a scheme for array reduction of partial products is presented that is not only suitable for use with the direct method, but is also usable in a hybrid approach where the partial product generation and assembly is performed using a Booth algorithm while their reduction is accomplished by the array reduction scheme. A hybrid scheme is shown in FIG. 2 as applied to an S390 56×56-bit multiplier. (We will show how this hybrid apparatus can become an improved computing apparatus.) As shown in this figure, a Booth encoder, implementing three-bit overlapped scanning, is used to produce 29 partial products that must be reduced to two values before entering the two-to-one adder. It is convenient to halve the resulting matrix of partial products before starting the reduction. This halving produces two matrices of 14 and 15 partial products that must be reduced. Though the following discussion uses this multiplier as an environment for presenting the concepts of the new reduction scheme, application of the concepts is not restricted to this environment.
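The count of 29 partial products can be checked with a small sketch of three-bit overlapped scanning (radix-4 Booth recoding). The sketch assumes the 56-bit multiplier is an unsigned magnitude, so the top scan group sees zero padding; that assumption, the function names, and the way the list is split into halves are illustrative and are not taken from FIG. 2.

```python
def booth_radix4_digits(multiplier: int, n_bits: int) -> list[int]:
    """Recode an unsigned n_bits multiplier into radix-4 Booth digits in -2..2."""
    # One digit per overlapped three-bit group; an unsigned operand needs
    # n_bits // 2 + 1 groups so that the top group sees zero padding.
    n_digits = n_bits // 2 + 1
    padded = multiplier << 1                  # implicit b[-1] = 0 below the LSB
    digits = []
    for k in range(n_digits):
        group = (padded >> (2 * k)) & 0b111   # bits b[2k+1], b[2k], b[2k-1]
        b_hi, b_mid, b_lo = (group >> 2) & 1, (group >> 1) & 1, group & 1
        digits.append(-2 * b_hi + b_mid + b_lo)
    return digits


def booth_partial_products(multiplicand: int, multiplier: int, n_bits: int) -> list[int]:
    """One signed multiple of the multiplicand per Booth digit, already weighted."""
    digits = booth_radix4_digits(multiplier, n_bits)
    return [(d * multiplicand) << (2 * k) for k, d in enumerate(digits)]


a, b = 0x123456789ABCDE, 0xFEDCBA98765432     # two arbitrary 56-bit operands
pps = booth_partial_products(a, b, n_bits=56)
lower, upper = pps[:14], pps[14:]             # one possible halving of the matrix
print(len(pps), len(lower), len(upper))       # 29 14 15
```

Summing all 29 entries reproduces the product of the two operands, so the recoding roughly halves the number of rows relative to one row per multiplier bit.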
As indicated above, m/n counters have been previously proposed and used for parallel reduction of the partial product matrix. A specific example uses the four-to-two counter shown in FIG. 3 to reduce, in a parallel fashion, the matrix of partial products of a 32×32-bit multiplier. While such counters may offer an advantage over partial product reduction using three-to-two carry-save adders (CSAs) in a Wallace or Dadda reduction, they are unsuitable for regular array implementations. For example, consider the employment of the four-to-two counter shown in FIG. 3 to reduce a matrix of partial products using an array configuration. Because the counter has two relatively slow outputs, C and S, but only one input associated with a relatively less critical path, C_in, at least one slow output from a preceding four-to-two counter must of necessity be wired to an input of the subsequent counter that traverses its critical path. This can be seen in FIG. 4, where one of the slowest outputs, C, has been wired to the input whose path is the fastest, C_in, leaving the slow output, S, to traverse the critical delay path. For this reason, the critical path through the counters performing the reduction is additive, implying that the employment of such a counter in the array multiplier may not produce a speed advantage when compared with an array multiplier composed of three-to-two CSAs. For example, the interconnections just described and shown in FIG. 4 indicate that each level of four-to-two counters reduces two products of the form a_i b_i, for a direct multiplication scheme, with the delay of a six-way XOR function. Given that the three-to-two CSA reduces a product term of the form a_i b_i with the delay of a three-way XOR via the interconnection scheme shown in FIG. 1, no apparent speed advantage results from the employment of the four-to-two counter array multiplier scheme, since either approach reduces two partial product terms with the delay of a six-way XOR function.
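To make the counter's input and output roles concrete, the sketch below models a four-to-two counter at the bit level. FIG. 3 is not reproduced here, so the internal structure is an assumption: the conventional construction from two chained three-to-two counters, with a C_out signal passed to the neighboring column. The function names are hypothetical.

```python
from itertools import product


def full_adder(x: int, y: int, z: int) -> tuple[int, int]:
    """Three-to-two counter on single bits: returns (sum, carry)."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def four_to_two(x1: int, x2: int, x3: int, x4: int, c_in: int) -> tuple[int, int, int]:
    """Assumed construction: two chained three-to-two counters."""
    s1, c_out = full_adder(x1, x2, x3)   # c_out does not depend on c_in
    s, c = full_adder(s1, x4, c_in)      # S and C emerge from the second counter
    return s, c, c_out


# Exhaustive check of the counting identity over all 32 input patterns:
# S keeps weight i, while C and C_out both carry weight i + 1.
for x1, x2, x3, x4, c_in in product((0, 1), repeat=5):
    s, c, c_out = four_to_two(x1, x2, x3, x4, c_in)
    assert x1 + x2 + x3 + x4 + c_in == s + 2 * (c + c_out)
```

In this assumed structure S is the XOR of all five inputs, so an S output that feeds one of the x1 to x3 inputs of the next level passes through both internal counters, which is consistent with the additive critical path described above.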
In the sections to follow, we present a new counter design referred to as a four-to-two composite counter, where four-to-two is intended to designate the reduction of four new items, i.e., partial products, to outputs that span no more than two multiplier columns. For example, the counter reduces four partial products of weight i while only requiring communication with columns of weight i and i+1. As a result of the limited communication required, interconnections between the composite counters can be kept small, making the composite counters suitable for array multiplier schemes. Consequently, we incorporate the design into an array multiplier scheme, determine the associated critical path, and compare the proposed scheme with other known array implementation schemes.