A squarer can be implemented using a general-purpose multiplier with both inputs tied together. However, such an implementation requires a large silicon area and is slow in operation. One known technique to increase the speed of a squarer is to fold the array of partial products over its diagonal. Because the partial products a_i·a_j and a_j·a_i on opposite sides of the diagonal are equal, each such pair is combined into a single term 2·a_i·a_j. The folded partial products are shifted left by one bit position, since a multiplication by two is a left shift by one bit position. The folded array of partial products is then reduced using any one of a number of standard techniques. For example, the folded array can be reduced using an array of carry-save adders, followed by a carry-propagate adder. The maximum number of partial products to be reduced in any one column is ⌊n/2⌋+1 when squaring an n-bit representation of a number.
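The folding described above can be illustrated with a short software model. The following Python sketch is only an illustration under stated assumptions, not a description of any particular circuit: the function names (folded_columns, square_from_columns, max_column_height) are hypothetical, the input is assumed to be an unsigned n-bit integer, and the columns are simply summed rather than reduced with carry-save adders.

```python
def folded_columns(a, n):
    """Build the columns of a folded partial-product array for an n-bit a.

    columns[k] holds the one-bit partial products of weight 2**k.
    """
    bits = [(a >> i) & 1 for i in range(n)]
    columns = [[] for _ in range(2 * n)]
    for i in range(n):
        # Diagonal term: a_i * a_i equals a_i and stays at weight 2*i.
        columns[2 * i].append(bits[i])
        for j in range(i + 1, n):
            # Off-diagonal pair: a_i*a_j + a_j*a_i = 2*a_i*a_j, kept as a
            # single term shifted left by one position (weight i + j + 1).
            columns[i + j + 1].append(bits[i] & bits[j])
    return columns


def square_from_columns(columns):
    """Sum every column at its weight (stands in for the adder array)."""
    return sum(sum(col) << k for k, col in enumerate(columns))


def max_column_height(n):
    """Largest number of partial products in any column for an n-bit input."""
    return max(len(col) for col in folded_columns(0, n))
```

In this model, square_from_columns(folded_columns(a, n)) reproduces a·a for every n-bit a, and max_column_height(n) matches the ⌊n/2⌋+1 bound stated above.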
A technique to further reduce the number of partial products is described in "A Fast Parallel Squarer Based on Divide-and-Conquer," by T. Yoo, K. F. Smith, and G. Gopalakrishnan, IEEE Journal of Solid-State Circuits, vol. 32, June 1997, pages 909-912. The divide-and-conquer technique employs progressively more complex circuits to reduce the total number of partial products. However, the number of partial products in the largest column remains ⌊n/2⌋+1 for an n-bit squaring operation. While the divide-and-conquer technique is an improved squaring method, increased speed can be achieved by decreasing the number of partial products in the critical path of the squarer. Therefore, there is a need for a squarer that decreases the number of partial products in the critical path of the squarer.