1. Field of the Invention:
The present invention relates to an arithmetic unit for use in the field of computation and more particularly in the computation of summed indexed products. The indexed products may be matrix products or vector products of which the individual terms may be either real or complex (i.e., a + jb) quantities. The applications also include polynomial generation and inner product calculation. The arithmetic unit herein disclosed is particularly adapted for use in computing multiple point Fast Fourier Transforms (FFT) using several known families of algorithms.
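As an illustration only, a summed indexed product with complex terms can be sketched in software as follows. The function name, the representation of each complex quantity a + jb as a pair (a, b), and the sample values are assumptions made for this sketch and are not part of the disclosure. Each complex product is expanded into four real multiplications, and the real and imaginary parts are accumulated separately.

```python
def summed_complex_products(xs, ws):
    """Accumulate sum over k of xs[k] * ws[k] using only real multiplies and adds.

    Each element is a (real, imaginary) pair standing for a + jb.
    Hypothetical helper for illustration; not taken from the disclosure.
    """
    acc_re, acc_im = 0.0, 0.0
    for (a, b), (c, d) in zip(xs, ws):
        # (a + jb)(c + jd) = (ac - bd) + j(ad + bc)
        acc_re += a * c - b * d
        acc_im += a * d + b * c
    return acc_re, acc_im


# Illustrative operands: (1 + 2j) * 0.5 + (3 - 1j) * 1j
xs = [(1.0, 2.0), (3.0, -1.0)]
ws = [(0.5, 0.0), (0.0, 1.0)]
result = summed_complex_products(xs, ws)
```

Setting the imaginary parts to zero reduces the same loop to a real inner product, which corresponds to the real-valued case mentioned above.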
The arithmetic unit is optimized for fabrication using large scale integration techniques wherein minimum external interconnections, maximum integration of function, minimum cost per function, and maximum flexibility of application are important considerations.
2. Description of the Prior Art:
The invention is generally applicable to the computation of summed indexed products wherein the individual terms are complex quantities, as for instance in the computation of Fourier transformations. The discrete Fourier transform is defined as a summation of the products of successive uniformly distributed samples or "points" of a complex variable multiplied by a complex trigonometric function. The complex variable is normally time variant or spatially variant and the transform is the conversion from one domain to another, such as frequency or angle. The discrete Fourier transform is computed from a matrix of such products and is itself a column vector composed of successive Fourier coefficients computed at the sampling intervals or points. The Fast Fourier Transform is defined as any of the forms of the Fourier matrix which are factorable due to symmetries, permitting recursions. These factorable forms typically employ permutations of the data, combining operators, and multiplication involving diagonal matrix operators. A large number of algorithms are known for computing Fast Fourier Transforms, and from these a selection may be made for maximum ease in hardware implementation.
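By way of a hedged software sketch, the relation between the discrete Fourier transform as a full matrix of products and one factored form can be shown as follows. The function names, the negative-exponent sign convention, and the choice of a recursive radix-2 decimation-in-time factoring are assumptions made for illustration; the text above does not single out any one of the known algorithms. The direct form computes every coefficient as a summation of N complex products, while the factored form splits the data into even and odd points (a permutation), multiplies by twiddle factors (a diagonal operator), and combines by sums and differences.

```python
import cmath


def dft(x):
    """Direct form: each coefficient is a summation of N complex products."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def fft_radix2(x):
    """One factored form (radix-2, decimation in time); len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])  # permutation: even-indexed points
    odd = fft_radix2(x[1::2])   # permutation: odd-indexed points
    out = [0j] * n
    for m in range(n // 2):
        # Twiddle multiplication (diagonal operator), then sum and difference.
        t = cmath.exp(-2j * cmath.pi * m / n) * odd[m]
        out[m] = even[m] + t
        out[m + n // 2] = even[m] - t
    return out


# Illustrative 8-point input; the values are arbitrary.
samples = [1 + 0j, 2 - 1j, 0 + 1j, -1 + 0j, 3 + 2j, 0 + 0j, -2 + 1j, 1 - 1j]
direct = dft(samples)
factored = fft_radix2(samples)
```

Both functions return the same coefficients to within rounding error; the direct form uses on the order of N squared complex products, whereas the factored form uses on the order of N log N.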
All contemporary implementations of FFT transformers are likely to involve some degree of circuit integration, but such use of integration has to date been far from optimum. The advent of integrated circuits (chips) has made it feasible to achieve ever larger amounts of computational capacity at ever lower incremental costs. Thus, whenever circuits performing individual logic functions and requiring a substantial number of active elements have been in large enough demand to pay the high initial costs of integration, they have been integrated. In this manner, integrated circuits for many common logic functions are now available and in any given equipment have displaced large numbers of discrete components. In a specific application requiring a substantial amount of computational capacity, as for instance the computation of a 1024 point FFT with 10 bit precision in periods of a few milliseconds, all practical hardware embodiments have tended to involve a large amount of integrated circuitry. The integrated circuit "chips" have tended toward small to medium scale integration, each performing a single, common logic function. Normally, such chips have been needed in quantities on the order of a hundred or two. When chips are needed in such large quantities, medium scale integration (MSI) has tended to be only a short term solution because of the high cost of chip interconnections. Assuming a smaller number of LSI chips (typically one-tenth as many), one would project that the total equipment costs would fall almost proportionately. In the case of FFT processors, the LSI solution has tended to stay uneconomic because of the difficulty of developing LSI chips which, when optimized for one specific FFT application, could also be applied to another FFT application. Thus, while large scale integration is potentially the most economic solution, an LSI chip capable of flexible FFT application has not hitherto been proposed.
With further reference to the FFT application, an arithmetic unit suitable for a 1024 point transform with 10 bit precision normally requires very substantial modification to compute a 256 point or a 4096 point transform. Similar modification is likely to be necessary when the precision or rate of computation is changed.
Some modes of paralleling have been suggested. One known approach is the "bit slice" philosophy of partitioning, wherein several identical chips are used simultaneously during the computations upon a bit-parallel word format. However, this approach provides parallelism across the bits within a word, not across the words within the data array. Although the approach is nominally flexible in word length, carry propagation delay effects in fact limit the possible precision for a given clocking period.
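The carry propagation limitation can be illustrated with a simplified software model. The 4 bit slice width, the function names, and the assumption of a plain ripple carry between slices are all hypothetical choices for this sketch; actual bit slice parts may use carry look-ahead or other schemes. In the model, each slice cannot produce its final sum until the carry from the slice below has arrived, so the number of serial carry steps grows with the word length.

```python
def slice_add(a, b, carry_in, width=4):
    """Model of one identical slice: add two width-bit fields plus a carry in."""
    total = a + b + carry_in
    return total & ((1 << width) - 1), total >> width


def bit_slice_add(x, y, n_slices, width=4):
    """Add two words by chaining n_slices identical slices (ripple carry assumed).

    Returns (sum, carry_out). The loop is serial because each slice
    needs the carry produced by the slice before it.
    """
    mask = (1 << width) - 1
    carry = 0
    result = 0
    for i in range(n_slices):
        a = (x >> (i * width)) & mask
        b = (y >> (i * width)) & mask
        s, carry = slice_add(a, b, carry, width)
        result |= s << (i * width)
    return result, carry


# Worst case for carry propagation: the carry ripples through every slice.
sum12, carry12 = bit_slice_add(0x0FFF, 0x0001, n_slices=3)  # 12 bit word
sum16, carry16 = bit_slice_add(0x0FFF, 0x0001, n_slices=4)  # 16 bit word
```

Extending the word from three slices to four adds one more serial carry step, which is the sense in which precision is bounded for a fixed clocking period.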