The subject invention relates to a method and apparatus for performing computations using residue arithmetic. The subject method and apparatus can utilize the Residue Number System (RNS) to implement automatic computing machinery. The use of the RNS has been proposed in Garner, H. L., “The Residue Number System,” IRE Transactions on Electronic Computers, vol. EL-8, No. 6, June 1959, pp. 140-147, and Taylor, F. J., “Residue Arithmetic: A Tutorial with Examples,” IEEE Computer, vol. 17, No. 5, May 1984, pp. 50-61. The RNS is generally used to implement automatic computing machinery for digital signal processing. Digital signal processing (DSP) is dominated by the repetitive computation of sums of products. The RNS is well-suited to performing computations of this type, as demonstrated in Mellott, J. D., Lewis, M. P., Taylor, F. J., “A 2D DFT VLSI Processor and Architecture,” Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Atlanta, 1996, and Mellott, J. D., Smith, J. C., Taylor, F. J., “The Gauss Machine—A Galois-Enhanced Quadratic Residue Number System Systolic Array,” Proceedings of IEEE 11th Symposium on Computer Arithmetic, Windsor Ontario, 1993, pp. 156-162.
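By way of illustration only, the suitability of the RNS for sums of products can be sketched as follows. The moduli, function names, and sample values are assumptions chosen for this sketch and are not taken from the cited references. Each residue channel accumulates its own products using only small modular additions and multiplications, with no carries passing between channels; the Chinese Remainder Theorem then recovers the conventional result, provided the true result lies in the range 0 to M-1, where M is the product of the moduli.

```python
from math import prod

# Pairwise coprime moduli, chosen for illustration only (M = 15015).
MODULI = (7, 11, 13, 15)


def to_rns(x, moduli=MODULI):
    """Encode an integer as its residues modulo each modulus."""
    return tuple(x % m for m in moduli)


def rns_dot(a_vals, b_vals, moduli=MODULI):
    """Sum of products computed independently in each residue channel."""
    acc = [0] * len(moduli)
    for a, b in zip(a_vals, b_vals):
        ra, rb = to_rns(a, moduli), to_rns(b, moduli)
        for i, m in enumerate(moduli):
            # Each channel needs only a small modular multiply and add.
            acc[i] = (acc[i] + ra[i] * rb[i]) % m
    return tuple(acc)


def from_rns(residues, moduli=MODULI):
    """Recover the integer in [0, M) via the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        # pow(m_i, -1, m) is the modular inverse (Python 3.8 or later).
        total += r * m_i * pow(m_i, -1, m)
    return total % big_m


if __name__ == "__main__":
    a = [3, 5, 7]
    b = [2, 4, 6]
    print(from_rns(rns_dot(a, b)))  # agrees with the ordinary dot product
```

In this sketch no channel ever handles a value larger than the square of its own modulus, which is the property that makes small adders and small lookup tables sufficient.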
In the past, it has often been impractical to implement large-scale digital signal processors using a single semiconductor device because of limits on the amount of logic that can be placed on such a device. Instead, large-scale digital signal processors were typically implemented using discrete logic. The RNS is well-suited to this implementation methodology since its need for small adders and table lookup functions corresponds with the common availability of discretely packaged small adders and small programmable read-only memories (PROMs). An example of this implementation methodology is the Gauss Machine, discussed in the aforementioned reference by Mellott, et al. As it became possible to integrate large-scale digital signal processors onto a single semiconductor device, the methodology of using small adders and memories was carried forward. An example of such a digital signal processor is given by Smith, J. C., Taylor, F. J., “The Design of a Fault Tolerant GEQRNS Processing Element for Linear Systolic Array DSP Applications,” Proceedings of IEEE Great Lakes Symposium on VLSI, Notre Dame, Ind., 1994. Other examples of RNS digital signal processors can be found in U.S. Pat. No. 5,117,383 (Fujita et al.), issued May 26, 1992; U.S. Pat. No. 5,008,668 (Takayama, et al.), issued Apr. 16, 1991; U.S. Pat. No. 4,949,294 (Wambergue), issued Aug. 14, 1990; and U.S. Pat. No. 4,281,391 (Huang), issued Jul. 28, 1981.
The aforementioned examples disclose the use of ROMs for implementation of table lookup functions. For the small table lookup functions typically found in RNS digital signal processor implementations, ROMs are attractive because they are easy to program and have known speed, area, and power characteristics. In contrast, the manual design of a collection of logic gates to realize a table lookup function can be a daunting task, and the speed, area, and power characteristics of such a circuit are generally not fully known until the circuit has been designed. Another feature associated with prior use of ROMs in integrated, as opposed to discrete, RNS digital signal processor implementations is that the ROMs offer favorable die area compared to other possible means of implementing small table lookups.
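A table lookup function of the kind referred to above can be sketched in software as follows. This is a minimal illustration under stated assumptions: the modulus, the fixed coefficient, and the helper names are hypothetical and do not come from the cited references. The table is filled once, which corresponds to programming the ROM, and each subsequent operation is a single indexed read in which the input residue serves as the address.

```python
def build_rom(func, modulus):
    """Tabulate func over every residue; the list index acts as the ROM address."""
    return [func(x) % modulus for x in range(modulus)]


MODULUS = 31  # hypothetical modulus that fits in a 5-bit address
COEFF = 17    # hypothetical fixed filter coefficient

# "Program" the ROM once with the products COEFF * x mod MODULUS.
MUL_ROM = build_rom(lambda x: COEFF * x, MODULUS)


def rom_multiply(x):
    """Multiply a residue by COEFF modulo MODULUS using one table read."""
    return MUL_ROM[x]


if __name__ == "__main__":
    print(rom_multiply(5))  # same value as (17 * 5) % 31
```

Any single-operand function of a residue, not only multiplication by a constant, can be tabulated in the same way, which is one reason lookup tables appear so often in RNS designs.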
Prior techniques for performing computations using RNS suffer from one or more disadvantages related to the use of memories, usually ROMs, to implement table lookup functions. Some of these disadvantages include: memories with the required properties for use in RNS computations are not available in sufficient quantity in all ASIC implementation technologies; memories often contain analog circuitry that uses significant power even if there is no switching activity in the circuit; the analog circuitry found in most memory devices does not scale well into deep sub-micron semiconductor fabrication technologies; memories, since they are dependent upon analog circuits (e.g., differential amplifiers), can be more difficult to test than digital logic circuits, can require tests and test mechanisms separate from those used for digital logic circuits, and are not generally compatible with leakage current (IDDQ) test methodologies; there is little or no flexibility to optimize a memory with respect to one or more of speed, power, and area; memories can be difficult to pipeline, and in many implementation technologies there is no realistic option to pipeline memory; the size of the memory is typically fixed by the number of inputs and outputs, and is essentially independent of the contents of the memory; and, for reliability reasons, wires unrelated to a memory are not usually allowed to pass over a memory on a semiconductor device, such that the presence of many small memories on a semiconductor device, such as would be used in an apparatus to perform computations using the RNS, can impair the ability to connect various functions, both memory and non-memory, on the device.