Complex programmable logic devices (CPLDs) and field programmable gate arrays (FPGAs) are frequently used to implement arithmetic and other functions. Multiplication functions are complex, particularly for 32-bit inputs, and consume considerable device resources on both FPGAs and CPLDs. Additionally, multiplication functions are slow when implemented in CPLDs and/or FPGAs. However, multiplication functions are frequently used in digital signal processing (DSP) and communications applications, including filtering, multimedia, etc.
An example of an approach for improving the performance of CPLDs and FPGAs using adders and comparators can be found in U.S. Pat. No. 5,448,185, which is hereby incorporated by reference in its entirety.
Conventional multiplication methods use considerable device resources and result in slow multiply functions. A multiply function can be implemented as discrete multiplication blocks (i.e., to create partial products) that feed an adder (i.e., to add all the partial products, with appropriate shifting). Conventional multiplication methods may force a user to select a larger device than desirable, since additional functionality needs to be added around the multiplier. The size of the multiplier frequently limits the resources available for other functionality of the device. For example, an 8×8 combinatorial signed multiplier requires approximately 66% of a cluster (2318 square mils of silicon) when implemented as programmable logic.
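The partial-product structure described above can be modeled in software. The following is a minimal sketch only, assuming unsigned operands for simplicity (the signed case mentioned above would need additional sign handling); the function names `partial_products` and `multiply` are hypothetical and are not taken from any particular device or tool.

```python
def partial_products(a, b, width=8):
    """Return the shifted partial products of two unsigned width-bit operands.

    Assumption: operands are unsigned and are truncated to `width` bits.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    # One partial product per bit of b: a is gated (AND-ed) by that bit,
    # then shifted left into the bit's weight position.
    return [(a if (b >> i) & 1 else 0) << i for i in range(width)]


def multiply(a, b, width=8):
    """Sum the shifted partial products, as the adder stage would."""
    return sum(partial_products(a, b, width))
```

For instance, `multiply(13, 11)` sums the partial products 13, 26, 0, and 104 (plus four zero terms) and returns 143. An 8×8 multiply thus needs eight gated copies of one operand plus a multi-operand adder, which suggests why the function is costly in programmable logic.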
Additionally, CPLDs and FPGAs are frequently used to implement cyclic redundancy check (CRC) functions. CRC functions use a chain of XORs with feedback that may be implemented as discrete blocks, in a manner similar to the way a multiplier can be implemented as discrete blocks. Because CRC functions are based on XOR chains, they consume considerable resources on CPLDs, especially for 32-bit inputs. A 32-bit CRC function uses two logic blocks on a typical CPLD. CRC functions are used in the communications area for error checking and forward error correction. Several flavors of CRC are used in telecommunication devices.
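The XOR chain with feedback can be illustrated with a bit-serial software model. This is a hedged sketch, not a description of any particular device implementation. It assumes the widely used CRC-32 variant of IEEE 802.3 (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF); the text above does not specify a polynomial, and the function name `crc32_bitwise` is hypothetical.

```python
# Assumption: reflected form of the IEEE 802.3 CRC-32 polynomial 0x04C11DB7.
POLY_REFLECTED = 0xEDB88320


def crc32_bitwise(data: bytes) -> int:
    """Compute a CRC-32 one bit at a time, mimicking a shift register
    whose feedback bit is XOR-ed into the polynomial tap positions."""
    crc = 0xFFFFFFFF  # assumed initial register value
    for byte in data:
        crc ^= byte  # XOR the incoming byte into the low bits
        for _ in range(8):
            if crc & 1:
                # Feedback bit is 1: shift and XOR in the polynomial taps.
                crc = (crc >> 1) ^ POLY_REFLECTED
            else:
                # Feedback bit is 0: shift only.
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # assumed final XOR
```

With these assumptions, `crc32_bitwise(b"123456789")` yields the standard check value 0xCBF43926. In hardware, the eight shift-and-XOR steps per byte would typically be unrolled into a network of wide XORs, which is where the resource cost arises.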
One conventional approach for CRC involves creating an XOR function from programmable logic resources on the device, usually from AND and OR gates, and sometimes also using a memory as a look-up table (LUT). Such an implementation results in a large and/or slow function that requires a large area. Another conventional approach for CRC involves creating a CRC function from the programmable logic resources on the device, usually from AND and OR gates, and sometimes using carry chain XORs. Such an implementation results in a slow function, which limits overall system performance. For example, a conventional 32-bit combinatorial XOR function runs at a maximum of 61 MHz on a particular programmable technology and architecture. Conventional CRC approaches have either (i) slow combinatorial functions or (ii) fast pipelined functions requiring multiple cycles of latency, both of which limit overall system performance. For example, a 32-bit CRC requires 33 macrocells (i.e., 2 logic blocks) and, for a particular programmable technology, can run at a maximum of 140 MHz when pipelined and 70 MHz when non-pipelined.
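To illustrate why building XOR from AND and OR gates is expensive, the sketch below models an n-input XOR as a flat two-level sum of products. It is an illustration under stated assumptions, not a model of any specific device. Inputs are assumed to be the integers 0 or 1, and the helper names `xor2`, `xor_minterms`, and `xor_sop` are hypothetical.

```python
from itertools import product


def xor2(a, b):
    """2-input XOR from AND, OR, and NOT only: (a AND NOT b) OR (NOT a AND b)."""
    return (a & (1 - b)) | ((1 - a) & b)


def xor_minterms(n):
    """Input combinations with an odd number of ones. In a flat two-level
    sum-of-products form, each one needs its own AND product term."""
    return [bits for bits in product((0, 1), repeat=n) if sum(bits) % 2 == 1]


def xor_sop(bits):
    """Evaluate an n-input XOR as an OR of AND terms (no XOR gates)."""
    result = 0
    for term in xor_minterms(len(bits)):
        and_out = 1
        for x, want in zip(bits, term):
            # Each AND term takes either the input or its complement.
            and_out &= x if want else (1 - x)
        result |= and_out
    return result
```

An n-input XOR has 2^(n-1) such product terms (8 terms for 4 inputs), none of which can be merged, so a flat 32-input XOR would need 2^31 of them. Practical designs therefore tend to cascade smaller XORs or use a LUT, which trades area for additional logic levels and delay.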