A programmable logic device (PLD) is an integrated circuit device designed to be user-programmable so that users may implement logic designs of their choice. One type of PLD is the Complex Programmable Logic Device (CPLD). A CPLD includes two or more “function blocks” connected together and to input/output (I/O) resources by an interconnect switch matrix. Each function block of the CPLD includes a two-level AND/OR structure similar to that used in a Programmable Logic Array (PLA) or a Programmable Array Logic (PAL) device. Another type of PLD is a field programmable gate array (FPGA). In a typical FPGA, an array of configurable logic blocks (CLBs) is coupled to programmable input/output blocks (IOBs). The CLBs and IOBs are interconnected by a hierarchy of programmable routing resources. These CLBs, IOBs, and programmable routing resources are customized by loading a configuration bitstream, typically from off-chip memory, into configuration memory cells of the FPGA. For both of these types of programmable logic devices, the functionality of the device is controlled by configuration data bits of a configuration bitstream provided to the device for that purpose. The configuration data bits may be stored in volatile memory (e.g., static memory cells, as in FPGAs and some CPLDs), in non-volatile memory (e.g., flash memory, as in some CPLDs), or in any other type of memory cell.
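The two-level AND/OR structure of a CPLD function block can be illustrated with a small software model. The following is a minimal sketch, not a description of any particular device: the names `and_plane`, `or_plane`, `function_block`, `XOR_TERMS`, and `XOR_OR_BITS` are hypothetical, and the "configuration" is simply a list of product terms and OR-plane selection bits that together implement XOR of two inputs.

```python
from typing import List, Sequence, Tuple

# A literal selects one input and the value it must have (1 = true, 0 = complemented).
Literal = Tuple[int, int]
# A product term is the AND of its literals; which literals appear is "programmed".
ProductTerm = List[Literal]


def and_plane(inputs: Sequence[int], terms: List[ProductTerm]) -> List[int]:
    """First level: each product term is 1 only if every literal matches its input."""
    return [int(all(inputs[i] == v for i, v in term)) for term in terms]


def or_plane(products: Sequence[int], or_bits: List[List[int]]) -> List[int]:
    """Second level: each output ORs the product terms whose OR bit is set.
    In a PLA these bits are programmable; in a PAL the OR connections are fixed."""
    return [int(any(p and sel for p, sel in zip(products, row))) for row in or_bits]


def function_block(inputs: Sequence[int], terms: List[ProductTerm],
                   or_bits: List[List[int]]) -> List[int]:
    """Hypothetical function block: AND plane followed by OR plane."""
    return or_plane(and_plane(inputs, terms), or_bits)


# Hypothetical configuration implementing XOR(a, b) = a·b' + a'·b on one output.
XOR_TERMS: List[ProductTerm] = [[(0, 1), (1, 0)], [(0, 0), (1, 1)]]
XOR_OR_BITS = [[1, 1]]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, function_block((a, b), XOR_TERMS, XOR_OR_BITS))
```

Changing only `XOR_TERMS` and `XOR_OR_BITS` changes the implemented function without changing the structure, which mirrors how configuration data bits determine the behavior of the block.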
PLDs also have different “modes” depending on the operations being performed on them, and a specific protocol allows a programmable logic device to enter the appropriate mode. Typical PLDs have internal blocks of configuration memory which specify how each of the programmable cells will emulate the user's logic. During a “program” mode, a configuration bitstream is provided to non-volatile memory, such as flash memory or a read-only memory (ROM) (e.g., a programmable ROM (PROM), an erasable PROM (EPROM), or an electrically erasable PROM (EEPROM)), which may be either external or internal to the programmable logic device. Each location in the non-volatile memory is typically accessed by specifying its row and column addresses. During system power-up in a “startup” mode, the configuration bits are successively loaded from the non-volatile memory into static random access memory (SRAM) configuration latches of the configuration logic blocks. At the end of this startup phase, the PLD is specialized to the user's design and enters a “user” mode as part of its normal operation.
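The program, startup, and user sequence can be modeled as a simple state machine. This is a minimal sketch under stated assumptions: the class `ToyPLD` and its methods are hypothetical, the actual mode-entry protocol is not specified here, and the model only captures the ordering of writing the bitstream to non-volatile memory and then copying it, bit by bit, into volatile configuration latches.

```python
from enum import Enum, auto
from typing import List


class Mode(Enum):
    PROGRAM = auto()  # bitstream written to non-volatile memory
    STARTUP = auto()  # bits copied from non-volatile memory into SRAM latches
    USER = auto()     # device behaves as the user's design


class ToyPLD:
    """Hypothetical model of the mode sequence; not a real device protocol."""

    def __init__(self) -> None:
        self.mode = Mode.PROGRAM
        self.nonvolatile: List[int] = []   # e.g., flash memory or an EEPROM
        self.sram_latches: List[int] = []  # volatile configuration latches

    def enter_program_mode(self) -> None:
        # Stands in for the (unspecified) protocol used to re-enter program mode.
        self.mode = Mode.PROGRAM

    def program(self, bitstream: List[int]) -> None:
        if self.mode is not Mode.PROGRAM:
            raise RuntimeError("bitstream can only be written in program mode")
        self.nonvolatile = list(bitstream)

    def power_up(self) -> None:
        """Startup: load configuration bits successively, then enter user mode."""
        self.mode = Mode.STARTUP
        self.sram_latches = []  # volatile contents are lost at power-down
        for bit in self.nonvolatile:
            self.sram_latches.append(bit)
        self.mode = Mode.USER


pld = ToyPLD()
pld.program([1, 0, 1, 1])
pld.power_up()
print(pld.mode, pld.sram_latches)
```

Rejecting `program` outside program mode reflects the point that each operation is only valid in its corresponding mode.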
Whenever the architecture of a PLD changes, the new design must address backward compatibility with previous designs. That is, it is important that the new PLD architecture be able to implement circuits designed for previous PLD architectures, so that those circuit designs can be used on the new architecture. Compatibility reduces the development effort required for new designs, since older netlists will still map to the new architecture. While digital signal processing (DSP) designers typically use word operations such as add, subtract, and multiply, conventional PLDs typically operate at the bit level, and bit-oriented adders in PLDs generally perform poorly. Further, the performance of wide adders, such as 16-bit to 48-bit adders, is limited in conventional devices. Because high-level abstractions are not supported directly, devices having different internal architectures may not be able to map the same operations in a manner transparent to the user. Further, while DSP operations tend to scale smoothly in word width, the word-oriented architectures of conventional DSPs tend to be inefficient when implementing word sizes that are not multiples of the unit word size.
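The inefficiency for word sizes that are not multiples of the unit word size is simple arithmetic: a w-bit word on a fixed u-bit architecture occupies ceil(w/u) units, so only w / (u · ceil(w/u)) of the allocated bits do useful work. The sketch below assumes a hypothetical 16-bit unit word size purely for illustration; for example, a 20-bit word needs two 16-bit units and uses 20 of 32 bits (62.5%), while a 48-bit word fills three units exactly.

```python
def units_needed(word_width: int, unit_width: int) -> int:
    """Number of fixed-width units needed to hold one word (integer ceiling)."""
    return (word_width + unit_width - 1) // unit_width


def utilization(word_width: int, unit_width: int) -> float:
    """Fraction of the allocated unit bits actually used by the word."""
    return word_width / (unit_width * units_needed(word_width, unit_width))


UNIT_WIDTH = 16  # hypothetical unit word size, chosen only for illustration

for w in (16, 17, 20, 24, 32, 48):
    print(f"{w:2d}-bit word: {units_needed(w, UNIT_WIDTH)} unit(s), "
          f"{utilization(w, UNIT_WIDTH):.1%} of allocated bits used")
```

Utilization drops most sharply just past a multiple of the unit size (a 17-bit word uses barely more than half of its two units), which is why smoothly scaling word widths map poorly onto a fixed word-oriented architecture.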
Further, conventional PLDs are not only inefficient when implementing the arithmetic operations typical of DSP applications; the cost of the interconnect they use to implement those operations is also high. A bit-oriented interconnect pattern in conventional PLDs implementing arithmetic operations increases the configuration memory requirements, as well as the total depth of the necessary interconnect multiplexing. Further, dissimilar blocks in the PLD fabric, such as blocks implementing multipliers or dedicated DSP blocks, are generally inefficient and difficult to optimize. That is, these types of heterogeneous blocks require significant additional software to determine optimal mapping and partitioning strategies. More importantly, the optimized hardware resources in conventional devices having programmable logic are not matched to the statistical usage found in typical DSP applications, and are therefore inherently inefficient. For example, while multipliers are common in DSP applications, adders are more common. Similarly, while 16-bit words are common, 64-bit words are much less common. However, conventional devices do not support arbitrary word sizes, and are not optimized to support specific operations and word sizes. Further, conventional PLDs implementing DSPs will often include circuits which go unused. That is, conventional PLDs do not allow the arithmetic fabric to be borrowed by an adjacent arithmetic unit and used for overflow bits or to extend the precision of the arithmetic units. Accordingly, the density of logical operators is low. Conventional devices also have inherent problems with latency. For example, conventional PLDs implementing DSP functions run at the lowest of the maximum frequencies of the individual operations, and that frequency varies depending on signal routing. Finally, conventional PLDs implementing DSP designs encounter the issue of pipeline balancing, which requires the insertion of additional registers and thereby reduces density.
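Two of the costs described above can be quantified with a short sketch: a design clocked by a single clock can run no faster than its slowest operation, and paths of unequal pipeline depth that meet at a common operator must be padded with registers until their latencies match. The operator names, frequencies, and latencies below are hypothetical values chosen for illustration, and the model ignores the routing-dependent variation in frequency mentioned above.

```python
from typing import Dict, List


def design_fmax(op_fmax_mhz: Dict[str, float]) -> float:
    """The design runs at the lowest of the per-operation maximum frequencies."""
    return min(op_fmax_mhz.values())


def balancing_registers(path_latencies: List[int]) -> List[int]:
    """Registers to add to each path feeding a common operator so that all
    operands arrive in the same clock cycle as the deepest path."""
    deepest = max(path_latencies)
    return [deepest - latency for latency in path_latencies]


# Hypothetical per-operation limits (MHz) for one DSP datapath.
ops = {"multiplier": 300.0, "adder_48bit": 350.0, "adder_16bit": 500.0}
print("design clock limit:", design_fmax(ops), "MHz")

# Hypothetical pipeline depths (in cycles) of three paths that reconverge.
extra = balancing_registers([4, 1, 2])
print("registers per path:", extra, "total:", sum(extra))
```

In this example the fast 16-bit adders are held to the multiplier's limit, and five registers are spent purely on alignment rather than computation, illustrating the loss of both frequency and density.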
Accordingly, there is a need for an improved circuit and method of implementing arithmetic functions in a programmable logic device that increases the density and operating frequency of DSP designs in PLDs while reducing their cost and power consumption.