This invention relates to programmable logic devices (PLDs), and, more particularly, to clocking arrangements for specialized processing blocks which may be included in such devices.
As applications for which PLDs are used increase in complexity, it has become more common to design PLDs to include specialized processing blocks in addition to blocks of generic programmable logic resources. Such specialized processing blocks may include a concentration of circuitry on a PLD that has been partly or fully hardwired to perform one or more specific tasks, such as a logical or a mathematical operation. A specialized processing block may also contain one or more specialized structures, such as an array of configurable memory elements. Examples of structures that are commonly implemented in such specialized processing blocks include: multipliers, arithmetic logic units (ALUs), barrel-shifters, various memory elements (such as FIFO/LIFO/SIPO/RAM/ROM/CAM blocks and register files), AND/NAND/OR/NOR arrays, etc., or combinations thereof.
One particularly useful type of specialized processing block that has been provided on PLDs is a digital signal processing (DSP) block, which may be used to process, e.g., audio signals. Such blocks are frequently also referred to as multiply-accumulate (“MAC”) blocks, because they include structures to perform multiplication operations, and sums and/or accumulations of multiplication operations.
For example, a PLD sold by Altera Corporation, of San Jose, Calif., under the name STRATIX® II includes DSP blocks, each of which includes four 18-by-18 multipliers. Each of those DSP blocks also includes adders and registers, as well as programmable connectors (e.g., multiplexers) that allow the various components to be configured in different ways. In each such block, the multipliers can be configured not only as four individual 18-by-18 multipliers, but also as four smaller multipliers, or as one larger (36-by-36) multiplier. In addition, one 18-by-18 complex multiplication (which decomposes into two 18-by-18 multiplication operations for each of the real and imaginary parts) can be performed. In order to support four 18-by-18 multiplication operations, the block has 4×(18+18)=144 inputs. Similarly, the output of an 18-by-18 multiplication is 36 bits wide, so to support the output of four such multiplication operations, the block also has 36×4=144 outputs.
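The input and output counts above, and the decomposition of one complex multiplication into four real multiplications, can be checked with a short sketch. This is illustrative arithmetic only: the function and constant names are hypothetical and do not describe the actual circuitry or any vendor tool.

```python
# Illustrative arithmetic only; all names here are hypothetical.
MULTIPLIERS = 4      # four multipliers per DSP block, per the text
OPERAND_BITS = 18    # each multiplier is 18-by-18


def io_widths(n_mult=MULTIPLIERS, bits=OPERAND_BITS):
    """Return (inputs, outputs) for n_mult independent bits-by-bits multipliers."""
    inputs = n_mult * (bits + bits)   # two operands per multiplier
    outputs = n_mult * (2 * bits)     # each product is twice the operand width
    return inputs, outputs


def complex_multiply(a_re, a_im, b_re, b_im):
    """Multiply (a_re + j*a_im) by (b_re + j*b_im) using four real products.

    Two products feed the real part and two feed the imaginary part,
    which is why one complex multiplication occupies all four multipliers.
    """
    products = (a_re * b_re, a_im * b_im, a_re * b_im, a_im * b_re)
    real = products[0] - products[1]
    imag = products[2] + products[3]
    return real, imag, len(products)


if __name__ == "__main__":
    print(io_widths())
    print(complex_multiply(3, 4, 5, 6))
```

The sketch uses unbounded Python integers, so it ignores the 18-bit operand range and any sign handling that real hardware would impose.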
Because a specialized processing block such as a DSP block may be used for a single operation or for multiple operations, it may be desirable to be able to clock different portions of the specialized processing block separately. In the foregoing example of a DSP block that can be configured as four smaller multipliers, each portion, or quadrant, of the block, representing one multiplier in that example, might be clocked separately. Moreover, within each quadrant, there may be multiple pipelined stages, which might be clocked separately.
In a known arrangement, a plurality of clocks is selected from a universe of available clocks and made available to the DSP block. In one known embodiment, the number of clocks in that plurality equals the number of portions (e.g., four) in the DSP block: clock selection circuitry in each portion derives one clock from the universe of clocks, and all clocks so derived are shared among all portions. Thus, in that known embodiment, the universe of clocks may include six clocks (which typically are selected, or “muxed down,” from an even larger number of clocks on the PLD, and provided, e.g., to a row of DSP blocks). Each quadrant of the DSP block selects one clock, so that four clocks are selected within that DSP block, and those four clocks are shared among all four quadrants of that DSP block (another DSP block sharing the same universe of clocks may select a different four of the six clocks). In this known arrangement, all four clocks are also made available to each stage within each quadrant. Moreover, in the input multiplicand register stage, the registers for different groups of multiplicands associated with different multipliers can separately select from among all four clocks.
Such a clock arrangement is highly flexible: each stage of each quadrant of the DSP block (and, in the input stage, each of the two registers) can separately select any of the four clocks chosen from the universe of six clocks. However, the clock distribution network necessary to support such a flexible arrangement is area-intensive. It would be desirable to be able to provide a clocking arrangement for a specialized processing block in a PLD that is flexible but also efficient.
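The two levels of selection in this known arrangement can be modeled with a minimal sketch. The clock names, function names, and in particular the number of stages per quadrant are assumptions made for illustration; the text does not specify a stage count.

```python
# Minimal model of the known two-level clock selection.
# Clock names and the stage count are hypothetical; only the
# six-clock universe and four quadrants come from the text.
UNIVERSE = ["clk0", "clk1", "clk2", "clk3", "clk4", "clk5"]
QUADRANTS = 4


def block_clocks(quadrant_choice):
    """First level: each quadrant picks one clock from the universe.

    quadrant_choice holds one index into UNIVERSE per quadrant.
    The resulting four clocks are shared among all quadrants.
    """
    if len(quadrant_choice) != QUADRANTS:
        raise ValueError("expected one selection per quadrant")
    return [UNIVERSE[i] for i in quadrant_choice]


def stage_clock(shared, select):
    """Second level: a stage (or input register group) picks one shared clock."""
    return shared[select]


def selector_count(stages_per_quadrant):
    """Count selectors under an ASSUMED number of stages per quadrant.

    Returns (universe-to-block selectors, block-to-stage selectors).
    The input stage is counted twice because its two register groups
    select separately.
    """
    first_level = QUADRANTS
    second_level = QUADRANTS * (stages_per_quadrant + 1)
    return first_level, second_level


if __name__ == "__main__":
    shared = block_clocks([0, 2, 3, 5])
    print(shared, stage_clock(shared, 1))
    print(selector_count(stages_per_quadrant=3))
```

With an assumed three stages per quadrant, the model yields four first-level selectors and sixteen second-level selectors, each of the latter needing access to all four shared clocks, which gives a rough sense of how the routing grows with the number of independently clocked stages.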