For the design of digital circuits (e.g., on the scale of Very Large Scale Integration (VLSI) technology), designers often employ computer-aided techniques. Standard languages known as Hardware Description Languages (HDLs) have been developed to describe digital circuits and to aid in the design and simulation of complex digital circuits. Several hardware description languages, such as VHDL and Verilog, have evolved as industry standards. VHDL and Verilog are general-purpose hardware description languages that allow a hardware model to be defined at the gate level, the register transfer level (RTL), or the behavioral level using abstract data types. As device technology continues to advance, various product design tools have been developed to adapt HDLs for use with newer devices and design styles.
In designing an integrated circuit with HDL code, the code is first written and then compiled by an HDL compiler. The HDL source code describes the circuit elements at some level of abstraction, and the compiler produces an RTL netlist from this compilation. The RTL netlist is typically technology independent, in that it is independent of the technology/architecture of a specific vendor's integrated circuit, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC). The RTL netlist corresponds to a schematic representation of circuit elements (as opposed to a behavioral representation). A mapping operation is then performed to convert the technology independent RTL netlist into a technology specific netlist, which can be used to create circuits in the vendor's technology/architecture. It is well known that FPGA vendors use different technologies/architectures to implement logic circuits within their integrated circuits. Thus, the technology independent RTL netlist is mapped to create a netlist that is specific to a particular vendor's technology/architecture.
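To make the mapping step concrete, the following minimal Python sketch maps a technology-independent netlist of generic two-input gates onto a hypothetical two-input lookup-table cell called "LUT2", the kind of primitive an FPGA architecture might provide. The cell name, the netlist tuple format, and the truth-table bit ordering are assumptions for illustration only; they do not correspond to any particular vendor's library, and a practical technology mapper would also pack several gates into one larger LUT.

```python
# Generic, technology-independent gate functions (an assumed set for illustration).
GENERIC_FUNCS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
}


def lut2_init(func):
    """Encode a 2-input function as a 4-bit truth table.

    Bit index (b << 1) | a holds func(a, b); this ordering is an assumption.
    """
    init = 0
    for b in (0, 1):
        for a in (0, 1):
            if func(a, b):
                init |= 1 << ((b << 1) | a)
    return init


def eval_lut2(init, a, b):
    """Evaluate the hypothetical LUT2 cell for inputs a and b."""
    return (init >> ((b << 1) | a)) & 1


def map_to_lut2(netlist):
    """One-to-one mapping: each generic gate becomes one LUT2 cell.

    netlist: list of (instance_name, gate_type, (in0, in1)) tuples.
    Returns a list of (instance_name, "LUT2", (in0, in1), init) tuples.
    """
    return [
        (name, "LUT2", ins, lut2_init(GENERIC_FUNCS[gate]))
        for name, gate, ins in netlist
    ]


generic = [("g1", "AND", ("x", "y")), ("g2", "XOR", ("g1", "z"))]
mapped = map_to_lut2(generic)
print(mapped)
```

Each mapped cell can be checked against the generic gate it replaces by comparing `eval_lut2` with the original function on all four input combinations.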
In designing a circuit, transformations are frequently performed to optimize certain design goals. For example, transformations may be performed to reduce the area used by a circuit. The folding transformation is one of the systematic approaches to reducing the silicon area used by an integrated circuit. By executing multiple algorithm operations on a single functional unit, the number of functional units in the implementation can be reduced. More details about folding transformations can be found in “VLSI digital signal processing systems: design and implementation”, by Keshab K. Parhi, Wiley-Interscience, 1999.
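Parhi's book, cited above, expresses folding through the folding equation: for an edge U -> V carrying w(e) delays, with folding factor N, folding orders u and v (the time slots of U and V within their folding sets), and P_U pipeline stages in the functional unit that executes U, the folded edge needs D_F(U -> V) = N*w(e) - P_U + v - u delays. The folding can be realized directly only when every D_F is non-negative. The Python sketch below simply evaluates that equation; the node names, the two-adder feedback loop, and the dictionary-based inputs are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str     # node U
    dst: str     # node V
    delays: int  # w(e): delays on the edge in the original (unfolded) graph


def folded_delays(edges, n, order, pipeline):
    """Evaluate the folding equation D_F(U->V) = N*w(e) - P_U + v - u.

    n:        folding factor N (operations sharing one functional unit)
    order:    folding order of each node, in 0..N-1 (its time slot)
    pipeline: P_U, pipeline stages of the unit that executes each node
    """
    return {
        (e.src, e.dst): n * e.delays - pipeline[e.src] + order[e.dst] - order[e.src]
        for e in edges
    }


def is_feasible(folded):
    """A negative folded delay means the graph must be retimed before folding."""
    return all(d >= 0 for d in folded.values())


# Hypothetical example: two additions A1, A2 in a loop, folded onto one adder.
edges = [Edge("A1", "A2", 0), Edge("A2", "A1", 1)]
pipe = {"A1": 1, "A2": 1}  # assume a one-stage pipelined adder

good = folded_delays(edges, 2, {"A1": 0, "A2": 1}, pipe)
bad = folded_delays(edges, 2, {"A1": 1, "A2": 0}, pipe)
print(good, is_feasible(good))
print(bad, is_feasible(bad))
```

With the first folding order both folded edges need zero delays, while swapping the time slots of A1 and A2 makes the A1 -> A2 delay negative, so that schedule cannot be folded as is.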
Time-multiplexed resource sharing has been used in digital circuitry. For example, the Peripheral and Control Processors (PACPs) of the CDC 6600 computer, described by J. E. Thornton in “Parallel Operations in the Control Data 6600”, AFIPS Proceedings FJCC, Part 2, Vol. 26, 1964, pp. 33-40, share execution hardware by gaining access to common resources in a round-robin fashion. Another example of resource sharing, applied to multichannel filters, can be found in: Jhon J. Leon Franco, Miguel A. Melgarejo, “FPGA Implementation of a Serial Organized DA Multichannel FIR Filter”, Tenth ACM International Symposium on Field Programmable Gate Arrays, Monterey, Calif., Feb. 24-26, 2002.
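The round-robin sharing described above can be sketched in a few lines of Python: several channels queue operations for a single shared functional unit, and in each slot the unit serves the next channel that has pending work. The function name, the queue-based channel model, and the use of a multiplier as the shared unit are assumptions for illustration; this is not a model of the CDC 6600 or of the cited filter design.

```python
from collections import deque
from operator import mul


def round_robin_share(channels, unit):
    """Time-multiplex one functional unit among several channels.

    channels: list of per-channel operand lists, e.g. [[(a, b), ...], ...]
    unit:     callable modeling the single shared functional unit
    Returns (results per channel, trace of the channel served in each slot).
    Channels with no pending work are skipped rather than given an idle slot.
    """
    queues = [deque(ops) for ops in channels]
    results = [[] for _ in channels]
    trace = []
    ch = 0
    while any(queues):
        if queues[ch]:
            operands = queues[ch].popleft()
            results[ch].append(unit(*operands))
            trace.append(ch)
        ch = (ch + 1) % len(queues)  # round-robin pointer advances every step
    return results, trace


results, trace = round_robin_share([[(1, 2), (3, 4)], [(5, 6)], [(7, 8), (9, 10)]], mul)
print(results, trace)
```

Here the trace [0, 1, 2, 0, 2] shows the single multiplier visiting the channels in turn and skipping channel 1 once it has no work left.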
A conventional folding algorithm can be used to automatically generate a design with time-multiplexed resource sharing from a given design. A conventional folding algorithm identifies the multiple algorithm operations that can be time-multiplexed onto a single functional unit, reducing the number of functional units (e.g., adders, multipliers). However, given a Digital Signal Processing (DSP) design, a conventional folding algorithm spends a significant amount of time extracting parallelism and dependencies and optimizing computation schedules. The complexity of hardware synthesis grows super-linearly with the number of logic units involved in the computation. Thus, the larger the design, the harder it is to optimize and transform the circuitry.
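The scheduling work mentioned above can be illustrated with resource-constrained list scheduling, a generic technique swapped in here for illustration; it is not claimed to be any particular folding algorithm. Each operation in a small data-flow graph starts only after its predecessors have finished and only if a functional unit of its type is still free in that cycle, so fewer units are traded for more cycles. The graph, the operation-type names, and the one-cycle latency are assumptions.

```python
from collections import defaultdict


def list_schedule(ops, deps, units):
    """Resource-constrained list scheduling with one-cycle operations.

    ops:   {op_name: op_type}
    deps:  {op_name: [predecessor op_names]}  (must be acyclic)
    units: {op_type: number of functional units of that type}
    Returns {op_name: start_cycle}.
    """
    if any(units.get(t, 0) < 1 for t in ops.values()):
        raise ValueError("every op type needs at least one unit")
    start = {}
    remaining = set(ops)
    cycle = 0
    while remaining:
        # An op is ready when all its predecessors started in an earlier cycle.
        ready = sorted(
            n for n in remaining
            if all(p in start and start[p] < cycle for p in deps.get(n, []))
        )
        if not ready:
            raise ValueError("dependency cycle")
        used = defaultdict(int)
        for n in ready:
            t = ops[n]
            if used[t] < units[t]:  # a unit of this type is still free
                start[n] = cycle
                used[t] += 1
        remaining -= set(start)
        cycle += 1
    return start


# Hypothetical graph for y = a*b + c*d + e*f
ops = {"m1": "mul", "m2": "mul", "m3": "mul", "s1": "add", "s2": "add"}
deps = {"s1": ["m1", "m2"], "s2": ["s1", "m3"]}
parallel = list_schedule(ops, deps, {"mul": 3, "add": 2})
shared = list_schedule(ops, deps, {"mul": 1, "add": 1})
print(parallel, shared)
```

With three multipliers and two adders the expression finishes in three cycles; with one unit of each type it takes four cycles but needs only two functional units.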
Additionally, the conventional folding algorithm has the limitation that operations mapped to time-multiplexed shared resources cannot have internal states. That is, resources that have internal states cannot be time-shared using the conventional folding algorithm.
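The limitation can be seen with a toy model of a stateful unit, here an accumulator that keeps a running sum (an assumed example; the class and function names are hypothetical). Giving each stream its own accumulator produces the expected running sums, while naively time-multiplexing one accumulator between two streams mixes their state. The sketch only illustrates the problem; it does not model the conventional folding algorithm itself.

```python
class Accumulator:
    """A functional unit with internal state: it keeps a running sum."""

    def __init__(self):
        self.total = 0

    def step(self, x):
        self.total += x
        return self.total


def dedicated(streams):
    """One accumulator per stream: each keeps its own running sum."""
    outs = []
    for s in streams:
        acc = Accumulator()
        outs.append([acc.step(x) for x in s])
    return outs


def naively_shared(streams):
    """One accumulator time-multiplexed across streams, slot by slot."""
    acc = Accumulator()
    outs = [[] for _ in streams]
    for samples in zip(*streams):
        for i, x in enumerate(samples):
            outs[i].append(acc.step(x))  # state leaks between streams
    return outs


streams = [[1, 1, 1], [10, 10, 10]]
print(dedicated(streams))
print(naively_shared(streams))
```

For these streams the dedicated units return [1, 2, 3] and [10, 20, 30], whereas the naively shared unit returns [1, 12, 23] and [11, 22, 33], because the single running sum carries one stream's state into the other stream's time slots.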