Electronic systems commonly contain duplicative circuitry for any number of reasons. Duplicative circuitry may be designed into an electronic system to achieve parallelism and additional data throughput. For example, a packet router may employ hundreds of identical channels to achieve the required throughput. Applications in multimedia, telecommunications, Digital Signal Processing (DSP), and microprocessor design also naturally call for multiple copies of key circuit resources. On the other hand, in large circuit designs, flat duplication of circuit resources is often unintended and overlooked, leaving room for improvement.
Resource sharing is one way to optimize electronic circuits through the sharing and reuse of duplicative circuitry. By sharing duplicative circuitry among several processes or users, resource sharing enables electronic systems to be designed and manufactured more cheaply and efficiently. To optimize a design using resource sharing, the duplicative circuitry must first be identified and then shared wherever possible. FIG. 1A illustrates resource sharing among modules with identical circuitry and common input/output (I/O) signals according to the prior art. FIG. 1A includes two identical circuits and/or functional modules, clone A 101 and clone B 102, which share the same input signal IN0 and the same output signal OUT0. Because clone A 101 and clone B 102 contain duplicative circuitry and the same I/O, they are identified as candidates for sharing. This sharing is achieved by replacing clone A 101 and clone B 102 with a single shared resource 103 and routing the common I/O appropriately. The functionality of both clone A 101 and clone B 102 is maintained, but the resources required by the circuit are reduced. Resource sharing can therefore reduce the overall size of electronic circuitry. As a result, resource sharing has become a popular topic, and different methods of optimizing electronic systems using resource sharing have been explored.
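The FIG. 1A case can be illustrated with a small sketch. It assumes a hypothetical `Module` record whose `signature` field is a canonical description of the module's internal circuitry; how such a signature is computed is outside the scope of this sketch. Modules with the same signature and the same inputs necessarily compute the same outputs, so all but one of them can be removed, and the surviving instance drives every output net the clones drove.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical module record: a name, a canonical signature of its
    internal circuitry, and the nets it reads and drives."""
    name: str
    signature: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def share_identical_clones(modules: list[Module]) -> tuple[list[Module], dict[str, str]]:
    """Replace clones with the same circuitry and the same inputs by a single
    shared instance. Returns the surviving modules and a map from each
    removed clone to the instance that replaces it."""
    groups: dict[tuple, list[Module]] = defaultdict(list)
    for m in modules:
        # Identical circuitry fed by identical inputs computes identical outputs.
        groups[(m.signature, m.inputs)].append(m)

    kept: list[Module] = []
    replaced_by: dict[str, str] = {}
    for members in groups.values():
        keeper = members[0]
        # The shared instance must drive every output net any clone drove.
        all_outputs = tuple(dict.fromkeys(o for m in members for o in m.outputs))
        kept.append(Module(keeper.name, keeper.signature, keeper.inputs, all_outputs))
        for m in members[1:]:
            replaced_by[m.name] = keeper.name
    return kept, replaced_by
```

For two clones that both read IN0 and drive OUT0, the function keeps one instance and reports the other as replaced; a third clone with different inputs is left alone, since merging it would require the multiplexing discussed next.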
In designing electronic circuits, transformations are frequently performed to optimize certain design goals. Transformations may be used to perform resource sharing and thereby reduce the area used by a circuit. A "folding transformation" is one systematic approach to reducing the silicon area used by an integrated circuit. Folding can be applied to a single functional unit to reduce its resource requirements, and also to multiple functional units to reduce their number. FIG. 1B illustrates resource sharing using a 2× folding transformation among candidates for sharing that have the same or similar functionality and/or circuitry but different I/O signals, according to the prior art. Before sharing, the two candidates clone A 101 and clone B 102 each have separate clock inputs connected to the same clock source, Ck, and different I/O (i.e., IN0 and OUT0 corresponding to clone A 101, and IN1 and OUT1 corresponding to clone B 102). Since clone A 101 and clone B 102 contain the same or similar circuitry and/or functionality, the resources utilized by each of them may be shared. A folding transformation may be performed to share these resources by folding clone A 101 and clone B 102 onto a single set of common hardware resources, such as shared resource 103, and adding multiplexing circuitry to select between the I/O corresponding to clone A 101 and clone B 102, respectively. While in this example the two candidates belong to the same clock domain, resource sharing is also possible among candidates in different clock domains, e.g., when only one of the candidates is used at any given time.
In at least certain embodiments, the multiplexing circuitry includes multiplexing and demultiplexing circuits (such as MUX 105 and DeMUX 106 shown in FIG. 1B) and selection circuitry (such as selection circuit 109). The multiplexing circuitry is connected to the shared resource 103 in the configuration illustrated in FIG. 1B to alternately select between the I/O of clone A 101 and the I/O of clone B 102. When the selection circuit 109 outputs a first selection value (say binary 0), this value is placed on line 131, causing selection input 133 of MUX 105 to pass input IN0, corresponding to clone A 101, through MUX 105 to the input of shared resource 103. The same value (binary 0) on line 131 is also received at selection input 135 of DeMUX 106, causing the outputs of shared resource 103 to be routed through DeMUX 106 to output OUT0, which corresponds to clone A 101.
When the selection circuit 109 instead outputs a second selection value (say binary 1) onto line 131, this value causes selection input 133 of MUX 105 to pass input IN1, corresponding to clone B 102, through MUX 105 to the input of shared resource 103. The same value (binary 1) on line 131 is also received at selection input 135 of DeMUX 106, causing the outputs of shared resource 103 to be routed through DeMUX 106 to output OUT1, which corresponds to clone B 102. In this manner, the resources of clone A 101 and clone B 102 are shared even though the two clones have different I/O signals. The functionality of both clone A 101 and clone B 102 is maintained using roughly half of the original resources, plus the overhead of the multiplexing circuitry.
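The behavior of the FIG. 1B arrangement can be checked with a cycle-level sketch. It assumes the shared resource is purely combinational and models it as an ordinary function; the selection circuit is modeled as a value that toggles 0, 1, 0, 1, ... every cycle. The names `fold_2x` and `shared` are hypothetical.

```python
from typing import Callable, Sequence


def fold_2x(shared: Callable[[int], int],
            in0: Sequence[int], in1: Sequence[int]) -> tuple[list[int], list[int]]:
    """Cycle-level model of a 2x-folded pair of clones. On select 0 the MUX
    feeds IN0 to the shared resource and the DeMUX routes its result to OUT0;
    on select 1 it does the same for IN1 and OUT1."""
    if len(in0) != len(in1):
        raise ValueError("both clones must supply the same number of samples")
    out0: list[int] = []
    out1: list[int] = []
    for cycle in range(2 * len(in0)):
        sel = cycle % 2                           # selection circuit output (line 131)
        sample = cycle // 2
        x = in1[sample] if sel else in0[sample]   # MUX 105
        y = shared(x)                             # shared resource 103
        (out1 if sel else out0).append(y)         # DeMUX 106
    return out0, out1
```

Each clone receives exactly the results it would have produced on its own, but only on every other cycle; to preserve per-clone throughput, the shared resource must therefore run at twice the original rate.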
U.S. Pat. No. 7,093,204 (hereinafter "the Oktem patent") entitled "Method and Apparatus for Automated Synthesis of Multi-Channel Circuits" describes methods and apparatuses to automatically generate a time-multiplexed design of a multi-channel circuit from a single-channel circuit using a folding transformation. In the Oktem patent, a single-channel circuit is replicated N times, resulting in a multi-channel circuit containing N separate channels. Each of the N channels then becomes a candidate for sharing, with identical circuitry and different I/O signals. A folding transformation is then performed to share resources among the N channels of the multi-channel circuit. However, the Oktem patent alters the functionality of the received circuit, rather than optimizing the circuit without changing its functionality. A continuation-in-part of the '204 patent, U.S. Pub. No. 2007-0174794 A1, extends the Oktem patent to receive a design having a plurality of instances of a logical block and automatically transform it into a second design having a shared, time-multiplexed variant of the original block. Additionally, the Oktem patent does not teach discovering previously unknown similar or identical subsets of a circuit for the purpose of resource sharing. More details about folding transformations can be found in "VLSI Digital Signal Processing Systems: Design and Implementation" by Keshab K. Parhi, Wiley-Interscience, 1999. The Oktem patent contains a discussion of prior art, which is incorporated herein by reference.
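The general idea of folding N replicated channels of a sequential circuit can be sketched as follows. This is a generic illustration, not the Oktem patent's procedure: it assumes the single-channel circuit can be written as next-state/output logic `(state, input) -> (next_state, output)`, uses a hypothetical accumulator as that circuit, and keeps one state register per channel while a modulo-N counter selects which channel uses the single copy of the logic in each cycle. The reference function `replicate` models the N separate copies for comparison.

```python
from typing import Callable, Sequence

# Single-channel sequential circuit as (state, input) -> (next_state, output).
Step = Callable[[int, int], tuple[int, int]]


def accumulate(state: int, x: int) -> tuple[int, int]:
    """Hypothetical single-channel circuit: a running-sum accumulator."""
    nxt = state + x
    return nxt, nxt


def replicate(step: Step, streams: Sequence[Sequence[int]], init: int = 0) -> list[list[int]]:
    """Reference behavior: N independent copies of the single-channel circuit."""
    outs = []
    for stream in streams:
        state, out = init, []
        for x in stream:
            state, y = step(state, x)
            out.append(y)
        outs.append(out)
    return outs


def fold_n(step: Step, streams: Sequence[Sequence[int]], init: int = 0) -> list[list[int]]:
    """Folded version: one copy of the logic, one state register per channel,
    and a modulo-N counter that selects the active channel each cycle."""
    n = len(streams)
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all channels must have equal-length input streams")
    regs = [init] * n                    # per-channel state registers
    outs: list[list[int]] = [[] for _ in range(n)]
    for cycle in range(n * length):
        ch = cycle % n                   # channel-select counter
        x = streams[ch][cycle // n]      # input multiplexer
        regs[ch], y = step(regs[ch], x)  # shared logic reads/writes selected register
        outs[ch].append(y)               # output demultiplexer
    return outs
```

Calling `fold_n(accumulate, [[1, 2, 3], [10, 20, 30], [5, 5, 5]])` yields the same per-channel results as `replicate`, namely `[[1, 3, 6], [10, 30, 60], [5, 10, 15]]`, while instantiating the logic only once.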
Traditional resource sharing in integrated circuit design is further discussed in Atmakuri et al., U.S. Pat. No. 6,438,730. The Atmakuri patent determines whether two or more branches in an electronic circuit drive a common output in response to a common select signal. If so, it determines whether the resulting decision construct includes a common arithmetic operation in its branches, in which case the design may be optimized by sharing that operation. Resource sharing is also considered in high-level synthesis, along with scheduling, where it is common to share arithmetic operations used to perform multiple functions.
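The kind of sharing described above, where two branches selected by a common signal each contain the same arithmetic operation, can be illustrated generically; this sketch is not the Atmakuri patent's specific procedure. The unshared form uses one adder per branch followed by an output multiplexer, while the shared form moves the multiplexers onto the operands so that a single adder suffices. The function names are hypothetical, and `equivalent` exhaustively compares both forms over small unsigned operand values.

```python
import itertools


def unshared(sel: bool, a: int, b: int, c: int, d: int) -> int:
    # Two adders, one per branch, followed by an output multiplexer.
    return a + b if sel else c + d


def shared(sel: bool, a: int, b: int, c: int, d: int) -> int:
    # Operand multiplexers ahead of a single adder; one select drives both.
    lhs = a if sel else c
    rhs = b if sel else d
    return lhs + rhs


def equivalent(width: int = 3) -> bool:
    """Exhaustively compare both forms for all operands of the given width."""
    values = range(2 ** width)
    return all(
        unshared(sel, a, b, c, d) == shared(sel, a, b, c, d)
        for sel in (False, True)
        for a, b, c, d in itertools.product(values, repeat=4)
    )
```

The trade is one adder for two extra multiplexers, which pays off when the shared operator is larger than the multiplexers it requires.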
Additionally, many previous resource sharing solutions are limited to specific cases. For example, some previous solutions implement shared modules in a very different form from the original modules, e.g., hardware implementations of frequently occurring software-program fragments, or the transformation of an initial netlist into a netlist that performs another function. U.S. Pat. No. 5,596,576 to Milito entitled "Systems and Methods for Sharing of Resources" addresses dynamically assigning resources to users and charging users at different rates. In some patents, resource sharing refers to communication channels or wireless spectrum, e.g., U.S. Pat. No. 4,495,619 to Acampora entitled "Transmitter and Receivers Using Resource Sharing and Coding for Increased Capacity." Another category, represented by U.S. Pat. No. 7,047,344 to Lou et al. entitled "Resource Sharing Apparatus," deals with sharing bus-connected peripheral devices of personal computers, e.g., printers, keyboards, and mice.
U.S. Pat. No. 6,779,158 to Whitaker et al. (hereinafter "the Whitaker patent") entitled "Digital Logic Optimization Using Selection Operators" describes a transformation of an ASIC-style netlist that optimizes design objectives such as area through transistor-level and standard-cell-level resource sharing, and through the use of standard cells enriched with selection, which is essentially multiplexing. Much consideration is given to the layout of these standard cells. However, the conventional wisdom in the field is that the most significant sharing opportunities arise before mapping to ASIC-style gates. While the Whitaker patent mentions possibly considering higher levels of abstraction, where a module would include a plurality of cells, it does not offer solutions that can be applied before mapping to cells occurs. Additionally, because FPGAs are not built from the ASIC-style cell libraries described in the Whitaker patent, its techniques do not apply to FPGAs.
Time-multiplexed resource sharing has been used in electronic circuitry. For example, the Peripheral and Control Processors (PACPs) of the CDC 6600 computer, described by J. E. Thornton in "Parallel Operations in the Control Data 6600," AFIPS Proceedings FJCC, Part 2, Vol. 26, 1964, pp. 33-40, share execution hardware by gaining access to common resources in a round-robin fashion. Further, "Time-Multiplexed Multiple-Constant Multiplication" by Tummeltshammer, Hoe, and Püschel, published in IEEE Trans. on CAD 26(9), September 2007, discusses time-sharing resources among single-constant multiplications to reduce circuit size in Digital Signal Processing (DSP) applications. However, its techniques are limited to multiple-constant multiplication.
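Round-robin access to a common resource, as mentioned above, is commonly implemented with a round-robin arbiter. The following is a generic sketch, not a model of the CDC 6600 hardware: each cycle, at most one requester is granted the shared resource, and the search for the next grantee starts just after the most recent one, so no requester can be starved. The class name `RoundRobinArbiter` is hypothetical.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Grants one shared resource to at most one requester per cycle,
    searching from the requester after the most recent grantee."""

    def __init__(self, n: int) -> None:
        if n <= 0:
            raise ValueError("need at least one requester")
        self.n = n
        self.last = n - 1  # so requester 0 has priority on the first cycle

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        if len(requests) != self.n:
            raise ValueError("one request flag per requester expected")
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if requests[candidate]:
                self.last = candidate
                return candidate
        return None  # no requests this cycle; priority pointer unchanged
```

With three requesters all asserting their requests, successive calls to `grant` return 0, 1, 2, 0, and so on, so each requester receives the resource once every three cycles.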
U.S. Pat. No. 6,735,712 to Maiyuran et al. (hereinafter "the Maiyuran patent") entitled "Dynamically Configurable Clocking Scheme for Demand Based Resource Sharing with Multiple Clock Crossing Domains" describes resource sharing between or among two or more modules driven at different clock frequencies. The Maiyuran patent is limited to using three clocks and discloses how one module can temporarily use a fraction of the resources of another module. The Maiyuran patent selectively applies a clock signal that has the frequency of either the first or the second clock. Such a dynamically configurable clocking scheme may be difficult to implement and may have limited applicability, whereas fixed-frequency clock signals are more practical.
U.S. Pat. No. 6,401,176 to Fadavi-Ardekani et al., entitled "Multiple Agent Use of a Multi-Ported Shared Memory," assumes an arbiter and a super-agent that uses the shared memory more frequently than other agents. The super-agent is offered priority access, limiting the other agents to "open windows." "Post-Placement C-slow Retiming for the Xilinx Virtex FPGA," by N. Weaver et al., presented at the FPGA Symposium 2003, describes a semi-manual FPGA flow that receives a circuit design and creates a multi-threaded version of this design by duplicating all flip-flops and then retiming. However, this methodology alters the functionality of the design or logic block. An equivalent technology was commercialized by Mplicity, Inc., which announced the gate-level Hannibal tool and the RTL Genghis-Khan tool. The Hannibal tool transforms a single logic block into an enhanced Virtual-Multi-Logic-Block. Genghis automatically transforms the RTL of a single logic block into Virtual-Multi-Logic-Block RTL, while Khan performs automatic gate-level optimization. The process invocation switch can be set to 2×, 3×, or 4×. Mplicity materials disclose applications to multi-core CPUs. Clock handling is disclosed for single clock domains. Mplicity materials also disclose several block-based techniques for verifying multi-threaded blocks created using their tools. However, the Mplicity materials do not disclose sharing blocks with different functionality or automatically selecting single or multiple blocks for multithreading.
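The effect of duplicating flip-flops, the core of C-slowing, can be seen on a small behavioral sketch. This illustrates the general C-slow idea only, not the flow of Weaver et al. or the Mplicity tools, and the subsequent retiming step is not modeled. A hypothetical accumulator has its single feedback register replaced by a chain of `c` registers, so its output becomes y[t] = x[t] + y[t - c]: the circuit now serves `c` interleaved, independent input streams.

```python
from collections import deque
from typing import Sequence


def c_slow_accumulator(xs: Sequence[int], c: int) -> list[int]:
    """Accumulator whose feedback register is replaced by a chain of c
    registers, so the adder sees the value computed c cycles earlier."""
    if c <= 0:
        raise ValueError("c must be at least 1")
    regs = deque([0] * c, maxlen=c)  # c-deep register chain, oldest at the left
    out = []
    for x in xs:
        y = x + regs[0]  # adder reads the oldest register
        regs.append(y)   # shift the chain; maxlen drops the oldest entry
        out.append(y)
    return out
```

With `c=2`, feeding the interleaved streams 1, 2, 3 and 10, 20, 30 as `[1, 10, 2, 20, 3, 30]` produces `[1, 10, 3, 30, 6, 60]`, i.e., two independent running sums. A single, non-interleaved stream such as `[1, 2, 3]` now yields `[1, 2, 4]` instead of `[1, 3, 6]`, which shows concretely how this kind of transformation alters the functionality of the original block.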
The publication "Packet-Switched vs. Time-Multiplexed FPGA Overlay Networks" by A. DeHon et al., presented at FCCM 2006, compares packet-switched networks with the virtualization (time multiplexing) of FPGA interconnect for sparse computations on Butterfly Fat Trees. However, this work does not disclose clocking or the use of more than one clock domain.