Resonant drivers have recently been proposed for the energy-efficient distribution of signals in synchronous digital systems. For example, in the context of clock distribution networks, energy-efficient operation with resonant drivers is achieved by using an inductor to resonate the parasitic capacitance of the clock distribution network. Clock distribution with extremely low jitter is achieved through the elimination of buffers. Moreover, extremely low skew is achieved among the distributed clock signals through the design of relatively symmetric distribution networks. Network performance depends on operating speed and on overall network inductance, resistance, size, and topology, with lower-resistance symmetric networks resulting in lower jitter, skew, and energy consumption when designed with adequate inductance.
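As a rough illustration of what it means for an inductor to resonate the parasitic capacitance of a clock network, the following sketch applies the textbook relation for an ideal lumped LC tank, f0 = 1 / (2π√(LC)). The lumped model, the function names, and the 100 pF and 1 GHz figures are assumptions chosen for illustration only; none of them is taken from the designs discussed here, and a real network would also have to account for resistance and distributed effects.

```python
import math


def resonant_frequency(inductance_h, capacitance_f):
    """Natural frequency in Hz of an ideal lumped LC tank: 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def required_inductance(target_hz, capacitance_f):
    """Inductance in henries that resonates capacitance_f at target_hz (ideal tank)."""
    return 1.0 / ((2.0 * math.pi * target_hz) ** 2 * capacitance_f)


if __name__ == "__main__":
    # Hypothetical values, not taken from any cited design.
    clock_capacitance_f = 100e-12  # assumed total parasitic clock capacitance
    clock_frequency_hz = 1e9       # assumed target clock frequency

    inductance_h = required_inductance(clock_frequency_hz, clock_capacitance_f)
    print(f"Required inductance: {inductance_h * 1e9:.4f} nH")
    print(f"Check, resonant frequency: "
          f"{resonant_frequency(inductance_h, clock_capacitance_f) / 1e9:.4f} GHz")
```

Under these assumptions the required inductance falls as either the clock frequency or the network capacitance rises, which is one reason the choice of inductance is tied to operating speed and network size.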
The distribution of clock and data signals presents a particular challenge in the context of FPGAs, resulting in limited operating speeds and high energy dissipation. Typically, FPGAs deploy multiple clock networks, operating at various clock frequencies. To ensure a high degree of programmability, FPGAs typically provide the means for connecting any storage device (flip-flop) in the FPGA to any of these multiple clock networks. Moreover, all clock networks must be distributed across the entire FPGA. The resulting clock distribution networks are thus highly complex, resulting in relatively low operating speeds. To exacerbate the situation, the large size and high complexity of these clock networks require the extensive deployment of sophisticated power management techniques such as clock gating, so that overall power consumption is kept at acceptable levels. These power management techniques result in additional design complexity, increased uncertainty in signal timing, and therefore additional limitations to operating speeds.
To maximize programming flexibility, FPGAs typically include one or more large-scale networks for distributing data across the entire device. These networks comprise multiple programmable switches to provide for selective connectivity among the logic blocks in the FPGA. They also include multiple long interconnects that typically rely on multiple buffers (repeaters) to propagate data. The high complexity of these networks results in increased uncertainty in signal timing, limiting operating speeds. The extensive deployment of buffers results in increased energy dissipation. To exacerbate the situation, these networks are often pipelined to provide for higher data transfer rates, resulting in even higher complexity and energy dissipation.
In addition to FPGA devices, multiple clock networks operating at various clock frequencies are generally deployed in microprocessor, ASIC, and SOC designs to implement complex computations and achieve high performance. These clock networks are distributed across the entire device and make extensive use of power management techniques such as clock gating to keep power consumption at acceptable levels. They are therefore highly complex, and their maximum achievable performance is limited by increased timing uncertainty.
One disclosure of design methods for resonant clock networks can be found in U.S. Pat. No. 5,734,285 (“Electronic circuit utilizing resonance technique to drive clock inputs of function circuitry for saving power”). A single resonant domain is described along with methods for synthesizing harmonic clock waveforms that include the fundamental clock frequency and a small number of higher-order harmonics. The patent also describes clock generators that are driven at a reference frequency, forcing the entire resonant clock network to operate at that frequency. However, the methods do not address clock network architectures or scaling issues that encompass the requirements of FPGA devices. Moreover, the patent is not concerned with devices that include multiple clock networks operating at various clock frequencies.
Another disclosure of design methods for resonant clock networks can be found in U.S. Pat. No. 6,882,182 (“Tunable clock distribution system for reducing power dissipation”). A method is described for using inductance and capacitance to tune the frequency of a clock distribution network in a programmable logic device. This method focuses on frequency tuning and does not address any clock scaling issues that encompass the requirements of large FPGA devices. Moreover, it does not disclose any clock network architectures for FPGAs.
Resonant clock network designs for local clocking (i.e., for driving flip-flops or latches) are described and empirically evaluated in the following articles: “A 225 MHz Resonant Clocked ASIC Chip,” by Ziesler C., et al., International Symposium on Low-Power Electronic Design, August 2003; “Energy Recovery Clocking Scheme and Flip-Flops for Ultra Low-Energy Applications,” by Cooke, M., et al., International Symposium on Low-Power Electronic Design, August 2003; “Resonant Clocking Using Distributed Parasitic Capacitance,” by Drake, A., et al., Journal of Solid-State Circuits, Vol. 39, No. 9, September 2004, and “Resonant-Clock Latch-Based Design” by Sathe, V., et al., Journal of Solid-State Circuits, Vol. 43, No. 4, April 2008. The designs set forth in these papers are directed to a single resonant domain, however, and do not describe the design of large-scale chip-wide resonant clock network architectures for FPGAs or other devices with multiple clock networks and various clock frequencies.
The design and evaluation of resonant clocking for high-frequency global clock networks was addressed in “Design of Resonant Global Clock Distributions,” by Chan, S., et al., International Conference on Computer Design, October 2003, “A 4.6 GHz Resonant Global Clock Distribution Network,” by Chan, S., et al., International Solid-State Circuits Conference, February 2004, and “1.1 to 1.6 GHz Distributed Differential Oscillator Global Clock Network,” by Chan, S., et al., International Solid-State Circuits Conference, February 2005. These articles focus on global clocking, however, and do not provide any methods for designing a large-scale resonant network that distributes clock signals with high energy efficiency all the way to the individual flip-flops in an FPGA device. Moreover, they are not directed to FPGAs or other devices with multiple clock networks and various clock frequencies.
Another approach for addressing the speed limitations of current FPGA devices is the use of asynchronous logic design. In this approach, clocks are eliminated from the device, and computations are coordinated through the deployment of handshake circuitry. A design for asynchronous FPGAs is described in “Highly Pipelined Asynchronous FPGAs” by Teifel, J., et al., ACM FPGA Conference, 2004. The design and evaluation of a small-scale asynchronous FPGA prototype is described in “A High Performance Asynchronous FPGA: Test Results” by Fang, D., et al., IEEE Symposium on Field Programmable Custom Computing Machines, 2005. A significant drawback of asynchronous FPGAs is the challenge of verifying that the design meets performance requirements under worst-case conditions. FPGA tools are not tailored to perform worst-case timing analysis of a logic structure having multiple clocks. For complex asynchronous structures, checking the worst-case timing of each clock and datapath to verify that worst-case timing constraints are met is an extremely tedious, if not nearly impossible, task. Other drawbacks of asynchronous FPGAs include the difficulty in interfacing with conventional synchronous designs and the difficulty in ensuring during testing that they meet worst-case performance requirements under all operating conditions (temperature, supply voltage, etc.). With regard to energy consumption, asynchronous circuitry still dissipates the CV² energy that is required to charge and discharge a capacitive load. It therefore dissipates more energy than resonant drivers when used to drive a signal over capacitive interconnect across an FPGA device.
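The energy comparison in the preceding paragraph can be made concrete with a small back-of-the-envelope sketch. It assumes a conventional driver that dissipates CV² per full charge and discharge cycle, and models the resonant driver as a tank with quality factor Q whose voltage swings sinusoidally with peak-to-peak amplitude V, so that the peak stored energy is C(V/2)²/2 and, by the definition of Q, the loss per cycle is 2π/Q times that amount. The quality factor, the component values, and the function names are illustrative assumptions rather than figures from any of the cited work.

```python
import math


def conventional_energy_per_cycle(capacitance_f, swing_v):
    """Energy in joules dissipated per cycle when a load is charged and discharged
    through a conventional (non-resonant) driver: C * V^2."""
    return capacitance_f * swing_v ** 2


def resonant_energy_per_cycle(capacitance_f, swing_v, quality_factor):
    """Approximate energy in joules dissipated per cycle by an LC tank with the
    given quality factor, assuming a sinusoid of peak-to-peak amplitude swing_v.

    Peak stored energy is 0.5 * C * (V/2)^2, and Q = 2*pi * stored / dissipated.
    """
    if quality_factor <= 0:
        raise ValueError("quality_factor must be positive")
    stored_j = 0.5 * capacitance_f * (swing_v / 2.0) ** 2
    return 2.0 * math.pi * stored_j / quality_factor


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    load_f = 100e-12   # assumed interconnect capacitance
    swing_v = 1.0      # assumed voltage swing
    q = 5.0            # assumed quality factor of the resonant network

    e_conv = conventional_energy_per_cycle(load_f, swing_v)
    e_res = resonant_energy_per_cycle(load_f, swing_v, q)
    print(f"Conventional: {e_conv * 1e12:.2f} pJ per cycle")
    print(f"Resonant:     {e_res * 1e12:.2f} pJ per cycle")
    print(f"Ratio:        {e_res / e_conv:.3f}")
```

In this simplified model the ratio of resonant to conventional dissipation is π/(4Q), so the saving grows with the quality factor, that is, with lower network resistance relative to the reactance of the tank.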