Today's integrated circuits are built assuming that every copy of an IC design is identical. This is true both of conventional fixed circuits (e.g., ASICs, custom circuit designs) and of designs mapped to reconfigurable circuits (e.g., FPGAs, CPLDs, coarse-grained reconfigurable devices) using conventional design approaches.
Once fabricated, complete ICs are tested for speed and binned according to the highest operating speed each IC can sustain across all tests. Further, reconfigurable ICs are binned according to the speed they can achieve across all possible designs mapped to the part.
When feature sizes are measured in thousands of silicon lattice spacings (about 0.5 nm each) or in multiples of visible-light wavelengths (400-700 nm), atomic-scale edge roughness and variation have little impact on overall device characteristics. Further, when each device contains millions of dopants, law-of-large-numbers effects guarantee that device-to-device variation in dopant concentration is only a tiny percentage of the total dopant level. As device feature sizes shrink, designers no longer have the luxury of operating at scales several orders of magnitude above the scale of individual atoms and dopants. As a result, variation in device size, shape, dopant count, and dopant placement will manifest as significant variation in device characteristics.
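The law-of-large-numbers argument can be made concrete with a minimal sketch. It assumes, for illustration only, that a device's dopant count follows a Poisson distribution (a common modeling choice that the text does not state), so a count with mean N has standard deviation sqrt(N) and relative variation 1/sqrt(N). The function name `relative_dopant_variation` is hypothetical.

```python
import math

def relative_dopant_variation(mean_dopants: float) -> float:
    """Return sigma/mean of a device's dopant count.

    Assumption (illustrative): the dopant count is Poisson-distributed,
    so its standard deviation is sqrt(mean) and sigma/mean = 1/sqrt(mean).
    """
    if mean_dopants <= 0:
        raise ValueError("mean dopant count must be positive")
    return 1.0 / math.sqrt(mean_dopants)

for n in (1_000_000, 10_000, 100):
    print(f"{n:>9} dopants -> {100 * relative_dopant_variation(n):.1f}% variation")
```

Under this assumption, a device with a million dopants sees roughly 0.1% variation, while a device with 100 dopants sees roughly 10%, which is why the protection afforded by large numbers disappears as devices shrink.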
Traditionally, VLSI devices with identical fabrication geometry are assumed to be identical when placed on a die. Consequently, large fabricated integrated circuits (ICs) containing millions to billions of devices are carefully optimized to shorten their critical paths, perhaps at the expense of making most paths near-critical. Traditional process variations make the devices on an IC slower, but they do so in a consistent way, so that the a priori assignment of logical gates and functions to devices still extracts the best possible performance from the fabricated IC.
However, at the atomic scale, devices that nominally have the same fabrication geometry will end up with distinct fabricated geometries, and hence distinct characteristics, most of which cannot be known until after the device has been fabricated. As parameter variance increases, traditional techniques that use fixed circuits, or even a fixed assignment of functions to circuits on configurable devices, will see a decrease in the yielded speed of the IC, because the cycle time is determined by the slowest devices that end up on the near-critical paths. When ICs have billions of devices and millions of near-critical paths, there are ample opportunities for the near-critical paths to sample from the statistically slow devices on the IC.
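The statistical effect described above can be illustrated with a small Monte Carlo sketch. It assumes, purely for illustration, that each device delay is drawn independently from a normal distribution (clamped at zero), that a path delay is the sum of its device delays, and that under a fixed assignment the cycle time equals the slowest of many nominally identical paths. The function `slowest_path_delay` and all numeric parameters are hypothetical.

```python
import random

def slowest_path_delay(num_paths: int, devices_per_path: int,
                       nominal_delay: float, sigma: float, seed: int = 0) -> float:
    """Cycle time set by the slowest of many nominally identical paths.

    Illustrative assumptions: device delays are independent normal draws
    (clamped at zero), and a path delay is the sum of its device delays.
    """
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(num_paths):
        path = sum(max(0.0, rng.gauss(nominal_delay, sigma))
                   for _ in range(devices_per_path))
        worst = max(worst, path)
    return worst

nominal = 10 * 1.0  # 10 devices per path, 1.0 time unit each
for sigma in (0.0, 0.05, 0.2):
    t = slowest_path_delay(num_paths=10_000, devices_per_path=10,
                           nominal_delay=1.0, sigma=sigma)
    print(f"sigma={sigma:.2f}: cycle time {t:.2f} vs nominal {nominal:.2f}")
```

With zero variation the cycle time equals the nominal path delay; as sigma grows, the worst of the 10,000 paths drifts further above nominal even though the average device is unchanged.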
These problems are further exacerbated by the increased susceptibility of small devices to lifetime changes: device characteristics will vary over the operational lifetime of the IC. Many of these effects cause individual devices to become slower (e.g., hot-carrier effects, Negative Bias Temperature Instability (NBTI), electromigration). For example, individual atomic bonds may break, or metal may migrate, increasing the resistance of a device or wire.
Looking beyond lithographic fabrication, techniques are being proposed to build post-fabrication reconfigurable circuits using nanowires and molecular-scale switches. For example, FIGS. 1 and 2 depict an architecture suitable for bottom-up construction from nanowires, as described by André DeHon in “Nanowire-Based Programmable Architectures,” JETC, vol. 1, no. 2, pages 109-162, incorporated herein by reference in its entirety. Other nanoscale designs include the nanoFabrics described by Seth Copen Goldstein and Mihai Budiu in “NanoFabrics: Spatial Computing Using Molecular Electronics,” ISCA, pages 178-189, June 2001, incorporated herein by reference in its entirety; CMOL, described by Dmitri B. Strukov and Konstantin K. Likharev in “A Reconfigurable Architecture for Hybrid CMOS/Nanodevice Circuits,” ISFPGA, pages 131-140, 2006, incorporated herein by reference in its entirety, and in “A Reconfigurable Architecture for Hybrid Digital Circuits with Two-Terminal Nanodevices,” Nanotechnology, vol. 16, no. 6, pages 888-900, June 2005, incorporated herein by reference in its entirety; and the crossbar architectures described by Greg Snider, Philip Kuekes, and R. Stanley Williams in “CMOS-like Logic in Defective, Nanoscale Crossbars,” Nanotechnology, vol. 15, pages 881-891, June 2004, incorporated herein by reference in its entirety, and by Yi Luo, Patrick Collier, Jan O. Jeppesen, Kent A. Nielsen, Erica DeIonno, Greg Ho, Julie Perkins, Hsian-Rong Tseng, Tohru Yamamoto, J. Fraser Stoddart, and James R. Heath in “Two-Dimensional Molecular Electronics Circuits,” ChemPhysChem, vol. 3, no. 6, pages 519-525, 2002, incorporated herein by reference in its entirety. However, some researchers believe that the high parameter variation in these nanoscale devices may make them impractical to exploit, as argued by Victor A. Sverdlov, Thomas J. Walls, and Konstantin K. Likharev in “Nanoscale Silicon MOSFETs: A Theoretical Study,” IEEE Transactions on Electron Devices, vol. 50, no. 9, pages 1926-1933, September 2003, incorporated herein by reference in its entirety.
At the same time, several techniques known in the literature can measure the delays or parameters of fabricated devices. For example, ring oscillators built from the resources on an FPGA provide one way to measure the delay of both regions and individual resources. Ring oscillators built from nominally identical resources and placed at different locations on the chip can be used to determine the relative performance of each region of the chip. Individual resources (e.g., nominally identical wire tracks in a channel or LUTs in a cluster) can be substituted within a ring oscillator to measure the relative delay impact of individual, substitutable resources.
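The substitution measurement can be sketched as simple arithmetic on measured frequencies. This sketch assumes a ring with a single net inversion, whose oscillation period is twice the one-way loop delay, and assumes that swapping resource A for resource B changes nothing else in the loop. The function names and the 250 MHz / 200 MHz readings are hypothetical.

```python
def loop_delay_from_frequency(freq_hz: float) -> float:
    """One-way delay around a ring oscillator loop, in seconds.

    Assumes a single-inversion ring: the oscillation period is twice the
    loop delay, since a transition must travel the loop once to flip a
    node and once more to flip it back.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / (2.0 * freq_hz)

def substituted_resource_delta(freq_with_a_hz: float, freq_with_b_hz: float) -> float:
    """Delay of resource B minus delay of resource A, in seconds.

    Assumes B was swapped in for A and every other element of the ring
    (and its loading) is unchanged, so the loop-delay difference is
    attributable to the substitution alone.
    """
    return loop_delay_from_frequency(freq_with_b_hz) - loop_delay_from_frequency(freq_with_a_hz)

# Hypothetical measurements: the same ring using wire track A, then track B.
delta = substituted_resource_delta(250e6, 200e6)
print(f"track B is {delta * 1e9:.2f} ns slower than track A")
```

The same `loop_delay_from_frequency` conversion applies when comparing identical rings placed in different regions of the chip.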
For synchronous and precharge designs, for example, a timing experiment may be set up to double-clock the design at a reference frequency. This determines whether a particular logic event can be completed successfully within a particular time window. By sweeping the reference frequency, the delay of the configuration may be identified. By configuring different active resources into the path under test, the delay contributed by each resource or resource set may be mapped out. The Razor latch design uses a shadow latch to sample the signal a set period after the operating clock edge to detect late-arriving inputs; such a design can be used both for characterization and for continuous monitoring during operation. This is described in more detail by Todd Austin, David Blaauw, Trevor Mudge, and Krisztián Flautner in “Making Typical Silicon Matter with Razor,” IEEE Computer, vol. 37, no. 3, pages 57-65, March 2004, incorporated herein by reference in its entirety.
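The shadow-latch idea can be modeled behaviorally. This is an idealized sketch, not the Razor circuit itself: it ignores setup and hold times and metastability, and assumes the input holds an old value until a single arrival time and then switches to a new value. The names `RazorSample` and `sample_with_shadow` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RazorSample:
    main_value: int    # captured by the main flip-flop at the clock edge
    shadow_value: int  # captured by the shadow latch shadow_offset later

    @property
    def late_arrival(self) -> bool:
        # A mismatch means the input changed after the main flip-flop sampled it.
        return self.main_value != self.shadow_value

def sample_with_shadow(old_value: int, new_value: int, arrival_time: float,
                       clock_period: float, shadow_offset: float) -> RazorSample:
    """Idealized model (no setup/hold time): the input holds old_value
    until arrival_time, then switches to new_value."""
    def value_at(t: float) -> int:
        return new_value if arrival_time <= t else old_value
    return RazorSample(value_at(clock_period), value_at(clock_period + shadow_offset))

for arrival in (0.9, 1.1, 1.5):
    s = sample_with_shadow(0, 1, arrival, clock_period=1.0, shadow_offset=0.3)
    print(arrival, s.late_arrival)
```

In this model, an arrival at 1.1 is flagged as late, while an arrival at 1.5 falls outside the shadow window and goes undetected, so the shadow offset bounds how much lateness the scheme can observe.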
Additionally, delay measurements may be performed by configuring a portion of the IC as a test circuit in which one or more of the physical resources to be tested are preceded and followed by registers in the IC. An input vector that forces a change to propagate through the resource(s) under test is applied. The input register is then clocked so the change can begin propagating, and the result is clocked into the output register. The input and output registers may share a clock or use independent clocks. In either case, by varying the delay between the input clock and the output clock, it is possible to determine the speed of the resources configured between the registers. If the output register captures the correct value, there was enough time for the change to propagate through the resources; if it captures an incorrect value, there was not. Consequently, by adjusting the delay (perhaps the clock period), it is possible to determine the speed of the resources under test.
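The delay adjustment above can be organized as a search over the launch-to-capture delay. The sketch below uses binary search, a swapped-in strategy (a linear sweep would also work), and assumes the pass/fail outcome is monotonic in the delay. The callback `test_passes` is a hypothetical stand-in for one hardware experiment, and the 1.37 ns oracle is invented for illustration.

```python
from typing import Callable

def find_min_passing_delay(test_passes: Callable[[float], bool],
                           lo: float, hi: float, resolution: float) -> float:
    """Binary-search the launch-to-capture delay of a register-to-register test.

    test_passes(d) stands in for one experiment: clock the input register,
    wait d, clock the output register, and check the captured value.
    Assumes the outcome is monotonic in d (longer delays never fail once a
    shorter one passes) and that the test fails at lo.
    """
    if not test_passes(hi):
        raise ValueError("test fails even at the longest delay tried")
    while hi - lo > resolution:
        mid = (lo + hi) / 2.0
        if test_passes(mid):
            hi = mid   # enough time: the true delay is at most mid
        else:
            lo = mid   # not enough time: the true delay exceeds mid
    return hi

# Hypothetical oracle: the resources under test take 1.37 ns.
true_delay = 1.37e-9
estimate = find_min_passing_delay(lambda d: d >= true_delay, 0.0, 5e-9, 1e-12)
print(f"estimated delay: {estimate * 1e9:.3f} ns")
```

Binary search needs only about log2((hi - lo) / resolution) experiments, which matters when each experiment requires reconfiguring and clocking the device.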
To measure the relative delay of regions, sample registers on a common clock can, for example, be configured along a chain of logic: within a fixed cycle, a change will propagate further along the chain in faster regions and a shorter distance in slower regions.
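The chain-of-registers measurement reduces to counting how many sample registers a transition reaches within one cycle. This sketch assumes each chain stage has a known, fixed delay and that a register sees the change exactly when the cumulative delay to it fits within the cycle. The function `stages_reached` and the per-stage delays are hypothetical.

```python
from typing import Sequence

def stages_reached(stage_delays: Sequence[float], cycle_time: float) -> int:
    """Number of chain stages a transition passes within one fixed cycle.

    Models one sample register after each stage: a register records the
    change only if the cumulative delay up to it fits within cycle_time.
    """
    elapsed = 0.0
    count = 0
    for delay in stage_delays:
        elapsed += delay
        if elapsed > cycle_time:
            break
        count += 1
    return count

# Hypothetical per-stage delays (ns) for the same logic chain in two regions.
fast_region = [0.9] * 20
slow_region = [1.2] * 20
print(stages_reached(fast_region, 10.0), stages_reached(slow_region, 10.0))
```

Here the faster region advances the change through 11 stages and the slower region through 8 in the same 10 ns cycle, giving a direct relative ranking of the two regions.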
FPGAs and CPLDs are often considered “fine-grained” because they use very fine-grained building blocks (small Lookup Tables (LUTs), small gates, primitive AND terms (pterms)) that operate on single-bit inputs and produce single-bit outputs. The term “coarse-grained configurable device” is often used to refer to configurable devices that use larger building blocks, often with multi-bit inputs and multi-bit outputs. Examples include MATRIX, PADDI, RaPiD, and PipeRench. See, for example, Ethan Mirsky and André DeHon, “MATRIX: A Reconfigurable Computing Device with Configurable Instruction Distribution and Deployable Resources,” Hot Chips Symposium, 1997, incorporated herein by reference in its entirety; Ethan Mirsky and André DeHon, “MATRIX: A Reconfigurable Computing Architecture with Configurable Instruction Distribution and Deployable Resources,” ISFCCM, April 1996, incorporated herein by reference in its entirety; Dev C. Chen and Jan M. Rabaey, “A Reconfigurable Multiprocessor IC for Rapid Prototyping of Algorithmic-Specific High-Speed DSP Data Paths,” IEEE Journal of Solid-State Circuits, vol. 27, no. 12, pages 1895-1904, December 1992, incorporated herein by reference in its entirety; Alfred K. Yeung and Jan M. Rabaey, “A 2.4-GOPS Data-Driven Reconfigurable Multiprocessor IC for DSP,” Proceedings of the 1995 IEEE International Solid-State Circuits Conference, pages 108-109, February 1995, incorporated herein by reference in its entirety; Carl Ebeling, Darren Cronquist, and Paul Franklin, “RaPiD - Reconfigurable Pipelined Datapath,” FPL, no. 1142, pages 126-135, September 1996, incorporated herein by reference in its entirety; and Seth C. Goldstein, Herman Schmit, Matthew Moe, Mihai Budiu, Srihari Cadambi, R. Reed Taylor, and Ronald Laufer, “PipeRench: A Coprocessor for Streaming Multimedia Acceleration,” ISCA, pages 28-39, May 1999, incorporated herein by reference in its entirety.
Asynchronous Circuits
Asynchronous circuits are those that do not use a clock. They are naturally delay-independent and, as such, can maintain correct operation even when devices are slow or slow down due to aging. However, there is no guarantee on the timing between events in an asynchronous circuit, and variation and aging can cause the asynchronous device to operate slowly.
Asynchronous FPGAs are known in the art:
Catherine G. Wong, Alain J. Martin, and Peter Thomas, “An Architecture for Asynchronous FPGAs,” Proc. IEEE International Conference on Field-Programmable Technology (FPT), December 2003, incorporated herein by reference in its entirety.
John Teifel and Rajit Manohar, “An Asynchronous Dataflow FPGA Architecture,” IEEE Transactions on Computers (special issue), November 2004, incorporated herein by reference in its entirety.
John Teifel and Rajit Manohar, “Highly Pipelined Asynchronous FPGAs,” 12th ACM International Symposium on Field-Programmable Gate Arrays, Monterey, Calif., February 2004, incorporated herein by reference in its entirety.
The key observation here is that the delay around a handshaking loop (request, action, acknowledge) is affected by the delay of the individual devices in the loop. Further, the throughput of an asynchronous pipeline, or of a larger asynchronous cycle of dependencies, is determined by the slowest such handshake loop. In this manner, the slowest asynchronous handshake loop plays the same role as the slowest path in a synchronous circuit, limiting the performance of the entire circuit. The correspondence is not exact, because a synchronous circuit requires the clock cycle to accommodate all possible data-dependent delays, whereas an asynchronous circuit can run as fast as the particular input data allow. Nonetheless, the broad phenomenon still applies.
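The bottleneck argument can be sketched by summing the delays around each handshake loop and taking the worst one. This is a simplification under stated assumptions: each loop is reduced to fixed request, action, and acknowledge delays, and data-dependent action delays (the one way the asynchronous case differs from the synchronous one) are ignored. The type `HandshakeLoop`, the functions, and the stage delays are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical description of one handshake loop: (request, action, acknowledge) delays.
HandshakeLoop = Tuple[float, float, float]

def loop_delay(loop: HandshakeLoop) -> float:
    request, action, acknowledge = loop
    return request + action + acknowledge

def slowest_loop(loops: Dict[str, HandshakeLoop]) -> Tuple[str, float]:
    """Name and delay of the slowest handshake loop.

    Simplification: the steady-state cycle time of the pipeline is taken
    to be this worst loop delay, ignoring data-dependent action delays.
    """
    name = max(loops, key=lambda k: loop_delay(loops[k]))
    return name, loop_delay(loops[name])

# Hypothetical per-stage delays (ns); stage1 contains an unusually slow device.
stages = {
    "stage0": (0.2, 1.0, 0.2),
    "stage1": (0.2, 1.5, 0.3),
    "stage2": (0.25, 0.9, 0.2),
}
name, cycle = slowest_loop(stages)
print(f"{name} limits the pipeline to {1.0 / cycle:.2f} handshakes per ns")
```

A single slow device in `stage1` raises that loop to 2.0 ns and caps the whole pipeline at 0.5 handshakes per ns, even though the other stages could cycle faster.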
A common asynchronous circuit is the arbiter. An exemplary arbiter is disclosed by Alain J. Martin in “A Delay-Insensitive Fair Arbiter,” June 1985, incorporated herein by reference in its entirety.
Unfortunately, existing techniques do not address fabrication or lifetime variation of devices. For example, existing techniques use mappings that do not account for the speed of individual devices; integrated circuits (ICs) built according to existing design styles are crippled by their slowest devices; and existing design techniques deliberately run devices slower than their potential capacity to accommodate slowdown over the device lifetime. Therefore, there is a need for a better approach to avoiding the detrimental effects of device variation as VLSI feature sizes continue to shrink toward the atomic scale.