As is well known in the art, a field programmable gate array (FPGA) is a type of programmable logic device (PLD), that is, a semiconductor device containing programmable logic components, also known as configurable logic blocks (CLBs), and programmable interconnects, together with input/output blocks (IOBs). Through configuration of the interconnects, the programmable logic components can be programmed to duplicate the functionality of basic logic gates such as AND, OR, XOR, NOT or more complex combinational functions such as decoders or simple math functions. In most FPGAs, these programmable logic components (or logic blocks, in FPGA parlance) also include memory elements, which may be simple flip-flops or more complete blocks of memories.
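As an illustration of how one programmable logic component can duplicate different gates, the following minimal sketch models a logic block as a lookup table whose stored bits are the configuration. The lookup-table model and the names `make_lut` and `lut_eval` are assumptions made for illustration; the text above does not state how the logic blocks are implemented.

```python
from itertools import product

def make_lut(func, n_inputs):
    """Build the truth table (the 'configuration bits') for func.

    Entries are ordered by input address, first input as most significant bit.
    """
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs)]

def lut_eval(table, *inputs):
    """Evaluate the configured block: the inputs form the table address."""
    address = 0
    for bit in inputs:
        address = (address << 1) | (bit & 1)
    return table[address]

# The same evaluation hardware, configured two different ways.
xor_table = make_lut(lambda a, b: a ^ b, 2)
and_table = make_lut(lambda a, b: a & b, 2)
```

Reconfiguring the block amounts to loading a different table; `lut_eval` itself does not change.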
As noted above, the hierarchy of programmable interconnects allows the logic blocks of an FPGA to be interconnected as needed by the system designer, somewhat like a one-chip programmable breadboard. These logic blocks and interconnects can be programmed after the manufacturing process by the customer/designer (hence the term “field programmable”) so that the FPGA can perform whatever logical function is needed.
FPGAs have several significant advantages over conventional application-specific integrated circuits (ASICs), including a shorter time to market, ability to re-program in the field to fix bugs, and lower non-recurring engineering costs. Applications of FPGAs include digital signal processing (DSP), software-defined radio, aerospace and defense systems, ASIC prototyping, medical imaging, computer vision, speech recognition, cryptography, bioinformatics, computer hardware emulation and a growing range of other areas.
To define the behavior of an FPGA, a user provides a design written in a hardware description language (HDL) or a schematic design. In electronics, a hardware description language is one of a class of computer languages used to develop formal descriptions of electronic circuits. A typical HDL can describe the circuit's operation, its design, and tests to verify its operation by means of simulation.
An HDL's syntax and semantics include explicit notations for expressing time and concurrency, which are the primary attributes of hardware. Languages whose principal characteristic is to express circuit connectivity between a hierarchy of blocks are classified as netlist languages, and some HDLs can also be used for this purpose. One use of an HDL involves designing programmable logic devices, such as FPGAs. The two most widely used and well-supported HDLs in industry today are VHDL and Verilog. VHDL, or VHSIC Hardware Description Language, is commonly used as a design-entry language for field-programmable gate arrays and application-specific integrated circuits in electronic design automation of digital circuits. Verilog is an HDL used to model electronic systems. The Verilog language (sometimes called Verilog HDL) supports the design, verification, and implementation of analog, digital, and mixed-signal circuits at various levels of abstraction.
Essential to HDL design is the ability to simulate HDL programs. An HDL program may be tested in hardware, such as by uploading it into a programmable logic device or even by producing a chip based on its specification. However, this is generally a very time-consuming and costly process, and generally the bulk of testing and debugging is done using a program called a simulator. The simulator maintains a re-settable “clock”, similar to the real clock of a digital device, and allows the designer to print out the values of various registers over time in order to verify and debug the design.
Circuits operate in two fundamental timing modes, synchronous and asynchronous. A synchronous circuit is a digital circuit in which the various circuit components are synchronized by a centrally generated clock signal. In an ideal synchronous circuit, every change in the logical levels of each storage component is simultaneous. These transitions follow the level change of the clock. Ideally, the input to each storage element has reached its final value before the next clock transition occurs, so the behavior of the whole circuit can be accurately predicted. Practically, some delay is required for each logical operation, resulting in a maximum speed at which each synchronous system can run. To make these circuits work correctly, a great deal of care is needed in the design of the clock distribution networks. Static timing analysis is often used to determine the maximum safe operating speed.
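The relationship between per-operation delay and maximum clock speed can be sketched as a longest-path computation over the combinational logic between storage elements. The netlist, the delay figures, and the names `critical_path_ps` and `max_clock_mhz` below are hypothetical, and the sketch ignores clock skew and setup time unless they are supplied as `overhead_ps`; it is not a full static timing analysis.

```python
from functools import lru_cache

# Hypothetical combinational netlist: node -> (delay in picoseconds, fan-in nodes)
NETLIST = {
    "in_a": (0, ()),
    "in_b": (0, ()),
    "and1": (1200, ("in_a", "in_b")),
    "xor1": (1500, ("and1", "in_b")),
    "out": (300, ("xor1",)),
}

def critical_path_ps(netlist):
    """Return the largest accumulated delay along any path in the netlist."""
    @lru_cache(maxsize=None)
    def arrival(node):
        delay, fanin = netlist[node]
        # A node's output settles after its slowest input plus its own delay.
        return delay + max((arrival(src) for src in fanin), default=0)
    return max(arrival(node) for node in netlist)

def max_clock_mhz(netlist, overhead_ps=0):
    """Clock period must cover the critical path plus any assumed overhead."""
    return 1_000_000 / (critical_path_ps(netlist) + overhead_ps)
```

With these assumed delays the slowest path runs through `and1`, `xor1`, and `out`, so the clock period can be no shorter than the sum of those three delays.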
Synchronous circuits are simulated using synchronous simulation algorithms. These algorithms use a centralized notion of time to follow the path of events in the circuits. In this manner, simulation does not advance until all the events that occur at the current simulation time have been processed. To implement these algorithms, events are stored in a global ordered queue. Each slot in this queue represents a simulation time and stores a linked list of events that occur at that simulation time.
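The global ordered queue described above can be sketched as follows. This is a minimal illustration, not a production simulator: the two-gate netlist, the signal names, and the function `simulate` are hypothetical, and a Python list stands in for the linked list held in each time slot.

```python
import heapq
from collections import defaultdict

# Hypothetical netlist: gate name -> (function, input signals, output signal, delay)
GATES = {
    "g_and": (lambda a, b: a & b, ("a", "b"), "n1", 2),
    "g_not": (lambda x: 1 - x, ("n1",), "y", 1),
}

def simulate(gates, stimuli, initial):
    """Event-driven simulation driven by one global, time-ordered queue.

    stimuli: list of (time, signal, value) input changes.
    Returns the (time, signal, value) changes actually applied, in order.
    """
    values = dict(initial)
    slots = defaultdict(list)  # simulation time -> events in that slot
    times = []                 # min-heap of times that currently have a slot

    def schedule(t, sig, val):
        if t not in slots:
            heapq.heappush(times, t)
        slots[t].append((sig, val))

    for t, sig, val in stimuli:
        schedule(t, sig, val)

    trace = []
    while times:
        now = heapq.heappop(times)
        changed = set()
        # Process every event in the current slot before time advances.
        for sig, val in slots.pop(now):
            if values[sig] != val:
                values[sig] = val
                changed.add(sig)
                trace.append((now, sig, val))
        # Re-evaluate gates whose inputs changed; schedule their outputs.
        for func, ins, out, delay in gates.values():
            if changed.intersection(ins):
                schedule(now + delay, out, func(*(values[i] for i in ins)))
    return trace

trace = simulate(
    GATES,
    stimuli=[(0, "a", 1), (0, "b", 1), (5, "a", 0)],
    initial={"a": 0, "b": 0, "n1": 0, "y": 1},
)
```

Because the loop always pops the smallest pending time, no event at a later time is handled until the current slot is empty, which is the property the paragraph above describes.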
An asynchronous circuit is a circuit in which the circuit components operate largely autonomously. The circuit components are not governed by a clock circuit or global clock signal, but instead operate based upon signals that indicate completion of previous instructions and operations. These signals are specified by simple data transfer protocols. This digital logic design is contrasted with the above-described synchronous circuits which operate according to clock timing signals.
As the events of a current time slot are processed for an asynchronous circuit simulation, the output of those events is compared with the previous output of the corresponding logic elements and, if they differ, new events are generated on the logic elements whose input is driven by the output of the current event. There is no global centralized time. Instead each data item, or token, carries a time stamp which is indicative of the time up to which the data is valid. The evaluation of an event depends on the availability of a token. An asynchronous simulation algorithm can process events that occur at different time instances. Hence it can extract more parallelism compared to synchronous simulation algorithms.
One key component of asynchronous simulation algorithms is determining how to decide the time stamp of a data element. There are different conservative and optimistic approaches. In conservative schemes only safe evaluation times are allowed, that is, evaluation times which guarantee a correct result. A logic element is evaluated only after it receives all its valid input tokens. As a logic element is evaluated, its output is decided on the basis of its inputs, and the time stamp of the output is decided by the time stamp of the last arriving token and the delay of the logic element. In contrast, an optimistic evaluation of a logic element takes place as soon as an input token arrives at its input. If the output produced turns out to be incorrect, then a rollback takes place to return to a previous known correct state, and messages are sent to forward elements to cancel the effect of the incorrect message sent earlier. This optimistic algorithm has the added cost of state saving and a more complex control mechanism to accommodate rollback. The optimistic scheme is generally more efficient as long as rollbacks are few.
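The conservative rule can be sketched as below. The names `Token` and `conservative_eval` are hypothetical, and the sketch reads "the last arriving token" as the input token with the largest time stamp, which is an interpretation rather than something the text confirms. If time stamps are instead read strictly as validity horizons, the minimum over the inputs would be the appropriate choice.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class Token:
    value: int
    timestamp: int  # time stamp carried by the data item

def conservative_eval(func, delay, inputs: Sequence[Optional[Token]]):
    """Evaluate a logic element only when every input holds a token.

    Returns None when any input token is missing (evaluation is not yet safe).
    Assumption: the output time stamp is the latest input stamp plus the delay.
    """
    if any(tok is None for tok in inputs):
        return None
    out_value = func(*(tok.value for tok in inputs))
    out_time = max(tok.timestamp for tok in inputs) + delay
    return Token(out_value, out_time)

and_gate = lambda a, b: a & b
waiting = conservative_eval(and_gate, 2, [Token(1, 5), None])
ready = conservative_eval(and_gate, 2, [Token(1, 5), Token(1, 9)])
```

An optimistic variant would evaluate on the first arriving token and keep enough saved state to undo the result; that bookkeeping is omitted here.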
A synchronous circuit can also be used to emulate an asynchronous circuit. Such synchronous “handshake” circuits follow the same communication protocols as asynchronous circuits, but continue to use a clock signal for sequencing operations. Examples of such implementations include “Synchronous Handshake Circuits” by Ad Peeters and Kees van Berkel, Proc. 7th International Symposium on Asynchronous Circuits and Systems, March 2001. These circuits use signals to indicate when a result has been computed by a logic element. This signal (sometimes called a “valid bit”) is used in conjunction with the clock. In such a circuit, tokens are explicitly represented using these additional signals. Operations in such a circuit proceed when their inputs are valid, as in a conventional asynchronous circuit. However, clock signals are used to control state transitions as well. The clock frequency of such circuits does not necessarily determine the performance, because not every clock cycle will result in a valid result being computed. However, because clocks are used to control circuit operation, a synchronous simulation method can be used to simulate such circuits.
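A clocked stage gated by valid bits can be sketched as follows. The class `ValidBitStage` and its methods are hypothetical and are not taken from the cited paper; the sketch also omits any acknowledge or back-pressure signal, so it shows only the "proceed when inputs are valid" half of a handshake.

```python
class ValidBitStage:
    """Clocked stage that computes only when all input valid bits are set."""

    def __init__(self, func, n_inputs):
        self.func = func
        self.data = [None] * n_inputs
        self.valid = [False] * n_inputs
        self.out = None
        self.out_valid = False

    def offer(self, index, value):
        """Present a token on one input, setting that input's valid bit."""
        self.data[index] = value
        self.valid[index] = True

    def clock(self):
        """One clock edge: fire if every input is valid, otherwise idle."""
        if all(self.valid):
            self.out = self.func(*self.data)
            self.out_valid = True
            self.valid = [False] * len(self.valid)  # input tokens consumed
        else:
            self.out_valid = False
        return self.out_valid

stage = ValidBitStage(lambda a, b: a + b, 2)
stage.offer(0, 3)
fired = [stage.clock()]   # only one operand present: idle cycle
stage.offer(1, 4)
fired.append(stage.clock())  # both operands valid: result produced
result = stage.out
fired.append(stage.clock())  # tokens already consumed: idle again
```

Only one of the three clock cycles produces a valid result, which illustrates why the clock frequency alone does not determine the throughput of such a circuit.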
When a synchronous circuit implementation is used to emulate asynchronous operation, the underlying computation model is still asynchronous because the performance and other properties of the computation are determined by the rate at which tokens are processed. We refer to this underlying computation model as “asynchronous dataflow.”
Once the design is completed and verified by simulation, the HDL code is fed into a logic compiler, and the output is uploaded into the FPGA device. This is accomplished through the generation of a technology-mapped netlist. The netlist is fitted to the actual FPGA architecture using a process called place-and-route, usually performed by an FPGA company's proprietary place-and-route software. The above-described simulation may be performed after this netlist generation to validate manufacturer-specific implementations. Once the design and validation process is complete, the binary file generated (also using the FPGA company's proprietary software) is used to (re)configure the FPGA.
As noted above, asynchronous operation provides some significant advantages over synchronous operation. However, in the historical course of development, synchronous circuits were the first to be widely accepted in the industry, particularly in the field of reconfigurable gate arrays. For this reason many more designs currently exist in synchronous logic format than in asynchronous format. One of the significant challenges faced by the industry has been to determine effective ways to convert synchronous circuit designs to asynchronous designs, so as to take advantage of the benefits of asynchronous operation without major redesigns or, even worse, conversion design flaws.
There are a variety of synchronous reconfigurable architectures that have been developed by both research groups and companies. Most of these architectures, however, suffer from a performance problem due to the poor scaling of their interconnect.
The present inventor has recognized that the interconnect structure of asynchronous circuits, what is known in the art as the chip ‘fabric,’ can be a very limiting element in the conversion of synchronous circuits to asynchronous circuits. This is particularly true because of the complexity of the interconnect architecture required to support the asynchronous token verification protocols, the ‘handshake’ that indicates valid data as between asynchronous logic blocks. The present inventor has recognized the need for improved interconnect methods and systems for supporting converted synchronous circuits in their asynchronous form.