Reprogrammable FPGAs have been available commercially for several years. The best known commercial family of FPGAs is that from Xilinx Inc. One class of these devices uses Static Random Access Memory (SRAM) to hold the control bits which determine their configuration. Most FPGA devices replace traditional mask-programmed Application Specific Integrated Circuit (ASIC) parts, which have a fixed configuration. The configuration of the FPGA is static and is loaded from a non-volatile memory when power is applied to the system. Nearly all commercially available FPGAs have a stream-based interface to the control store. (The control store contains the set of bits which determine what functions the FPGA will implement.) In a stream-based interface to the control store, a sequence of data is applied to a port in the FPGA to provide a complete configuration for the whole device or for a fixed (normally large) sub-section of the FPGA. This stream-based interface, when combined with an address counter which is implemented on the FPGA itself, provides an efficient method of loading the complete device configuration from an adjacent EPROM or other non-volatile memory on power-up without any additional overhead circuits. A stream-based interface with an address counter is a suitable programming interface for an FPGA which is used as a replacement for a standard ASIC. Some FPGAs can be partly or totally reconfigured using one of a set of static configurations stored at different addresses in an EPROM, and can trigger the reconfiguration from within the design implemented on the FPGA.
Published International Application WO 90/11648, corresponding to U.S. Pat. No. 5,243,238, discloses an architecture hereafter referred to as CAL I, which has been implemented in an Algotronix product designated CAL 1024. CAL I differs from other commercially available FPGAs in that its control store appears as a standard SRAM to the systems designer, and can be accessed using address bus, data bus, chip enable, chip select and read/write signals. Addressing the control store as an SRAM allows a user program running on the host processor to map the FPGA control store (configuration memory) into the memory or address space of the host processor, so that the processor can configure the FPGA to implement a user-defined circuit. This arrangement, which is implemented in the CAL 1024 FPGA, allows the user to partition an application between the processor and the FPGA, with appropriate sections being implemented by each device. The control store interface provides an important input/output (I/O) channel between the FPGA and the processor, although the I/O can also take place using more traditional techniques via, for example, a shared data memory area. This latter type of FPGA provides a passive control store interface because an external agent is required to initiate configuration or reconfiguration of the device, as required.
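The contrast between a stream-based load and SRAM-style access to the control store can be sketched with a toy model. This is an illustration only: the class `ControlStore`, the helper names, the 8-bit word width and the 1024-word size are assumptions made for the example and do not describe the actual CAL 1024 memory map.

```python
class ControlStore:
    """Toy model of an FPGA control store that looks like a small SRAM.

    The word width and size are hypothetical, chosen only for illustration.
    """

    def __init__(self, num_words: int, word_bits: int = 8) -> None:
        self.word_mask = (1 << word_bits) - 1
        self.words = [0] * num_words
        self.accesses = 0  # number of bus cycles issued by the host

    def write(self, address: int, value: int) -> None:
        # One host bus cycle: address + data + write strobe.
        self.words[address] = value & self.word_mask
        self.accesses += 1

    def read(self, address: int) -> int:
        # One host bus cycle: address + read strobe.
        self.accesses += 1
        return self.words[address]


def load_stream(store: ControlStore, stream: list[int]) -> None:
    """Stream-style load: every word is written in order from address 0."""
    for address, value in enumerate(stream):
        store.write(address, value)


def patch_words(store: ControlStore, changes: dict[int, int]) -> None:
    """Random-access update: only the listed addresses are touched."""
    for address, value in changes.items():
        store.write(address, value)


if __name__ == "__main__":
    store = ControlStore(num_words=1024)
    load_stream(store, [0] * 1024)          # full configuration
    full_load = store.accesses
    patch_words(store, {5: 0xAB, 700: 0x3C})  # small change afterwards
    print(full_load, store.accesses - full_load)
```

In this model a full load costs one access per word of the control store, whereas a random-access update costs one access per changed word, which is the property the passive interface relies on.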
Experience with the CAL I architecture and trends within the electronics industry have made this second, passive form of control store interface increasingly attractive for many applications. Microprocessors or microcontrollers are now pervasive components of computer systems, and most board-level systems contain one. The major benefit of the stream-based "active" FPGA programming approach is that no overhead circuits are required to initiate reconfiguration. In systems where a microprocessor or microcontroller is present, the "passive" RAM-emulating FPGA interface is preferable for several reasons:
(1) the FPGA configuration can be stored in the microprocessor's program and data memory (reducing the number of parts by removing the need for a separate memory chip);
(2) the existing data and address buses on the board can be used to control the FPGA (saving printed circuit board area by removing dedicated wires between the configuration EPROM and the FPGA);
(3) the FPGA control store can be written to and read from by the microprocessor, and thereby used as an I/O channel between the FPGA and the microprocessor, thereby potentially saving additional wiring between the FPGA and the processor buses and freeing the FPGA programmable I/O pins for communication with external devices; and
(4) the intelligence of the microprocessor can be used to support compression schemes for the configuration data and other techniques, which allows more flexibility in reprogramming the FPGA.
In addition, the difference in cost between an "active" FPGA with an associated EPROM holding its configuration and a passive FPGA with an active microcontroller chip containing an EPROM and a simple processor is minimal. The easy reprogrammability makes the passive FPGA attractive, even if the microcontroller has no other function apart from reprogramming the FPGA.
Another trend within the electronics industry has been the provision of "support chips" for microprocessors which provide an interface between I/O devices and a particular microprocessor. Examples of these devices include Universal Asynchronous Receiver Transmitters (UARTs) for low bandwidth serial I/O, Programmable Peripheral Interfaces (PPIs) for low bandwidth parallel I/O and various specialised chips for higher bandwidth connections to networks and disk drives. These support chips appear to the processor as locations in the I/O or memory address space to and from which data are transferred. Some support chips can interrupt the processor via interrupt lines or take over the bus for Direct Memory Access (DMA) operations. In many ways a passive FPGA chip can be viewed as a successor to a support chip, providing an interface to the processor via its control store on the one hand, and an interface to the external world via a large number of flexible I/O lines on the other, for example 128 programmable I/O lines on the Algotronix CAL 1024 device.
A passive FPGA chip has a number of advantages. For example, it is cost-effective to provide a single FPGA with a library of configurations instead of providing a number of support chips. In addition, providing a single FPGA for several functions reduces the number of devices in the processor manufacturer's catalogue. Also, reconfigurable FPGAs can support changeable I/O functions, such as when a single external connector can be used as either a serial or a parallel port. With a passive RAM control interface, the FPGA is able to support other functions as well.
Each time an FPGA is reconfigured to implement a different set of functions, the microprocessor must access the configuration memory. One reconfiguration typically requires many control store accesses, one access for each word of configuration memory to be changed. Several important classes of reconfiguration have been identified.
(1) Application swapping occurs when one application terminates and a completely different application wishes to make use of the FPGA. In this case the FPGA chip is completely reconfigured, usually from a static configuration.
(2) Task swapping occurs when the application must configure relatively large sections of the FPGA to implement a new phase in the computation. For example, a sorting application might first sort small batches of data completely using configuration A and then merge those sorts into a completely sorted stream of data using configuration B. In this case, the application has knowledge of both configurations and need only change those resources which are different in configuration B. At a later point, configuration A may itself be restored.
(3) Data dependent reconfiguration occurs when the configurations of some cells are computed dynamically based on input data by the application program, rather than being loaded from a static configuration file. Often a static configuration is first loaded, then a relatively small sub-set of cells are reconfigured dynamically (that is, reconfigured while the chip is operating). An important example of this class of reconfiguration is where an operand (such as a constant multiplier or a search string) is folded directly into the logic used to implement the multiply or sort unit rather than being stored in a register. This technique is advantageous as it frequently results in smaller and faster operation units.
(4) Access to gate outputs occurs for debugging. The outputs of all the logic cells on the CAL I FPGA are mapped to bits of the control store. Debugging programs are available which read back this information on the display or design layout to show the logic levels on internal wires.
(5) Access to gate outputs for I/O is similar to the previous access to gate outputs for debugging. But in this particular case only a small fraction of the logic nodes, namely those which correspond to input and output registers, will be accessed repeatedly. The ability to rapidly assemble a word representing input to or the result of a computation from several bits at different locations in the control store is critical to the effectiveness of this technique.
detecting control store bit patterns which correspond to routing a signal straight through a cell, detecting when a group of cells beneath a flyover all route the signal in the direction of the flyover by using the 4-input gate provided for that flyover direction, and taking as input the output of the 4-input gate of the appropriate neighbour multiplexer,
feeding an output from one of the 4 input gates to switches at both ends of the flyover, whereby the signal is carried automatically by the flyover as well as by neighbour routing, and the faster signal on the flyover is selected by the switch at the end of the flyover.
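The task-swapping case described above, in which the application knows both configurations and rewrites only the words that differ, can be illustrated with a short sketch. The function names and the representation of a configuration as a list of integer words are assumptions made for this example, not part of the CAL I design.

```python
def changed_words(config_a: list[int], config_b: list[int]) -> dict[int, int]:
    """Return {address: new_word} for every word that differs between A and B."""
    if len(config_a) != len(config_b):
        raise ValueError("configurations must cover the same control store")
    return {
        address: new
        for address, (old, new) in enumerate(zip(config_a, config_b))
        if old != new
    }


def swap_task(control_store: list[int], current: list[int], target: list[int]) -> int:
    """Rewrite only the differing words; return the number of write accesses."""
    delta = changed_words(current, target)
    for address, word in delta.items():
        control_store[address] = word  # one control store access per changed word
    return len(delta)


if __name__ == "__main__":
    config_a = [0x11] * 16      # hypothetical "sort" configuration
    config_b = list(config_a)   # hypothetical "merge" configuration
    config_b[3] = 0x22
    config_b[9] = 0x33

    store = list(config_a)      # device currently holds configuration A
    writes = swap_task(store, config_a, config_b)
    print(writes, store == config_b)
```

Under these assumptions the cost of a swap is proportional to the number of differing words rather than to the size of the control store, and restoring configuration A later costs the same number of accesses.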
It is desirable to reduce the number of accesses required and hence the time to wholly or partially reconfigure the device. Several systems other than CAL I have been proposed which allow direct access to internal signals in an FPGA or an FPGA-like device, for example, as disclosed in "Cellular Logic-in-Memory Arrays", William H. Kautz, IEEE Transactions on Computers, Vol. C18, No. 8, August 1969; "A Logic in Memory Computer", Harold S. Stone, IEEE Transactions on Computers, Vol. C19, No. 1, January 1970; and Xilinx U.S. Pat. No. 4,758,985, "Microprocessor Oriented Configurable Logic Element", although all these proposals suffered from major drawbacks and were not made available commercially.
It is also desirable to improve the means of accessing state information in designs implemented on FPGAs so that an external processor can perform word-wide read or write operations on the registers of the user's design with a single access to the control store. In this way the control store interface can provide high bandwidth communication between the processor and the FPGA. It is also desirable to provide mechanisms for synchronising computations between the FPGA and the processor, and to provide a mechanism for extending design configuration files to support dynamic reconfiguration while still allowing conventional tools for static designs to be used to create FPGA configurations.
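The benefit of word-wide register access can be seen by counting control store accesses in a small model. The sketch below is illustrative only: the mapping of register bits to `(address, bit)` locations is hypothetical and is not the CAL I layout.

```python
def read_register(store: list[int], bit_locations: list[tuple[int, int]]) -> tuple[int, int]:
    """Assemble a register value from bits held in the control store.

    bit_locations[i] is the (address, bit_index) of register bit i, LSB first.
    Returns (value, number_of_distinct_control_store_words_read).
    """
    fetched: dict[int, int] = {}
    value = 0
    for i, (address, bit) in enumerate(bit_locations):
        if address not in fetched:
            fetched[address] = store[address]  # one access per distinct word
        value |= ((fetched[address] >> bit) & 1) << i
    return value, len(fetched)


if __name__ == "__main__":
    store = [0] * 8

    # Scattered layout: each register bit sits in a different word.
    store[0], store[2], store[4], store[6] = 1, 1, 0, 1
    scattered = [(0, 0), (2, 0), (4, 0), (6, 0)]

    # Packed layout: all four register bits share one word.
    store[1] = 0b1011
    packed = [(1, 0), (1, 1), (1, 2), (1, 3)]

    print(read_register(store, scattered))
    print(read_register(store, packed))
```

Both layouts yield the same register value, but the scattered layout needs one access per bit while the packed layout needs a single access, which is the improvement this paragraph calls for.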
The architecture of the CAL 1024 was based on 1.5 micrometer technology available in 1989. One problem with the CAL I architecture, in which cells are connected only to their nearest neighbours, was that cells in the middle of the array became less useful with increasing array size, as the distance and hence the delay to the edge of the chip increased. This problem became more serious as improvements in processing technology meant that the number of cells implementable per chip increased from 1024 to about 16,384. This resulted in a scalability problem because of increased delays, and reduced the performance below the desired criteria. Thus, although scalability of chips using the CAL I architecture can be achieved, it is at the expense of performance. The limited number of cells available on a single chip with 1.5 μm technology meant that it was desirable to ensure scalability over chip boundaries so that large designs typical of many computational applications could be realised using multiple chips. The limitations of the processing technology of the time also made it essential to optimise the architecture for silicon area, and sometimes this optimisation was at the expense of speed. The original Algotronix CAL 1024 chips were designed to bring out peripheral array signals to pads on the edges of the cellular array so that they could be cascaded into larger cellular arrays on a printed circuit board. Packaging technology has not evolved as rapidly as chip technology, and limitations on the number of package I/O pins make it uneconomic to produce fully cascadable versions of the higher cell density chips.
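The scaling problem with nearest-neighbour routing can be made concrete by counting the cell-to-cell hops from the most central cell to the nearest edge cell. The sketch below assumes square arrays (32 x 32 for 1024 cells and 128 x 128 for 16,384 cells) and counts hops only; the delay per hop is not given in the text and is not modelled.

```python
import math


def worst_case_hops_to_edge(num_cells: int) -> int:
    """Hops from the most central cell to the nearest edge cell.

    Assumes a square array in which a signal moves one cell per hop
    through nearest-neighbour connections only.
    """
    side = math.isqrt(num_cells)
    if side * side != num_cells:
        raise ValueError("this sketch only handles square arrays")
    # For cell index i along one axis, the distance to the nearest edge
    # is min(i, side - 1 - i); its maximum over all i is (side - 1) // 2.
    return (side - 1) // 2


if __name__ == "__main__":
    for cells in (1024, 16384):
        print(cells, worst_case_hops_to_edge(cells))
```

Under these assumptions a sixteen-fold increase in cell count roughly quadruples the number of intermediate cells a signal from the centre must pass through, which is consistent with the increased delays described above.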
The CAL I architecture suffered from a number of other disadvantages. For example, in order to access a cell in the existing CAL I FPGA, five to six processor instructions are needed to calculate the address of the cell; this takes time and slows operation. The routing architecture of the existing CAL I cell array also meant that, as the number of cells per chip increased, routing via intermediate cells added considerably to the delays involved. In addition, in the CAL 1024 device, global signals are coupled to all the cells in the array so that the cells can be signalled simultaneously. It follows that, at high clock frequencies, global signals could consume high power.