Integrated circuits (ICs) having programmable logic, for example, field programmable gate arrays (FPGAs), are popular hardware devices for quickly implementing circuit designs. An FPGA typically includes an array of configurable logic blocks (CLBs) surrounded by a ring of programmable input/output blocks (IOBs). The CLBs and IOBs are interconnected by a programmable interconnect structure. The CLBs, IOBs, and interconnect structure are typically programmed by loading a stream of configuration data into internal configuration memory cells that define how the CLBs, IOBs, and interconnect structure are configured. The configuration data may be read from an external memory, conventionally an external integrated circuit memory such as an EEPROM, EPROM, PROM, or the like.
In order to improve the usefulness of the FPGA, a processor core, such as the PowerPC® processor of IBM Corp. of Armonk, N.Y., was embedded in an FPGA, for example, the Virtex-II™ Pro FPGA from Xilinx, Inc. of San Jose, Calif.
FIG. 1 illustrates a generic prior art diagram of an FPGA having an embedded processor. The FPGA 10 includes a programmable logic fabric 14 having the CLBs and interconnect structure, and an I/O ring 16 having the IOBs. The IOBs are fabricated on a substrate supporting the FPGA 10 and are coupled to the pins of the integrated circuit, allowing users access to the programmable logic fabric 14 and the processor core 12. The processor core 12 includes a central processing unit (CPU) 8 connected to one or more cache memories 9.
FIG. 2 shows a conventional bus architecture for an FPGA 10 having the processor core 12. An example is IBM's CoreConnect™ Bus Architecture. Most of the components and buses shown in FIG. 2 are implemented in the programmable logic fabric 14 of FIG. 1. The processor core 12 communicates with the On-Chip Memory (OCM) 110 via an On-Chip Memory (OCM) bus 112. The OCM 110 includes one or more of the FPGA's Block Random Access Memory (BRAM) modules (not shown). There are three major buses, i.e., bus 114, bus 115, and bus 116, that allow processor core 12 to communicate with other components or devices. A bus may have a bus arbiter, which controls access to the bus, e.g., PLB ARB 122 for bus 114 and OPB ARB 127 for bus 116.
Bus 114, also called a processor local bus (PLB) 114, connects processor core 12 to high-speed devices/components 120. These high-speed devices/components 120 could include memory, finite state machines (FSMs), and other high-performance peripherals. A device/component that takes control of PLB 114 to handle its own transfer is called a “master”, whereas a device/component that receives commands from the master to send data is called a “slave”.
Bus 116, also called an on-chip peripheral bus (OPB) 116, provides processor core 12 access to low-speed devices/components 125. These low-speed devices/components 125 could include UARTs and Ethernet connections. Note that low-speed devices/components 125, like high-speed devices/components 120, can include both masters and slaves. However, to prevent these low-speed devices/components 125 from affecting the performance of processor core 12, OPB 116 is not connected directly to processor core 12. Instead, OPB 116 is coupled to PLB 114 via an OPB bridge 118. OPB bridge 118 can automatically convert data formats and protocols, thereby facilitating the transfer of information between OPB 116 and PLB 114.
Bus 115, also called a Device Control Register (DCR) bus 115, allows the processor core 12 relatively low speed communications in order to manage status and configuration registers, e.g., Device Control Registers, on the other devices/components. DCR bus 115 connects, via a daisy chain arrangement, the processor core 12 (master) to the OCM 110 (slave), high-speed devices/components 120 (slaves), and low-speed devices/components 125 (slaves).
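The status and configuration accesses carried by the DCR bus can be pictured as simple register reads and writes at device control register addresses. The following is a minimal sketch only; the register file is simulated with an ordinary array, and the addresses and bit values are hypothetical rather than taken from any actual DCR register map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device-control register file. On real hardware these
   registers would live on slave devices reached over the DCR bus,
   not in ordinary memory; an array stands in here for illustration. */
static uint32_t dcr_regs[16];

/* Write a device control register at a given (hypothetical) DCR address. */
static void dcr_write(unsigned addr, uint32_t value) {
    dcr_regs[addr] = value;
}

/* Read a device control register back, e.g., to poll device status. */
static uint32_t dcr_read(unsigned addr) {
    return dcr_regs[addr];
}
```

For example, the master might set a configuration bit on a slave with `dcr_write(3, 0x1u)` and later confirm it with `dcr_read(3)`; the daisy-chain topology is invisible at this level of abstraction.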
A conventional component implemented in an FPGA without an embedded processor is an FSM. Particular FSMs may contain a large number of states, and may involve substantial computation to determine the next state and the state outputs based on varying inputs. However, these FSMs may actually have relatively relaxed timing constraints compared to the rest of the system, e.g., the other components implemented in the programmable logic fabric, which suggests that the FSM may be implemented in software rather than in hardware. Hence, for an FPGA with an embedded processor, having the processor implement part or all of the FSM would free up the associated programmable logic fabric resources.
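The idea of moving an FSM from the fabric onto the embedded processor can be sketched in C as a next-state/output function. This is an illustrative example only; the three states and the `start`/`finish` inputs are invented for the sketch and do not come from the text.

```c
#include <stdint.h>

/* Hypothetical three-state FSM: IDLE goes to RUN on a start input,
   RUN goes to DONE on a finish input, and DONE returns to IDLE. */
typedef enum { ST_IDLE, ST_RUN, ST_DONE } state_t;

typedef struct {
    state_t state;
    uint8_t output;   /* value driven back to the programmable logic fabric */
} fsm_t;

/* Compute the next state and output from the current state and inputs.
   On an embedded processor this routine replaces the equivalent
   next-state logic that would otherwise occupy CLB resources. */
void fsm_step(fsm_t *fsm, int start, int finish) {
    switch (fsm->state) {
    case ST_IDLE:
        if (start)  fsm->state = ST_RUN;
        break;
    case ST_RUN:
        if (finish) fsm->state = ST_DONE;
        break;
    case ST_DONE:
        fsm->state = ST_IDLE;
        break;
    }
    fsm->output = (fsm->state == ST_RUN) ? 1u : 0u;
}
```

Because the transition is computed sequentially in software rather than in one clock cycle of fabric logic, this approach suits exactly the relaxed-timing FSMs described above.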
An example of implementing an FSM in hardware and software is the Berkeley POLIS system. POLIS is a complete co-design solution, which uses the co-design finite state machine (CFSM) as the central representation of the required system behavior. The single CFSM can be partitioned into multiple software or hardware sub-networks. A hardware CFSM sub-network is constructed using standard logic synthesis techniques, and can execute a transition in a single clock cycle.
A software CFSM sub-network is transformed into a software program and a simple custom real time operating system. The program is generated from a control/data flow graph, and is coded in C. In order to get accurate timing information, such as the time duration for each state and each state transition, the C code must be instrumented and the code executed on the processor. The instrumented version counts the actual processor cycles used, hence giving an accurate way of extracting timing information.
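The instrumentation step can be pictured as follows: the generated C is rewritten so that each statement is followed by an increment of a global cycle counter. This is a hand-written sketch of the idea, not POLIS output; the per-statement cycle costs in the comments are assumed values for illustration, not measured ones.

```c
#include <stdint.h>

/* Global counter of processor cycles consumed; the instrumentation
   pass appends an increment after each source statement. */
static uint64_t cycle_count = 0;

/* Instrumented transition routine: every statement is paired with
   the (assumed) cycle cost of executing it on the target processor. */
int transition_instrumented(int state, int input) {
    int next;
    cycle_count += 1;          /* local setup (assumed cost) */
    if (input) {
        cycle_count += 2;      /* compare + taken branch (assumed cost) */
        next = state + 1;
        cycle_count += 1;      /* add + store (assumed cost) */
    } else {
        cycle_count += 2;      /* compare + fall-through (assumed cost) */
        next = 0;
        cycle_count += 1;      /* store (assumed cost) */
    }
    return next;
}
```

Running the instrumented code and reading `cycle_count` afterward yields the duration of each state and transition in actual processor cycles, which is precisely why the code must be executed before the timing is known.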
The interfaces between the hardware and software sub-networks are automatically synthesized in POLIS and come in the form of cooperating circuits and software procedures (I/O drivers) embedded in the synthesized implementation. Communication to the I/O drivers can be through specific I/O ports available on the processor, or via general memory mapped I/O.
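A memory-mapped I/O driver of the kind described can be sketched as reads and writes through a pointer to a device register. The register address here is simulated with an ordinary variable so the sketch is self-contained; on real hardware the pointer would target the mapped address of the hardware sub-network's interface register, and the names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for a hardware interface register; on real hardware
   DEVICE_REG would be a fixed memory-mapped address. */
static uint32_t fake_device_reg;
#define DEVICE_REG ((volatile uint32_t *)&fake_device_reg)

/* I/O driver: deliver an event value to the hardware sub-network. */
void emit_event(uint32_t value) {
    *DEVICE_REG = value;
}

/* I/O driver: poll the hardware sub-network for the last event. */
uint32_t poll_event(void) {
    return *DEVICE_REG;
}
```

The `volatile` qualifier matters on real hardware: it prevents the compiler from caching or reordering accesses to what it would otherwise treat as ordinary memory.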
One of the significant disadvantages of the CFSM approach is that the code must be instrumented (each line of C code has instructions appended to it that count the clock cycles associated with executing that line), and the code must actually be executed before accurate timing data is known. It would be much more efficient if timing data could be determined from an examination of the code, before execution of the code.
Another disadvantage of the CFSM approach is that, although the CFSM at the top level has a uniform view of the FSM, at the lower implementation level a hardware FSM looks significantly different from a software FSM. Hence the interface between the software part and the hardware part of a single FSM, and more generally, the interface between a software FSM and the hardware components, for example, the hardware implemented in an FPGA, is relatively complicated. It would be desirable for the interface between the hardware logic circuitry and the processor to consume minimal resources and to be designed to shield the hardware logic circuitry from the processor and vice versa.
Accordingly, there is a need for better techniques to design and implement an FSM using software executed on a processor and having accurate timing information.