Programmable logic devices (“PLDs”) are a well-known type of integrated circuit that can be programmed to perform specified logic functions. One type of PLD, the field programmable gate array (“FPGA”), typically includes an array of programmable tiles. These programmable tiles can include, for example, input/output blocks (“IOBs”), configurable logic blocks (“CLBs”), dedicated random access memory blocks (“BRAMs”), multipliers, digital signal processing blocks (“DSPs”), processors, clock managers, delay-locked loops (“DLLs”), and so forth. Notably, as used herein, “include” and “including” mean including without limitation.
One such FPGA is the Xilinx Virtex® FPGA available from Xilinx, Inc., 2100 Logic Drive, San Jose, Calif. 95124. Another type of PLD is the Complex Programmable Logic Device (“CPLD”). A CPLD includes two or more “function blocks” connected together and to input/output (“I/O”) resources by an interconnect switch matrix. Each function block of the CPLD includes a two-level AND/OR structure similar to those used in Programmable Logic Arrays (“PLAs”) and Programmable Array Logic (“PAL”) devices. Other PLDs are programmed by applying a processing layer, such as a metal layer, that programmably interconnects the various elements on the device. These PLDs are known as mask programmable devices. PLDs can also be implemented in other ways, for example, using fuse or antifuse technology. The terms “PLD” and “programmable logic device” include, but are not limited to, these exemplary devices, and also encompass devices that are only partially programmable.
For purposes of clarity, FPGAs are described below, though other types of PLDs may be used. FPGAs may include one or more embedded microprocessors. For example, a microprocessor may be located in an area reserved for it, generally referred to as a “processor block.” Additionally or alternatively, microprocessors may be implemented in programmable logic of an FPGA (“FPGA fabric”). These microprocessors are generally referred to as “soft processors,” in contrast to embedded microprocessors, which are generally referred to as “hard processors.” Whether hard or soft, microprocessors may be of any of a variety of known architectures, including a reduced instruction set computer (“RISC”), a complex instruction set computer (“CISC”), or a zero instruction set computer (“ZISC”) form.
Architectures associated with multi-processor array (“MPA”) configurations have been based on a variety of infrastructures for communication between microprocessors. For example, in some architectures, a networking model is used for a Network-on-Chip (“NoC”). In other architectures, a hierarchical bus model is used for a System-on-Chip (“SoC”). In still other architectures, a data streaming model that buffers data in first-in-first-out buffers (“FIFOs”) is used for an MPA implemented in an FPGA.
In a bus-based SoC implementation, an Application Programming Interface (“API”) provides a transaction-level abstraction that hides notions of clock signals, hardware-level concurrency, and pipelining, among other circuit-based factors. This convenience comes at a price. With bus-based communication, performance is well known to degrade as the number of clients increases. This degradation in performance is rooted in a combination of bandwidth sharing and arbitration-induced losses.
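By way of illustration only, the combined effect of bandwidth sharing and arbitration-induced losses may be sketched with a simple model. The function name, its parameters, and the assumption of a fixed arbitration cost per transaction with equal sharing among clients are all hypothetical and are not drawn from any particular bus implementation.

```python
def per_client_bandwidth(bus_words_per_cycle, num_clients,
                         transfer_cycles, arbitration_cycles):
    """Estimate the bandwidth seen by each client of a shared bus.

    Hypothetical model (an assumption, not a measured result):
    - every transaction pays `arbitration_cycles` idle cycles before
      `transfer_cycles` cycles of useful data transfer;
    - the remaining useful bandwidth is shared equally by all clients.
    """
    if num_clients < 1:
        raise ValueError("num_clients must be at least 1")
    # Fraction of bus cycles that carry data rather than arbitration.
    efficiency = transfer_cycles / (transfer_cycles + arbitration_cycles)
    # Equal sharing of what is left among the clients.
    return bus_words_per_cycle * efficiency / num_clients


if __name__ == "__main__":
    for clients in (1, 2, 4, 8):
        share = per_client_bandwidth(1.0, clients, 8, 2)
        print(clients, "clients:", round(share, 3), "words/cycle each")
```

Under these assumptions, the bandwidth available to each client falls in inverse proportion to the number of clients, and the arbitration cost lowers it further by a constant factor. An actual bus may behave differently, for example if arbitration cost itself grows with the number of clients.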
With respect to a streaming data approach, microprocessors implemented in an FPGA communicate using directly connected FIFOs. Access to information in such FIFOs is not random, and this may severely constrain performance in applications involving random access to such information. Moreover, an intimate working knowledge of circuitry issues, such as timing constraints, pipelining, and number of clock cycles, among other known circuitry issues, may be involved in parallel programming for such MPAs.
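The non-random nature of FIFO access noted above may be illustrated with a minimal software sketch. The helper `fifo_read_at` is hypothetical, and a Python `deque` merely stands in for a hardware FIFO; the sketch assumes only the head of the FIFO can be read.

```python
from collections import deque


def fifo_read_at(fifo, index):
    """Read the item `index` positions behind the head of a FIFO.

    Only the head of a FIFO is accessible, so every earlier item must be
    dequeued first and is consumed in the process. Returns the value and
    the number of dequeue operations that were needed.
    """
    if index < 0 or index >= len(fifo):
        raise IndexError("index outside the FIFO contents")
    dequeues = 0
    value = None
    for _ in range(index + 1):
        value = fifo.popleft()  # the only read operation a FIFO offers
        dequeues += 1
    return value, dequeues


if __name__ == "__main__":
    fifo = deque([10, 20, 30, 40])
    value, cost = fifo_read_at(fifo, 2)
    print(value, cost, list(fifo))
```

In this sketch, reaching the third item takes three dequeue operations and discards the two items ahead of it, whereas a random-access memory would return the same item in a single read and leave the other items in place.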
FPGA-based NoC architectures have been proposed to overcome the degradation caused by an increase in the number of clients, owing to the relative ease with which multiple concurrent connections are supported in such NoC architectures. However, due to the distributed nature of NoC arbitration and routing mechanisms, an NoC remains significantly more complex than a bus. Furthermore, in its serial form, an NoC may have bandwidth limitations due to serial-to-parallel datapath conversion overhead. This additional overhead also increases complexity. Moreover, a parallel form of NoC suffers from scalability limitations, as network infrastructure is by nature highly consumptive of routing resources.
Additionally, for FPGA-based bus or network infrastructures, there are further problems in synthesizing and mapping a hardware description language (“HDL”) representation to structures that are resource-efficient. Moreover, the resulting inefficiency may lie not only in resource utilization but also in pipeline latency and overall speed.
Accordingly, it would be both desirable and useful to provide an MPA architecture that is at least less susceptible to one or more of the above-identified limitations.