Programmable logic devices (PLDs) are a well-known type of integrated circuit (IC) that can be programmed by a user to perform specified logic functions. There are different types of programmable logic devices, such as programmable logic arrays (PLAs) and complex programmable logic devices (CPLDs). One type of programmable logic device, known as a field programmable gate array (FPGA), is very popular because of its superior combination of capacity, flexibility, time-to-market, and cost.
An FPGA typically includes an array of configurable logic blocks (CLBs), programmable input/output blocks (IOBs), and other like programmable elements. The CLBs and IOBs are interconnected by a programmable interconnect structure. An FPGA may also include various dedicated logic circuits, such as memories, digital clock managers (DCMs), and input/output (I/O) transceivers. Notably, an FPGA may include one or more embedded processors. The programmable logic of an FPGA (e.g., the CLBs, IOBs, and interconnect structure) is typically programmed by loading a stream of configuration data (known as a bitstream) into internal configuration memory cells. The bitstream is typically stored in an external nonvolatile memory, such as an erasable programmable read only memory (EPROM). The states of the configuration memory cells define how the CLBs, IOBs, interconnect structure, and other programmable logic are configured.
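To illustrate how the states of configuration memory cells define logic behavior, the following is a minimal software sketch (not actual FPGA hardware, and the function names are hypothetical) of a 4-input lookup table (LUT), a common CLB building block: sixteen configuration bits, as would be loaded from a bitstream, select which of the 2^4 input patterns produce a logic 1.

```python
def make_lut(config_bits):
    """Model a 4-input LUT: config_bits is a list of 16 values (0 or 1),
    where entry i is the LUT output for the 4-bit input pattern i.
    Loading a bitstream corresponds to setting these bits."""
    assert len(config_bits) == 16
    def lut(a, b, c, d):
        # Concatenate the four inputs into a 4-bit index into the table.
        index = (a << 3) | (b << 2) | (c << 1) | d
        return config_bits[index]
    return lut

# "Program" the LUT as a 4-input AND gate: only input pattern
# 0b1111 maps to a configuration bit of 1.
and_bits = [0] * 16
and_bits[0b1111] = 1
and4 = make_lut(and_bits)
```

The same sixteen-entry table can implement any 4-input Boolean function simply by changing the configuration bits, which is the essence of how a bitstream configures the programmable fabric.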
When implementing systems in programmable logic fabrics, there can be two competing goals: maximizing data throughput and minimizing resource cost. Typically, in a dataflow system, this tradeoff appears in the implementation of communication buffers between dataflow actors. Generally, using dedicated communication queues allows for higher throughput, while using a shared memory is often cheaper in terms of resource cost. The tradeoff is particularly evident when transmitting large data objects from one actor to another, such as network packets in a router, frame data in a video decompression system, or the like. Currently, the allocation of data between higher-throughput and cheaper storage is performed on a buffer-by-buffer basis (e.g., one particular buffer is implemented using queue storage, whereas another buffer is implemented using shared memory). Such an allocation does not account for the particular application of the data, i.e., the manner in which the data passing through the communication buffers is accessed by the actors.
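The tradeoff described above can be sketched in software (a behavioral model under assumed names, not any particular hardware implementation): a dedicated queue gives each actor-to-actor channel its own private storage, whereas a shared memory pool lets several channels draw from one storage block, saving resources at the cost of contention for the shared capacity.

```python
from collections import deque

class DedicatedQueue:
    """Per-channel FIFO: highest throughput, but each channel
    pays for its own dedicated storage resources."""
    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque()

    def push(self, token):
        if len(self.fifo) >= self.depth:
            raise BufferError("queue full")
        self.fifo.append(token)

    def pop(self):
        return self.fifo.popleft()

class SharedMemoryPool:
    """One storage block shared by many channels: cheaper in
    resources, but all producers and consumers compete for the
    single pool's capacity."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.channels = {}   # channel name -> buffered tokens
        self.used = 0

    def push(self, channel, token):
        if self.used >= self.capacity:
            raise BufferError("pool full")
        self.channels.setdefault(channel, []).append(token)
        self.used += 1

    def pop(self, channel):
        token = self.channels[channel].pop(0)
        self.used -= 1
        return token

# Two channels share one 8-entry pool instead of two separate
# 8-entry dedicated queues, halving storage at the cost of
# contention between the channels.
pool = SharedMemoryPool(capacity=8)
pool.push("video", b"frame0")
pool.push("net", b"pkt0")
```

A buffer-by-buffer allocation, as described above, would fix each channel to one of these two structures without regard to how the actors actually access the data passing through it.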
Accordingly, there exists a need in the art for a method and apparatus for implementing a dataflow circuit model using application-specific memory implementations.