1. Technical Field
Embodiments of the present invention relate to a Direct Memory Access (DMA) system used for communications. More particularly, the present invention relates to a system for providing DMA communication for intellectual property (IP) cores of a programmable logic device (PLD) such as a Field Programmable Gate Array (FPGA).
2. Related Art
A common theme in FPGA-based architectural design is the interface between an embedded processor and one or more IP blocks. Communications across such interfaces typically include both data and control information. In particular, communications with IP cores generally involve movement of data and control tokens between hardware-based and software-based Finite State Machine (FSM) elements. This communication is typically achieved via one of three general approaches: (1) a First In First Out (FIFO) streaming interface, (2) a BUS transaction interface, and (3) DMA. Each approach has advantages and disadvantages.
The first communication approach, FIFO, is conceptually simple. The FIFO depends on a simple streaming interface with its associated bi-directional flow control (i.e., overflow/underflow). The FIFO is amenable to rate matching and affords a simple hardware implementation model. This interface model is appropriate to broad classes of dataflow processing of significant interest. One downside of FIFO is the parsing of control and data tokens. If simple flow control signals (e.g., overflow, underflow) are not sufficient for the task at hand, control must be applied via a separate FIFO channel, with appropriate control/data synchronization. Further, FIFO does not permit random access. Thus, whenever random access is required, data must be buffered in some auxiliary RAM resource. In summary, FIFO-based streaming is most appropriate where simple serial data streaming is sufficient for the IP core processing model and is accompanied by minimal-complexity FSM control.
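By way of illustration only, the FIFO behavior described above may be sketched as a small software model. The class name `StreamFifo`, its `ovf` and `udf` flags, and the helper `drain_to_ram` are hypothetical names chosen for this sketch and do not describe any particular hardware implementation.

```python
from collections import deque


class StreamFifo:
    """Software model of a bounded FIFO with overflow/underflow flags.

    All names here are illustrative assumptions, not an actual interface.
    """

    def __init__(self, depth):
        self.depth = depth
        self.buf = deque()
        self.ovf = False  # set when a write is attempted on a full FIFO
        self.udf = False  # set when a read is attempted on an empty FIFO

    def write(self, word):
        """Push one word; refuse it and flag overflow if the FIFO is full."""
        if len(self.buf) >= self.depth:
            self.ovf = True
            return False
        self.buf.append(word)
        return True

    def read(self):
        """Pop the oldest word; flag underflow and return None if empty."""
        if not self.buf:
            self.udf = True
            return None
        return self.buf.popleft()


def drain_to_ram(fifo):
    """Copy the FIFO contents into an auxiliary list so that the words
    can be indexed in any order, since the FIFO itself is strictly serial."""
    ram = []
    while fifo.buf:
        ram.append(fifo.read())
    return ram
```

In this sketch, random access becomes possible only after `drain_to_ram` has copied the stream into a separate buffer, which mirrors the auxiliary RAM requirement noted above.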
FIG. 1 is a block diagram illustrating a standard bidirectional FIFO-based processor/IP core communication interface used in an FPGA. The processor shown is a Reduced Instruction Set Computer (RISC) 4, connecting to a single IP core 6. The FIFO buffers 12₁₋₃ and 13₁₋₃ provide a particularly simple symmetrical interface between the processor and IP core. A dual port Block RAM (BRAM) 2a of the FPGA forms the Instruction/Data (I/D) memory resource of the RISC processor 4. Data may then be propagated between the I/D BRAM 2a and the IP core 6 using the RISC processor 4. The control signals used between the RISC processor 4 and BRAM 2a include Chip Select (CS) and Write ENable (WEN), along with the ADdRess (ADR) and DATA transferred between the RISC 4 and BRAM 2a. The processor 4 and IP core 6 employ signals for management of the data interface, according to some streaming protocol typically implemented using the FSM 10 associated with the IP core 6. Signals for data flow control between the RISC 4 and IP core 6 can include Data ReaDY (DRDY), OVerFlow (OVF), and UnDerFlow (UDF), which are transferred along with DATA information. The IP core 6 can include an FPGA BRAM memory 2b for auxiliary data or control signal storage. The FIFOs 12₁₋₃ and 13₁₋₃ provide a buffering function, affording some degree of asynchronous rate matching across the interface, depending upon FIFO depth, relative clock rates, and other factors. The FIFO communication technique has also been applied to streaming processor/co-processor communication models separate from an FPGA. When the processing model requires a more complex data organization, such as block transfer, random access, or multiple buffer partitioning, using FIFOs is less efficient than using a DMA or a BUS system.
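The dependence of rate matching on FIFO depth and relative clock rates may be illustrated with a simple tick-based model. The function `peak_occupancy` and its parameters are assumptions made for this sketch: the producer writes one word every `write_period` ticks, the consumer reads one word every `read_period` ticks, and within a tick the write is taken to occur before the read.

```python
def peak_occupancy(n_words, write_period, read_period):
    """Return the largest number of words held at once while a burst of
    n_words passes through an (unbounded) FIFO model.

    The result suggests the minimum FIFO depth that would avoid overflow
    under these idealized, assumed timing conditions.
    """
    occupancy = 0
    peak = 0
    written = 0
    tick = 0
    while written < n_words or occupancy > 0:
        # Producer side: write when due and words remain in the burst.
        if written < n_words and tick % write_period == 0:
            occupancy += 1
            written += 1
            peak = max(peak, occupancy)
        # Consumer side: read when due and a word is available.
        if occupancy > 0 and tick % read_period == 0:
            occupancy -= 1
        tick += 1
    return peak
```

Under these assumptions, a burst of 8 words written every tick and read every second tick peaks at 4 buffered words, whereas matched rates never hold more than one word at a time.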
A second communication method, using a BUS, represents an abstraction of the communications channel in the form of a set of defined operations at the interface. Typical operations include READ/WRITE DATA (from/to a specified address), READ (IP Core/Channel) STATUS, WRITE (IP Core) CONTROL, READ INTERRUPT, and other operations. These operations are abstracted in the form of an Application Programming Interface (API) that includes a set of function calls within a software programming environment. The API then implements IP core/processor communications in the form of a highly simplified procedural semantic. However, this convenience and flexibility come at a cost. The BUS is by nature a shared resource. Thus, communication with multiple peripherals requires arbitration and is subject to a total bandwidth constraint. At high rates, arbitration typically incurs significant overhead. To some extent, generic master/slave DMA transactions and block-oriented (pipelined) data transfers may relieve bandwidth restrictions, but at a cost of significantly increased complexity and arbitration loss. Further, IP cores as BUS peripherals may require an internal rate matching buffer as a means to structure communications between the data path and the BUS. Thus, the required BUFFER/MEMORY resources may essentially double, since data may be buffered on both sides of the BUS. This is in addition to the hardware resources needed for the BUS/BUS-Arbiter infrastructure. In sum, with multiple IP cores or peripherals, BUS transactions may incur high overhead in terms of hardware resources, control complexity, and aggregate bandwidth limitations.
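The bus abstraction and its shared-resource cost may be sketched, under stated assumptions, as follows. The classes `Peripheral` and `SharedBus`, the method names, the one-cycle-per-operation cost, and the round-robin policy in `round_robin` are all hypothetical choices for illustration; an actual bus may use a different API, timing, and arbitration scheme.

```python
class Peripheral:
    """Hypothetical IP core as seen through the bus."""

    def __init__(self, size):
        self.data = [0] * size
        self.status = 0
        self.control = 0


class SharedBus:
    """Toy bus model: each operation occupies the shared channel for one
    cycle (an assumed cost), so all traffic adds to a single counter."""

    def __init__(self):
        self.peripherals = {}
        self.cycles = 0

    def attach(self, name, peripheral):
        self.peripherals[name] = peripheral

    def _transact(self, name):
        self.cycles += 1  # every access consumes shared bandwidth
        return self.peripherals[name]

    def write_data(self, name, addr, value):
        self._transact(name).data[addr] = value

    def read_data(self, name, addr):
        return self._transact(name).data[addr]

    def read_status(self, name):
        return self._transact(name).status

    def write_control(self, name, value):
        self._transact(name).control = value


def round_robin(requests):
    """Given {master: pending transfer count}, return the grant order
    under a simple round-robin policy (one grant per bus cycle)."""
    pending = dict(requests)
    grants = []
    while any(pending.values()):
        for master in requests:
            if pending[master] > 0:
                grants.append(master)
                pending[master] -= 1
    return grants
```

Because the grant list has one entry per transfer regardless of which master issued it, the sketch makes the aggregate bandwidth constraint visible: adding masters lengthens the wait of every other master.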
FIG. 2 shows a block diagram illustrating communications between a processor 4 and IP core 6 using a standard BUS system 14. The RISC processor 4 transfers data between the BRAM 2a and IP core 6, as in FIG. 1. Advantages of the BUS include simplification and unification of communication between the processor 4 and the IP core 6 in terms of a master/slave or peer-to-peer control model. To the extent bus arbitration overhead does not emerge as a limiting factor, another advantage the BUS offers is straightforward extensibility to multiple or diverse peripherals or IP cores 6. Disadvantages accrue primarily with regard to: (a) arbitration overhead, (b) hardware-level complexity, and (c) significant transaction overhead where burst-mode transactions are not supported (i.e., no pipelining). Further, data generated or consumed at the processor 4 or IP core 6 may still require rate-match buffering at the bus interface. Under such circumstances, data must again be stored in two separate locations.
A third type of communication, DMA, provides high efficiency, speed, and flexibility in comparison to alternative approaches based upon FIFO streaming or BUS arbitration. The advantages of a DMA system can be extended to FPGA designs where the associated DMA controller and required system components do not significantly impact configuration resources of the FPGA. The DMA option hinges upon high-speed transfers between IP core data buffers and memory without processor intervention. Disadvantages accrue with regard to complexity (typically a distinct control envelope for each IP core) and scalability (too many DMA clients degrade overall memory performance). It is desirable to provide a DMA solution that addresses these disadvantages.
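The notion of a transfer that proceeds without processor intervention may be illustrated by a descriptor-based model. The `Descriptor` fields, the `DmaController` class, and the assumed cost of one memory cycle per word are hypothetical and serve only as a sketch; they do not describe the DMA system of any particular embodiment.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    """Hypothetical transfer request: copy `length` words from src to dst."""
    src: int
    dst: int
    length: int
    done: bool = False


class DmaController:
    """Toy DMA engine operating on a flat word-addressed memory (a list)."""

    def __init__(self, memory):
        self.memory = memory
        self.queue = []

    def submit(self, desc):
        # The only processor involvement: queue the descriptor.
        self.queue.append(desc)

    def run(self):
        """Service all queued descriptors in order and return the number
        of memory cycles consumed (assumed: one cycle per word moved)."""
        cycles = 0
        for desc in self.queue:
            for i in range(desc.length):
                self.memory[desc.dst + i] = self.memory[desc.src + i]
                cycles += 1
            desc.done = True
        self.queue.clear()
        return cycles
```

In this sketch the processor only fills in and submits a descriptor, and the controller moves the block. Since all clients draw on the same memory cycles, the returned cycle count grows with every additional queued client, which reflects the scalability concern noted above.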