As known in the prior art, digital logic devices such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs) are becoming more powerful and increasingly capable of generating and processing large amounts of data. One of the challenges associated with this increased capability is transferring that large amount of data on and off the logic device.
For the purpose of optimizing streaming transfers, such transfers can be categorized according to transfer length into two classes: fixed length transfers and variable length transfers. Fixed length transfers are ones where the size of the payload is fixed. Variable length transfers are ones where the payload length is not deterministic.
Processors cannot typically be interrupted to immediately receive data from an FPGA. Instead, data is placed into an easily accessed memory buffer and the processor is notified to pick up the data when available. If a fixed length payload is used and both the sending FPGA and the processor of the receiving device know the size, transfers are simplified because the processor knows how much data it should retrieve from the buffer. Additionally, if the rate at which the data is transferred is known and does not change, the processor knows how many transfers can take place before the buffer overflows. Both of these represent problems that are not easily overcome for variable length payloads.
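The fixed length case described above can be illustrated with a short calculation. The following is a minimal sketch only, assuming a buffer that starts empty and is not drained between transfers; the class name, field names, and numeric values are hypothetical and are not taken from any particular device.

```python
from dataclasses import dataclass


@dataclass
class FixedLengthStream:
    """Hypothetical model of a fixed length stream into a receive buffer."""
    buffer_bytes: int             # capacity of the receive buffer (assumed)
    payload_bytes: int            # fixed payload size known to both sides
    transfers_per_second: float   # constant, known transfer rate (assumed)

    def transfers_until_full(self) -> int:
        # Number of whole payloads that fit before the next one would overflow.
        return self.buffer_bytes // self.payload_bytes

    def seconds_until_full(self) -> float:
        # Time available to the processor to drain the buffer, from empty.
        return self.transfers_until_full() / self.transfers_per_second


# Illustrative values only.
stream = FixedLengthStream(buffer_bytes=65536, payload_bytes=1024,
                           transfers_per_second=1000.0)
print(stream.transfers_until_full())  # 64
print(stream.seconds_until_full())    # 0.064
```

With a variable length payload, neither quantity can be computed ahead of time, because `payload_bytes` is no longer a constant known to the processor.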
No circuits exist for generic payload transfers of both variable and fixed length data; however, there are a couple of domain-specific solutions. One such solution is Direct Memory Access (DMA). DMA provides a convenient way to transfer fixed length payload data from one memory to another and in some cases will interface to a streaming interface. The main downfall of DMA engines is that they are very difficult to use for variable length transfers. The trade space for variable payload transmissions is cumbersome: it produces high processor utilization, increases latency, and does not provide a method for knowing when the receive memory buffer overflows.
Another domain-specific circuit is the transmission control protocol (TCP) offload engine (TOE). The TOE can provide a transfer circuit specifically for receiving variable length TCP packets. However, the TOE requires data to be formatted as a TCP packet. This formatting can add overhead to the transfer process, which is not necessary for point-to-point or multicast network topologies. This can negatively affect throughput and can require the logic device to have a TCP stack, which can require additional internet protocol (IP), use up logic resources, and add latency.
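The formatting overhead mentioned above can be estimated roughly. The sketch below assumes TCP carried over IPv4 with minimum-size headers (20 bytes each, no options) and ignores link-layer framing, acknowledgements, and connection setup; the function name and the payload sizes are hypothetical and chosen only for illustration.

```python
TCP_HEADER_MIN = 20    # bytes, TCP header without options
IPV4_HEADER_MIN = 20   # bytes, IPv4 header without options (assumed carrier)


def payload_efficiency(payload_bytes: int) -> float:
    """Fraction of transmitted bytes that are payload, counting headers only."""
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    total = payload_bytes + TCP_HEADER_MIN + IPV4_HEADER_MIN
    return payload_bytes / total


# Smaller payloads lose a larger share of each packet to headers.
for size in (64, 512, 1460):
    print(size, round(payload_efficiency(size), 3))
```

Under these assumptions, short variable length payloads are penalized the most, which is consistent with the throughput concern raised for point-to-point transfers.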
In view of the above, it can be an object of the present invention to provide a format-agnostic data transfer circuit that can be adapted to efficiently transfer both fixed and variable length data sequences. Another object of the present invention can be to provide a format-agnostic data transfer circuit that can be configured to transfer data between different types of logic devices. Still another object of the present invention can be to provide a format-agnostic data transfer circuit that can minimize processor overhead and latency. Yet another object of the present invention can be to provide a format-agnostic data transfer circuit that can also increase software security and data integrity by implementing hardware buffer tracking mechanisms. A further object of the present invention can be to provide a format-agnostic data transfer circuit that can be easy to implement in a cost-effective manner.