1. Field of Invention
The invention relates to a method and apparatus for generating a hardware stream processor design. In embodiments, the invention also includes a method for the optimization of hardware implementations for streaming data transfer between interconnected processes.
2. Background of Technology
In certain embodiments, the invention relates to computing using hardware processes communicating using unidirectional FIFO (first in, first out) data streams. Each hardware process has zero or more input/output ports that sink/source data. FIFO data streams connect between the input port of one process and the output port of another process (which could be the same process). Optionally, FIFO streams may also connect to I/O devices (input/output devices), for example a processor bus for interaction with software or a memory device.
Typically, hardware processes such as may be provided by a Field Programmable Gate Array (FPGA) run asynchronously and in parallel, reading data items from the process inputs and producing data at the process outputs. The FPGA might typically form part of an accelerator for use with a host computer, in which the FPGA is arranged to be configured by the customer or designer after manufacture, so as to perform its designated tasks and processes.
Similar networks of communicating processes, e.g. on an FPGA, are known in the literature as Kahn Process Networks (KPN). KPNs provide a distributed model of computation in which a group of deterministic sequential processes communicate through unbounded FIFO channels. A method and apparatus is required for implementing process networks in hardware such as FPGAs or other programmable logic devices, for high-performance computing.
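By way of illustration only, the process-network model described above can be sketched in software. The following is a minimal sketch, not a hardware implementation: it assumes Python threads stand in for hardware processes and unbounded `queue.Queue` objects stand in for FIFO streams. The names `source`, `scale`, `run_network` and the `END` marker are hypothetical and are introduced solely for this example.

```python
import threading
from queue import Queue

END = object()  # hypothetical end-of-stream marker (assumption of this sketch)


def source(out, items):
    """Process with no input port and one output port: emits items in order."""
    for x in items:
        out.put(x)
    out.put(END)


def scale(inp, out, factor):
    """Process with one input port and one output port."""
    while True:
        x = inp.get()  # blocking read, as in a Kahn process network
        if x is END:
            out.put(END)
            return
        out.put(x * factor)


def run_network(items, factor):
    """Connect source -> scale with unbounded FIFOs and collect the output."""
    a, b = Queue(), Queue()  # maxsize=0 means unbounded
    threads = [
        threading.Thread(target=source, args=(a, items)),
        threading.Thread(target=scale, args=(a, b, factor)),
    ]
    for t in threads:
        t.start()
    result = []
    while True:
        x = b.get()
        if x is END:
            break
        result.append(x)
    for t in threads:
        t.join()
    return result
```

Because each process reads only through blocking FIFO reads, the collected output does not depend on how the threads happen to be scheduled, which is the determinism property associated with such networks.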
It is known that FPGAs have limited resources, typically comprising Look Up Tables (LUTs) and flip-flops, both of which are used for compute operations, and Block Random Access Memory (BRAM), used for buffering. FPGAs also provide reconfigurable interconnects that allow these limited resources to be connected together so as to provide, overall, a desired function or process on data passing through. By minimising the hardware requirements, or maximising the efficiency of resource utilisation, of the interconnect used to implement FIFO streams between processes within a given process or network of processes, significant benefits can be achieved. The benefit of such optimization is that more resources are available for compute, which translates to higher performance. FPGA configuration is generally specified using a hardware description language (HDL), and it is known that such devices can be used to implement any logical function that an ASIC could perform.
Typically, processes within an FPGA are pipelined hardware data-paths that compute a complex operation, for example a multi-dimensional convolution. These processes are referred to herein as ‘kernels’. In other words, a kernel is a synchronous, pipelined data-path that produces/consumes data according to an application-specific pattern at a particular clock rate. For example, a convolution kernel may run at 100 MHz, consuming 2×32-bit input data points and producing 1×32-bit output data point every cycle (10 ns).
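The figures in the convolution example can be checked with a short calculation. The following sketch is illustrative only; the function name `kernel_rates` and its parameters are hypothetical, and it assumes the kernel consumes and produces a fixed number of words on every clock cycle.

```python
def kernel_rates(clock_hz, in_words, out_words, word_bits):
    """Return (cycle period in ns, input bits/s, output bits/s).

    Assumes a fixed number of words is consumed and produced every cycle.
    """
    period_ns = 1e9 / clock_hz
    in_bits_per_s = clock_hz * in_words * word_bits
    out_bits_per_s = clock_hz * out_words * word_bits
    return period_ns, in_bits_per_s, out_bits_per_s


# The example kernel: 100 MHz, 2 x 32-bit inputs and 1 x 32-bit output per cycle.
period_ns, in_rate, out_rate = kernel_rates(100_000_000, 2, 1, 32)
```

Under these assumptions the cycle period is 10 ns, the input streams together carry 6.4 Gbit/s and the output stream carries 3.2 Gbit/s.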
In addition to (or instead of) data-paths or computation, kernels may also perform elementary control of data flow. Two common examples are multiplex and de-multiplex kernels (Mux and Demux, respectively). A mux has multiple input ports and a single output port, and connects a single run-time selectable input port to the output port. A demux has a single input port and multiple output ports, and connects a single run-time selectable output port to the input port.
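The behaviour of the mux and demux kernels can be sketched in software as follows. This is a simplified model rather than a hardware description: it assumes each port is represented by a `collections.deque`, and that the run-time selection is supplied as a sequence of port indices, one per transferred data item. The function names and the `selects` parameter are hypothetical.

```python
from collections import deque


def mux(inputs, selects):
    """Multiple input ports, one output port.

    For each select value, move one item from the chosen input to the output.
    """
    out = deque()
    for s in selects:
        out.append(inputs[s].popleft())
    return out


def demux(inp, n_outputs, selects):
    """One input port, multiple output ports.

    For each select value, move one item from the input to the chosen output.
    """
    outputs = [deque() for _ in range(n_outputs)]
    for s in selects:
        outputs[s].append(inp.popleft())
    return outputs
```

For instance, `mux([deque([1, 2]), deque([10, 20])], [0, 1, 1, 0])` yields the items 1, 10, 20, 2 in that order, while applying `demux` to the stream 1, 2, 3, 4 with the same selects routes 1 and 4 to output 0 and 2 and 3 to output 1.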