1. Field of the Invention
Embodiments of the invention relate generally to a method and apparatus for designing and generating a stream processor such as a hardware accelerator.
2. Background of Technology
Typically, a stream processor such as a hardware accelerator, for example one provided by the assignee, Maxeler Technologies Ltd., consists of a Field Programmable Gate Array (FPGA) connected to multiple memories or other external data sources/sinks, as shown in FIG. 1. On the FPGA, the circuit is made up of a manager containing one or more blocks, including kernels.
Kernels are hardware data-paths implementing the arithmetic and logical computations needed within an algorithm. A “manager” is the collective term for the FPGA logic which orchestrates or controls data flow, in the form of streams, between kernels and off-chip input/output (I/O). By using a streaming model for off-chip I/O to the associated external components, e.g. the PCI Express bus, MaxRing and DRAM memory, managers are able to achieve high utilization of the available bandwidth in off-chip communication channels. A user, when designing or configuring an FPGA, controls the design of the kernels and the configuration of the manager so as to ensure that the FPGA performs the desired processing steps on data passing through it.
Typically, dataflow hardware accelerators implement a streaming model of computation in which computations are described structurally (computing in space) rather than as a sequence of processor instructions (computing in time). In this model of computation, a high-level language is used to generate a graph of operations. Each node in the graph executes a specific function on incoming data and outputs the result, which becomes the input to another node in the graph. The data being processed “flows” through the graph from one node to the next without being written back to memory between nodes. This graph may then be implemented as an application-specific circuit within an FPGA accelerator.
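The graph-of-operations model described above can be illustrated with a small software sketch. It is not how an FPGA accelerator is built or programmed; it only models the idea that each node applies a function to values arriving from upstream nodes, and that each stream element passes through the whole graph without intermediate results being written back to the input arrays. The `Node`, `topological_order` and `run_graph` names and the example graph computing `(a + b) * a` are hypothetical, chosen for illustration.

```python
from collections import deque
from typing import Callable, Dict, List


class Node:
    """One operation in the graph; inputs name upstream nodes or graph inputs."""

    def __init__(self, fn: Callable[..., int], inputs: List[str]):
        self.fn = fn
        self.inputs = inputs


def topological_order(nodes: Dict[str, Node]) -> List[str]:
    """Kahn's algorithm: order nodes so every producer precedes its consumers."""
    indegree = {name: 0 for name in nodes}
    consumers: Dict[str, List[str]] = {name: [] for name in nodes}
    for name, node in nodes.items():
        for src in node.inputs:
            if src in nodes:  # edges from graph inputs impose no ordering
                indegree[name] += 1
                consumers[src].append(name)
    ready = deque(name for name, deg in indegree.items() if deg == 0)
    order: List[str] = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for dst in consumers[name]:
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    if len(order) != len(nodes):
        raise ValueError("dataflow graph contains a cycle")
    return order


def run_graph(nodes: Dict[str, Node], streams: Dict[str, List[int]], output: str) -> List[int]:
    """Push each stream element through every node in dependency order."""
    order = topological_order(nodes)
    length = len(next(iter(streams.values())))
    results = []
    for i in range(length):
        # The per-element dict stands in for the wires between nodes;
        # nothing is written back to the input streams.
        values = {name: stream[i] for name, stream in streams.items()}
        for name in order:
            node = nodes[name]
            values[name] = node.fn(*(values[src] for src in node.inputs))
        results.append(values[output])
    return results


# Hypothetical example graph computing y = (a + b) * a.
graph = {
    "sum": Node(lambda x, y: x + y, ["a", "b"]),
    "prod": Node(lambda s, x: s * x, ["sum", "a"]),
}
result = run_graph(graph, {"a": [1, 2, 3], "b": [10, 20, 30]}, "prod")
print(result)  # [11, 44, 99]
```

In hardware every node would exist simultaneously as its own circuit, so successive elements would occupy different nodes at the same time; the sequential loop here only reproduces the values, not that parallelism.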
Kernels are typically statically scheduled by a compiler at build time. This means that the dataflow through a kernel is precisely orchestrated on each clock cycle. In contrast, managers are dynamically scheduled, meaning that events occur in an irregular manner and data is passed to and from the statically scheduled units. The combined system as typically provided on an accelerator chip, e.g. an FPGA, together with the driving software, is therefore a dynamic system.
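Static scheduling at build time can be sketched as follows, assuming an "as soon as possible" (ASAP) policy and invented per-operator latencies; the compiler used for real kernels, including the assignee's, may use a different algorithm. Each node is assigned the clock cycle at which its output becomes valid, and any input that arrives earlier than the latest one is given a fixed number of delay registers so that all operands meet on the same cycle. The function name `asap_schedule` and the latency values are hypothetical.

```python
from typing import Dict, List, Tuple


def asap_schedule(
    latency: Dict[str, int], inputs: Dict[str, List[str]], order: List[str]
) -> Tuple[Dict[str, int], Dict[Tuple[str, str], int]]:
    """Return each node's output-valid cycle and the delay needed on each edge."""
    ready: Dict[str, int] = {}
    delays: Dict[Tuple[str, str], int] = {}
    for name in order:  # order must list producers before consumers
        srcs = inputs.get(name, [])
        # A node starts once its latest-arriving input is valid.
        start = max((ready[src] for src in srcs), default=0)
        ready[name] = start + latency[name]
        for src in srcs:
            # Earlier inputs are held in delay registers so that all
            # operands reach the node on the same clock cycle.
            delays[(src, name)] = start - ready[src]
    return ready, delays


# Hypothetical latencies: stream inputs 0 cycles, adder 1 cycle, multiplier 3 cycles.
latency = {"a": 0, "b": 0, "sum": 1, "prod": 3}
inputs = {"sum": ["a", "b"], "prod": ["sum", "a"]}
ready, delays = asap_schedule(latency, inputs, ["a", "b", "sum", "prod"])
print(ready["prod"])          # 4
print(delays[("a", "prod")])  # 1
```

Because every latency and delay is fixed before the circuit runs, the position of each data item on each clock cycle is known in advance, which is what distinguishes the statically scheduled kernel from the dynamically scheduled manager.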
Streaming accelerators implemented using FPGAs or other similar processing technology can offer increased performance on many useful applications compared with conventional microprocessors. See for example our co-pending applications, U.S. application Ser. Nos. 12/636,906, 12/792,197, 12/823,432, 13/023,275 and 13/029,696, the entire contents of all of which are hereby incorporated by reference for all purposes.
The implementation of a streaming processor can be understood as being made up of a data path, representing the computation performed on data as it flows through the streaming circuit, and a control path, representing decision making within the circuit, for example over what data should be output.
The correctness of the control flow of a streaming processor may depend entirely on the correctness of the control path. Control path errors could cause a streaming accelerator to produce an incorrect amount of output, or to deadlock, regardless of whether the data values, which depend on the data path, are correct.
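The separation between data path and control path, and why a control error changes the amount of output even when every value is correct, can be shown with a minimal sketch. The `run_kernel` function and the decimate-by-two kernel are hypothetical examples, not taken from any particular accelerator: the data path computes a value for every input, while the control path decides on each tick whether that value is emitted.

```python
from typing import Callable, Iterable, List


def run_kernel(
    stream: Iterable[int],
    data_path: Callable[[int], int],
    control_path: Callable[[int], bool],
) -> List[int]:
    out: List[int] = []
    for tick, x in enumerate(stream):
        y = data_path(x)        # data path: computes a value for every input
        if control_path(tick):  # control path: decides whether y is output
            out.append(y)
    return out


def double(x: int) -> int:
    return 2 * x


# Intended behaviour: emit every second element (decimate by 2).
good = run_kernel(range(8), double, lambda t: t % 2 == 0)
# Off-by-one control error: the first element is never emitted.
bad = run_kernel(range(8), double, lambda t: t % 2 == 0 and t > 0)
print(good)  # [0, 4, 8, 12]
print(bad)   # [4, 8, 12]
```

Every value in `bad` is correct, yet the kernel emits three items instead of four; a downstream consumer waiting for the fourth item would starve.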
In compiling a chip design, a variety of flow control errors can occur. For example, any or all of the following errors can typically occur:
1. The chip may fail to consume the required amount of input data. This could cause the external data source to fail to send the amount of data expected into the system and to stall as a result.
2. The chip may try to consume too much input data. Since the data is not available, the chip will starve and stop running.
3. The chip may not produce enough output data, in which case a read-from-chip operation will starve.
4. On the chip, the scheduling of data inputs and outputs may be such that the internal buffering in the manager is not sufficient to “smooth out” the transfers, and the chip will deadlock.
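The first two errors listed above can be reproduced with a cycle-level model of a single input stream: a host pushes items into a bounded manager FIFO and a kernel pops them. This is a minimal sketch under simplifying assumptions (one item per cycle on each side, one FIFO, hypothetical function name `simulate_stream`); it is not a model of any specific manager. If a cycle passes in which nothing moves, nothing can move afterwards either, so the run is classified at that point. The third error is the mirror image of the second on the output side.

```python
from collections import deque


def simulate_stream(host_items: int, kernel_reads: int, fifo_depth: int,
                    max_cycles: int = 10_000) -> str:
    """Model host -> bounded FIFO -> kernel for one input stream.

    Returns "ok", "under-consumed" (error 1) or "over-consumed" (error 2).
    """
    fifo: deque = deque()
    sent = consumed = 0
    for _ in range(max_cycles):
        progress = False
        # Host side: push one item per cycle while data remains and there is room.
        if sent < host_items and len(fifo) < fifo_depth:
            fifo.append(sent)
            sent += 1
            progress = True
        # Kernel side: pop one item per cycle while it still wants input.
        if consumed < kernel_reads and fifo:
            fifo.popleft()
            consumed += 1
            progress = True
        if sent == host_items and consumed == kernel_reads:
            # All transfers finished; anything left in the FIFO was never read.
            return "ok" if not fifo else "under-consumed"
        if not progress:
            # The state did not change, so the system is stuck for good.
            return "over-consumed" if consumed < kernel_reads else "under-consumed"
    return "timeout"


print(simulate_stream(10, 10, 4))  # ok
print(simulate_stream(20, 6, 4))   # under-consumed: host stalls on a full FIFO
print(simulate_stream(6, 10, 4))   # over-consumed: kernel starves for input
```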
Flow control errors can therefore be particularly serious, as they can cause a chip to crash or deadlock. These are standard issues in parallel programming, whether in hardware or software. During the process of generating an FPGA design, these problems are resolved by “debugging” so as to ensure that the final hardware design does not suffer from these flow control problems.
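A deadlock of the fourth kind listed above, caused by insufficient manager buffering, can be sketched under the following hypothetical scenario: a host sends all of stream A and then all of stream B over one serial link, while the kernel needs one A item and one B item together on every step. The manager must therefore buffer A until B starts to arrive. The scenario and the name `simulate_two_streams` are assumptions made for illustration only.

```python
from collections import deque
from typing import Deque, Dict


def simulate_two_streams(n: int, fifo_depth: int, max_cycles: int = 10_000) -> str:
    """Host sends a_0..a_{n-1}, then b_0..b_{n-1}, over one link.

    The kernel consumes one A item and one B item together, so the A FIFO
    must hold every A item that arrives before the first B item.
    """
    order = [("a", i) for i in range(n)] + [("b", i) for i in range(n)]
    fifos: Dict[str, Deque[int]] = {"a": deque(), "b": deque()}
    pos = consumed = 0
    for _ in range(max_cycles):
        if consumed == n:
            return "ok"
        progress = False
        if pos < len(order):
            stream, item = order[pos]
            # The shared link blocks while the target FIFO is full.
            if len(fifos[stream]) < fifo_depth:
                fifos[stream].append(item)
                pos += 1
                progress = True
        # The kernel fires only when both operands are available.
        if fifos["a"] and fifos["b"]:
            fifos["a"].popleft()
            fifos["b"].popleft()
            consumed += 1
            progress = True
        if not progress:
            return "deadlock"
    return "timeout"


print(simulate_two_streams(8, 8))  # ok
print(simulate_two_streams(8, 4))  # deadlock
```

With a FIFO depth of 4, the link blocks on the fifth A item while the kernel waits for a B item that can never arrive; neither side is faulty on its own, and only the combination of transfer order and buffer size causes the deadlock.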