In many current applications, data-sets of significant size are generated and need to be processed. For example, in oil and gas exploration, where mapping of a surface such as an ocean floor is required, a grid of sensors distributed over a very large area might typically be used to collect data to help in the search for hydrocarbon reserves. A sonic impulse is provided and each of the sensors measures the reflection from the surface or sub-surface being mapped. This might include measuring, at each sensor, the frequency, amplitude and delay of the received signals. Ocean-based surveys typically use 30,000 sensors to record data over a 120 dB range, each sensor being sampled at more than 2 kbps, with a new sonic impulse every 10 seconds. The volume of data generated by such a system is significant, typically of the order of terabytes each day. Similarly sized (or larger) data-sets might be generated in other applications in which a physical process is modelled. For example, in areas such as fluid flow modelling or financial data modelling, the size of the data-set can be very large indeed.
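The order of magnitude quoted above can be checked with a rough calculation. The sketch below is illustrative only: the function name is hypothetical, and the two readings of "2 kbps" (2,000 bits per second per sensor, or 2,000 samples per second at an assumed 20 bits per sample, which is roughly what a 120 dB amplitude range requires) are assumptions rather than figures stated in the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_bytes(sensors, bits_per_second_per_sensor):
    """Raw bytes produced per day by all sensors combined (no compression)."""
    return sensors * bits_per_second_per_sensor * SECONDS_PER_DAY / 8


# Reading 1 (assumption): "2 kbps" means 2,000 bits per second per sensor.
low = daily_volume_bytes(30_000, 2_000)

# Reading 2 (assumption): 2,000 samples per second, 20 bits per sample.
# 20 bits span about 20 * log10(2**20) = 120.4 dB of amplitude range.
high = daily_volume_bytes(30_000, 2_000 * 20)

print(f"{low / 1e12:.2f} TB/day")   # 0.65 TB/day
print(f"{high / 1e12:.2f} TB/day")  # 12.96 TB/day
```

Under either reading the result lies between roughly half a terabyte and some tens of terabytes per day, which is consistent with the stated order of magnitude given that the per-sensor rate is described as "more than" 2 kbps.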
Accelerators are known for use with computers to enable them to process such data-sets. An example of a known accelerator is the Maxeler MAX2 PCI Express Card. Typically, an accelerator includes a field-programmable gate array (FPGA), which is designed to be configured by the customer or designer after manufacture. The FPGA configuration is generally specified using a hardware description language (HDL), and it is known that such devices can be used to implement any logical function that an ASIC could perform.
Conventionally, when such a data-set is processed, a substantial portion of the data-set is transferred to and from the accelerator on each iteration. Numerical solutions typically involve many such iterations. A typical data-set size approaches the maximum memory capacity of current state-of-the-art computers.
The accelerator computes an iteration by streaming data from the data-set through a compute pipeline. In some cases, part of the compute operation is performed by the processor of the computer rather than by the accelerator. In the example of a 3D convolution operation, the boundary conditions and/or any special encoding or decoding of the data-set may be handled by the computer's own processor, whilst the majority of the computation required to complete the convolution calculation is performed on the accelerator.
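The division of work described above can be sketched in software. The following is a minimal illustration only, not an accelerator implementation: all function names are hypothetical, the 7-point averaging kernel is an assumed stand-in for whatever 3D convolution a real application would use, and the boundary condition (edge values carried over unchanged) is likewise an assumption.

```python
def make_grid(n, fill=0.0):
    """Create an n x n x n grid as nested lists."""
    return [[[fill for _ in range(n)] for _ in range(n)] for _ in range(n)]


def stencil_at(grid, x, y, z):
    """7-point averaging kernel: a small 3D convolution evaluated at one point."""
    return (grid[x][y][z]
            + grid[x - 1][y][z] + grid[x + 1][y][z]
            + grid[x][y - 1][z] + grid[x][y + 1][z]
            + grid[x][y][z - 1] + grid[x][y][z + 1]) / 7.0


def accelerator_interior(grid, out):
    """Stand-in for the accelerator: stream every interior point through the kernel."""
    n = len(grid)
    for x in range(1, n - 1):
        for y in range(1, n - 1):
            for z in range(1, n - 1):
                out[x][y][z] = stencil_at(grid, x, y, z)


def host_boundary(grid, out):
    """Stand-in for the host processor: apply the (assumed) boundary condition."""
    n = len(grid)
    edge = (0, n - 1)
    for x in range(n):
        for y in range(n):
            for z in range(n):
                if x in edge or y in edge or z in edge:
                    out[x][y][z] = grid[x][y][z]  # edge values carried over unchanged


def iterate(grid):
    """One iteration: bulk of the work on the 'accelerator', boundary on the 'host'."""
    out = make_grid(len(grid))
    accelerator_interior(grid, out)
    host_boundary(grid, out)
    return out


grid = make_grid(4, fill=1.0)
grid[1][1][1] = 8.0
result = iterate(grid)
print(result[1][1][1])  # 2.0, i.e. (8 + 6 * 1) / 7
```

In this sketch the interior loop accounts for most of the arithmetic as the grid grows, which mirrors the description of the majority of the convolution being performed on the accelerator while the host handles only the boundary.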