Applications that process vast amounts of data, such as high-performance computing applications, are used in high-speed, large-scale scientific computing fields, such as the finite element method, electromagnetic field analysis, and fluid analysis. Higher-speed operation may be achieved by implementing such applications, which operate on array data, in hardware, for example, on an accelerator including a field-programmable gate array (FPGA) or a graphics processing unit (GPU). General-purpose computing on graphics processing units (GPGPU) has recently been used to obtain an even higher-speed accelerator.
Accelerators using dedicated hardware, such as an FPGA or a GPU (GPGPU), have drawn attention and are being used because a large increase in the throughput of an individual central processing unit (CPU) is difficult to achieve. A typical accelerator reads data from and writes data to a large-capacity memory or storage via a data bus. Hardware constraints make it difficult to make the data transfer band (transfer rate) of the data bus wider than that of the CPU. On the other hand, the arithmetic circuit in the accelerator greatly outperforms the CPU in throughput. To maximize the throughput of the accelerator, the data to be used by the arithmetic circuit is to be supplied to the accelerator via the data bus at an appropriate timing.
Although an accelerator including a circuit such as an FPGA greatly outperforms the CPU in throughput, the data transfer characteristics of the data bus place a limit on overall performance. A pipeline operation is available as a technique to improve the throughput of an accelerator using the FPGA. The pipeline operation is a circuit configuration method that increases operation parallelism. In the pipeline operation, a process is divided into multiple stages of arithmetic circuits (also collectively referred to as an arithmetic circuit) such that the output of each stage becomes the input of the next stage, and the stages of the arithmetic circuit are operated concurrently.
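The staged, concurrent structure described above can be sketched in software. The following is a minimal illustrative model, not the disclosed circuit: each hypothetical stage function stands in for one stage of the arithmetic circuit, queues stand in for the inter-stage connections, and all stages run concurrently on a stream of data items.

```python
import queue
import threading

def run_pipeline(items, stages):
    """Connect the stage functions with queues and run all stages concurrently."""
    qs = [queue.Queue() for _ in range(len(stages) + 1)]
    SENTINEL = object()  # marks the end of the input stream

    def worker(fn, q_in, q_out):
        # One pipeline stage: consume from the previous stage, produce for the next.
        while True:
            item = q_in.get()
            if item is SENTINEL:
                q_out.put(SENTINEL)  # propagate end-of-stream to the next stage
                return
            q_out.put(fn(item))

    threads = [threading.Thread(target=worker, args=(fn, qs[i], qs[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for item in items:
        qs[0].put(item)  # feed data into the first stage
    qs[0].put(SENTINEL)

    results = []
    while True:
        out = qs[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results

# Example: three arithmetic stages, analogous to pipelined arithmetic circuits.
print(run_pipeline(range(4), [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]))
# → [-1, 1, 3, 5]
```

While one item is in the last stage, later items occupy the earlier stages, which is the source of the increased parallelism.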
A variety of techniques have been disclosed as information processing techniques to achieve even higher-speed operation through the pipeline operation.
The techniques described above are disclosed in Japanese Laid-open Patent Publication Nos. 11-053189 and 05-158686.
As described above, the accelerator using the FPGA performs the pipeline operation. In applications that handle many types of data sets, where the data set in use varies dynamically during the pipeline operation, performance may drop.
The accelerator using the FPGA offers a higher throughput, but the number of pipeline stages tends to increase because of the narrow data transfer band. If the data set in use varies dynamically in an accelerator (information processing apparatus) having a large number of stages, the accelerator may have to request new input data from the host, or the pipeline operation may be reset. Throughput is thus degraded.
It is thus contemplated that all possible data sets are transmitted from the host to the accelerator in advance. In this case, however, data that is never used is also transmitted to the accelerator. This may lower the effective data rate, leading to throughput degradation in the pipeline operation.
It is also contemplated that the execution path is segmented such that no branch occurs within any segment of the pipeline operation. In this case as well, the intermediate data at each segmentation point passes through an output and an input of the accelerator. This may also lead to throughput degradation in the pipeline operation.