High-performance computing (HPC) systems often include a general purpose processor executing program code in combination with a specialized co-processor or hardware accelerator performing some function(s) on behalf of the general purpose processor. The HPC system may realize improved performance as a result of the specialized co-processor or hardware accelerator performing the function(s) instead of the general purpose processor executing code to perform the function(s).
The specialized co-processor and hardware accelerator are referred to generically herein as function accelerators. Depending on application requirements, the function accelerator may take the form of a graphics processing unit, a floating point unit, a single instruction multiple data (SIMD) vector unit, or a function implemented as a digital circuit (without software) on an application-specific integrated circuit (ASIC) or in programmable logic such as a field programmable gate array (FPGA).
In some development environments, both the functions to be performed by the general purpose processor and the function(s) to be performed by the function accelerator may be specified in a high-level language (HLL) such as Fortran, C, C++, or JAVA®. The high-level program is partitioned into parts to be implemented as software for the general purpose processor and parts to be implemented on the function accelerator. The parts to be implemented as software for the general purpose processor are compiled using a compiler suitable for the language and the target general purpose processor. A compiler that targets a co-processor may be used for a co-processor implementation, while a more specialized tool suite may be used to generate a hardware accelerator that performs the desired function(s). U.S. Pat. No. 7,315,991, entitled “Compiling HLL into Massively Pipelined Systems,” by Bennett, which is herein incorporated by reference in its entirety, describes one approach for generating a hardware accelerator from an HLL program.
In HPC applications, data in a shared memory space is processed by both the software executing on the general purpose processor and the function accelerator. The software executing on the general purpose processor may depend on data from the function accelerator, and the function accelerator may depend on data processed by the software. Due to latency in transferring data between the two, processing delays may occur and reduce system throughput.
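The effect of transfer latency on a mutually dependent exchange can be illustrated with a simple timing model. The following Python sketch is offered only as an illustration and assumes a strictly serialized round trip in which the software step, the transfer to the function accelerator, the accelerator step, and the transfer back each wait on the one before. The function names and timing values are hypothetical and do not describe any particular system.

```python
def round_time(t_sw, t_acc, t_xfer):
    """Time for one dependent round trip (assumed fully serialized):
    software step, transfer to the accelerator, accelerator step,
    and transfer back to the software."""
    return t_sw + t_xfer + t_acc + t_xfer


def throughput(t_sw, t_acc, t_xfer):
    """Round trips completed per unit time when each step waits on the other."""
    return 1.0 / round_time(t_sw, t_acc, t_xfer)


# Hypothetical timings in microseconds, chosen only for illustration.
no_latency = throughput(t_sw=4.0, t_acc=1.0, t_xfer=0.0)
with_latency = throughput(t_sw=4.0, t_acc=1.0, t_xfer=2.5)

print(f"throughput without transfer latency: {no_latency:.3f} rounds/us")
print(f"throughput with transfer latency:    {with_latency:.3f} rounds/us")
```

With these assumed values, the two transfers account for half of each round trip, so the modeled throughput is halved even though the compute time on both sides is unchanged.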
The present invention may address one or more of the above issues.