The parallelization of tasks is used to increase the throughput of computer systems. To this end, compilers may extract tasks from program code to execute in parallel on the system hardware. To increase parallel execution on the hardware, a multi-core architecture may include multiple processor cores, e.g., a CPU, a GPU, an FPGA, etc. When an architecture includes multiple cores of different types, it may be referred to as a heterogeneous multi-core architecture.
CUDA (Compute Unified Device Architecture) is a parallel computing platform and heterogeneous programming environment that allows the user to take advantage of a multi-core architecture, which may include a CPU and a GPU, for example. Using CUDA, GPUs can be used for general-purpose processing, and not exclusively for graphics processing. Thus, developers can write code that executes partially on a CPU and partially on a GPU. In other words, some code may be assigned to the CPU while other code may be assigned to the GPU. The CUDA platform is accessible to software developers through, for example, extensions to industry-standard programming languages including C++. CUDA C++, therefore, extends the standard C++ language to target heterogeneous programming.
Similar to CUDA C++, C++ Accelerated Massive Parallelism (C++ AMP) extends the standard C++ language to take advantage of data-parallel hardware such as a graphics processing unit (GPU) on a discrete graphics card. By using C++ AMP, the programmer can code multi-dimensional data algorithms so that execution can be accelerated by using parallelism on heterogeneous hardware.
In both CUDA C++ and C++ AMP, functions are associated with one or more execution spaces that denote the underlying computing substrate on which the function may be executed. For example, a function could be associated with two different execution spaces, wherein one execution space denotes the CPU computing substrate and the other execution space denotes the GPU computing substrate. The problem with conventional compilers that compile code for heterogeneous programming environments is that their method of processing functions marked with multiple execution spaces is complex and computationally inefficient. For example, conventional compilers will typically need to generate and represent a separate parse tree for each execution space with which a function is associated.
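By way of a non-limiting illustration, a function associated with two execution spaces might be written as in the following sketch. In CUDA C++, the `__host__` and `__device__` specifiers mark a function for the CPU and GPU execution spaces, respectively. The `HOST_DEVICE` macro and the function names `scale_and_add` and `saxpy_on_host` are hypothetical and are introduced here only for illustration; the macro is assumed to expand to nothing when the source is not compiled by a CUDA compiler, so that the same code is also accepted by a standard C++ compiler.

```cpp
#include <cstddef>
#include <vector>

// Assumption: nvcc defines __CUDACC__. Under a standard C++ compiler the
// execution space specifiers are dropped, so only the CPU version exists.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper associated with two execution spaces: under a CUDA
// compiler it may be executed on both the CPU (host) and the GPU (device).
HOST_DEVICE inline float scale_and_add(float a, float x, float y) {
    return a * x + y;
}

// Hypothetical CPU-side caller. Assumes x and y have the same length.
inline std::vector<float> saxpy_on_host(float a,
                                        const std::vector<float>& x,
                                        const std::vector<float>& y) {
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = scale_and_add(a, x[i], y[i]);
    }
    return out;
}
```

In this sketch, the single definition of `scale_and_add` is written once but is marked for two execution spaces, which is the kind of function a compiler for a heterogeneous programming environment must process for each associated space.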