General-purpose computing on graphics processing units (GPGPU) refers to the use of specialized, highly parallel hardware to perform computationally demanding tasks that would otherwise be carried out on a conventional processor. The hardware can be a video card or some other computing device. In most GPGPU programming environments, the main program, which runs on a central processing unit (CPU), and the kernels, which run on the device and perform the computationally demanding tasks, are parsed by separate compilers. The main program is written in an ordinary programming language and parsed by an ordinary compiler, while the kernels are written in a specialized programming language and parsed by a dedicated compiler.
A technique related to language-embedded programming was first described by Thomas C. Jansen in his doctoral thesis “GPU++—An Embedded GPU Development System for General-Purpose Computations”, Technical University Munich, 2007. That thesis does not include methods of flow control, such as loops or if-clauses. The disclosure is therefore limited to a very small set of programs and does not enable general-purpose programming.
WO2012/097316 describes techniques for extending the architecture of a general-purpose graphics processing unit with parallel processing units to allow efficient processing of pipeline-based applications. The techniques include configuring local memory buffers connected to parallel processing units operating as stages of a processing pipeline to hold data for transfer between the parallel processing units.
Object-oriented programming languages allow the definition of new data types, along with corresponding operators. In language-embedded programming, special data types are defined in such a way that, instead of performing the actual computation, their operators record the steps of computation, and this record is used to generate the machine code for the device. In this way, the kernels are fully integrated into the main program and do not have to be parsed by a special compiler.
These special data types are used to represent values that reside on the device. These values will typically be stored in registers. In one example, the type names for the device values are the intrinsic type names prefixed by “gpu_”; for example, int becomes gpu_int and float becomes gpu_float. Other naming conventions are possible as well. The kernels can be accessed as functions that use these special data types. When such a kernel function is executed on the CPU, the use of the device data types creates an expression graph in which the steps of computation are represented. Each device variable holds a pointer to a node in the expression graph that determines how its value is computed. The kernel code is generated from this expression graph.
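The recording mechanism described above can be illustrated with a minimal C++ sketch. It is not taken from any cited disclosure. The names Node, emit, generate_kernel_body and scale_and_shift, the member function argument, and the textual output format are all assumptions made for illustration; only the type name gpu_float follows the naming convention mentioned above. The sketch emits a source-like string rather than machine code, and it covers only addition and multiplication.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>
#include <string>
#include <utility>

// Hypothetical expression-graph node: a leaf (kernel argument or literal)
// or a binary operation that refers to two child nodes.
struct Node {
    std::string op;                   // "leaf", "+" or "*"
    std::string text;                 // leaf text, e.g. "x" or "2.000000f"
    std::shared_ptr<const Node> lhs;  // null for leaves
    std::shared_ptr<const Node> rhs;  // null for leaves
};
using NodePtr = std::shared_ptr<const Node>;

// Illustrative device value type. It stores no number, only a pointer to
// the graph node that determines how its value is computed.
class gpu_float {
public:
    // A host literal becomes a constant leaf in the graph.
    gpu_float(float value) : node_(make_leaf(format_literal(value))) {}

    // Assumed helper: creates a leaf standing for a named kernel argument.
    static gpu_float argument(std::string name) {
        return gpu_float(make_leaf(std::move(name)));
    }

    const NodePtr& node() const { return node_; }

    // The operators perform no arithmetic; they record a new node.
    friend gpu_float operator+(const gpu_float& a, const gpu_float& b) {
        return gpu_float(gpu_float::make_binary("+", a, b));
    }
    friend gpu_float operator*(const gpu_float& a, const gpu_float& b) {
        return gpu_float(gpu_float::make_binary("*", a, b));
    }

private:
    explicit gpu_float(NodePtr node) : node_(std::move(node)) {
        assert(node_ != nullptr);
    }
    static NodePtr make_leaf(std::string text) {
        return std::make_shared<Node>(Node{"leaf", std::move(text), nullptr, nullptr});
    }
    static NodePtr make_binary(const char* op, const gpu_float& a, const gpu_float& b) {
        return std::make_shared<Node>(Node{op, "", a.node_, b.node_});
    }
    static std::string format_literal(float value) {
        char buffer[64];
        std::snprintf(buffer, sizeof buffer, "%ff", value);  // e.g. "2.000000f"
        return buffer;
    }
    NodePtr node_;
};

// Walks the graph and emits a fully parenthesized expression string.
std::string emit(const Node& n) {
    if (n.op == "leaf") return n.text;
    return "(" + emit(*n.lhs) + " " + n.op + " " + emit(*n.rhs) + ")";
}

// Assumed output format: a single assignment to a variable named "out".
std::string generate_kernel_body(const gpu_float& result) {
    return "out = " + emit(*result.node()) + ";";
}

// A kernel written as an ordinary host function. Running it on the CPU
// builds the expression graph instead of computing a number.
gpu_float scale_and_shift(gpu_float x, gpu_float y) {
    return x * 2.0f + y;
}
```

Calling generate_kernel_body(scale_and_shift(gpu_float::argument("x"), gpu_float::argument("y"))) walks the recorded graph and returns a string for the expression x * 2 + y. The kernel function itself is compiled by the ordinary host compiler together with the rest of the program, which is the property the paragraph above describes.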
With the teachings of the prior art, the kernel cannot be integrated into the main program, unless two separate compilers are used.
In addition, other objects, desirable features and characteristics will become apparent from the subsequent summary and detailed description, and the appended claims, taken in conjunction with the accompanying drawings and this background.