General Purpose Graphics Processing Unit (GPGPU) computing involves performing rapid mathematical calculations for data parallel applications, such as image rendering or matrix multiplication. Such data parallel workloads may be performed at a graphics processing unit (GPU), a specialized electronic circuit primarily intended to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer for output to a display, yet also suitable for accelerating general purpose computing. A GPU is often implemented in a heterogeneous system in which the GPU shares memory with a central processing unit (CPU).
In a heterogeneous system, it is beneficial to physically share statically allocated data declared as being mapped to the GPU (or target). Such data includes global variables, file-scope static variables, and routine static variables. However, such data sharing is not currently efficient. For instance, existing heterogeneous applications copy static data to the target device before an offload region executes and back from the target device afterward, which incurs significant overhead. The overhead arises because static data is part of an executable image, and references from code to the data must be relocated by a linker relative to that image according to the order of linked modules, the sections within modules, etc. A heterogeneous application has two or more different executable images (e.g., a CPU (or host) image and one or more target images); thus, references from the target code must resolve to the host image for the data to be physically shared.
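By way of illustration only, the copy-based approach described above may be sketched with OpenMP offload directives. The use of OpenMP is an assumption, since the description names no particular programming model, and the array `table`, the constant `N`, and the function `double_table_on_target` are hypothetical names introduced for this sketch. The two `target update` directives correspond to the copies performed before and after the offload region.

```c
#define N 8

/* File-scope static data declared as mapped to the target device.
   The host image and the target image each hold their own copy.
   (Names are hypothetical; OpenMP is an assumed programming model.) */
#pragma omp declare target
static int table[N];
#pragma omp end declare target

/* Doubles every element of `table` inside an offload region and
   returns the sum of the elements as seen by the host afterwards. */
int double_table_on_target(void)
{
    int i;
    int sum = 0;

    /* Initialize the host copy of the static data. */
    for (i = 0; i < N; ++i)
        table[i] = i;

    /* Copy the static data from the host to the target
       before the offload region executes. */
    #pragma omp target update to(table)

    /* Offload region: operates on the target's copy of `table`. */
    #pragma omp target
    {
        int j;
        for (j = 0; j < N; ++j)
            table[j] *= 2;
    }

    /* Copy the static data from the target back to the host
       after the offload region executes. */
    #pragma omp target update from(table)

    /* The host now observes the values written by the target. */
    for (i = 0; i < N; ++i)
        sum += table[i];
    return sum;
}
```

When compiled without OpenMP support, the directives are ignored and the loop runs on the host, so the function returns the same result in either case. The two explicit update steps are the overhead in question; were the static data physically shared between the host and the target, neither copy would be required.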