Central processing units (CPUs) can be described as electronic circuitry that executes the instructions of a computer program (e.g., performing the arithmetic, logic, control, and input/output (I/O) operations specified by the instructions). The CPU is separate from memory, often referred to as main memory, but interacts with it to retrieve and/or store data. The workload demand on CPUs has increased significantly over the years, pushing the limits of the ability of CPUs to process data efficiently. Graphics processing units (GPUs) have been adopted for processing such intensive workloads. For highly parallel workloads, GPUs typically provide higher computational throughput and are more energy efficient than traditional CPUs. In some architectures, both CPUs and GPUs are implemented together.
GPU programming, however, is complicated and error prone. For example, each GPU has its own memory. Consequently, data must be explicitly transferred between main memory and GPU memory. As another example, error detection and analysis are difficult, because CPUs and GPUs require different tools with different scopes. In some systems, GPUs might not be available. Consequently, it can be necessary to develop a program twice: once for systems that include a GPU, and once for systems that do not. As still another example, the parallelization concepts of GPUs and CPUs differ and require different algorithm implementations. Further, highly parallel algorithms tend to be complex and require synchronization between threads. Such synchronization can impede scalability or even lead to deadlocks.
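The deadlock risk noted above can be illustrated with a minimal CPU-side sketch in C++. It is an illustration only, not GPU code and not part of any particular system: the `Account` type and the functions `transfer` and `run_opposing_transfers` are hypothetical names introduced here. Two threads each need two locks, and they request them in opposite order. If each thread locked its first mutex and then waited for the second, both could wait forever. The sketch avoids this by acquiring both mutexes together with `std::lock`, which uses a deadlock-avoidance algorithm.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical type used only for this illustration.
struct Account {
    std::mutex lock;
    long balance = 0;
};

// Moves `amount` from `from` to `to`.
// std::lock acquires both mutexes without risking deadlock, regardless of
// the order in which other threads request the same pair of mutexes.
void transfer(Account& from, Account& to, long amount) {
    std::lock(from.lock, to.lock);
    // Adopt the already-held mutexes so they are released on scope exit.
    std::lock_guard<std::mutex> g1(from.lock, std::adopt_lock);
    std::lock_guard<std::mutex> g2(to.lock, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Runs transfers in opposite directions on two threads and returns the
// combined balance, which the transfers must leave unchanged.
long run_opposing_transfers(int iterations) {
    Account a;
    Account b;
    a.balance = 1000;
    b.balance = 1000;

    std::thread t1([&] {
        for (int i = 0; i < iterations; ++i) transfer(a, b, 1);
    });
    std::thread t2([&] {
        for (int i = 0; i < iterations; ++i) transfer(b, a, 1);
    });

    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

Every call to `transfer` serializes on the two mutexes, so the two threads gain little from running in parallel. This is the scalability cost of synchronization that the passage describes, and it grows as more threads contend for the same locks.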