Computer systems often include one or more general purpose processors (e.g., central processing units (CPUs)) and one or more specialized data parallel compute nodes (e.g., graphics processing units (GPUs) or single instruction, multiple data (SIMD) execution units in CPUs). General purpose processors typically handle general purpose processing on computer systems, and data parallel compute nodes typically handle data parallel processing (e.g., graphics processing). General purpose processors can often implement data parallel algorithms but do so without the optimized hardware resources found in data parallel compute nodes. As a result, general purpose processors may be far less efficient than data parallel compute nodes in executing data parallel algorithms.
Data parallel compute nodes have traditionally played a supporting role to general purpose processors in executing programs on computer systems. As the role of hardware optimized for data parallel algorithms increases due to enhancements in data parallel compute node processing capabilities, it would be desirable to make data parallel compute nodes easier for programmers to program.
Data parallel algorithms often operate on large sets of data that may be distributed across multiple computing platforms. Large sets of data present challenges both in representing and tracking the data structures that describe the data and in moving the data across the multiple platforms. As a result, managing large sets of data across multiple computing platforms is often complex and difficult to implement.