Computer systems often include one or more general purpose processors (e.g., central processing units (CPUs)) and one or more specialized data parallel compute nodes (e.g., graphics processing units (GPUs) or single instruction, multiple data (SIMD) execution units in CPUs). General purpose processors typically handle general purpose processing, while data parallel compute nodes typically handle data parallel processing (e.g., graphics processing). General purpose processors can often implement data parallel algorithms, but they do so without the optimized hardware resources found in data parallel compute nodes. As a result, general purpose processors may execute data parallel algorithms far less efficiently than data parallel compute nodes.
Data parallel compute nodes have traditionally played a supporting role to general purpose processors in executing programs on computer systems. As enhancements in the processing capabilities of data parallel compute nodes expand the role of hardware optimized for data parallel algorithms, it would be desirable to make data parallel compute nodes easier for programmers to program.
Data parallel algorithms often operate on large computational spaces. These computational spaces usually include some form of indexing that allows individual data elements to be accessed and operated on. At times, however, a data parallel algorithm may generate a desired result by operating on the indexing structures of a computational space rather than on its data elements.
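By way of illustration only, the following minimal Python sketch shows one way a result may be generated from the indexing structure of a computational space alone. The names `flatten`, `transpose_index`, `extent`, and `permutation` are hypothetical and are chosen for this example; they are not drawn from any particular library or from the description above. The sketch assumes a row-major layout and computes, for each index of a 2 x 3 space, the linear offset that the index would occupy in the transposed space, without reading any data element.

```python
from itertools import product


def flatten(index, extent):
    """Map a multi-dimensional index to a row-major linear offset.

    `index` and `extent` are tuples of equal rank (hypothetical helper).
    """
    offset = 0
    for i, n in zip(index, extent):
        if not 0 <= i < n:
            raise IndexError(f"index {index} is outside extent {extent}")
        offset = offset * n + i
    return offset


def transpose_index(index):
    """Reverse the order of the dimensions of an index or an extent."""
    return tuple(reversed(index))


# A 2 x 3 computational space; only its indices are used below.
extent = (2, 3)
transposed_extent = transpose_index(extent)  # (3, 2)

# For every index in the space (visited in row-major order), compute the
# linear offset of the corresponding index in the transposed space.
permutation = [
    flatten(transpose_index(idx), transposed_extent)
    for idx in product(*(range(n) for n in extent))
]

print(permutation)  # [0, 2, 4, 1, 3, 5]
```

In this sketch the resulting permutation depends only on the extent of the space, so the same index computation could be reused for any data stored in a space of that shape.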