A parallel processing system includes a plurality of processing elements that evaluate a kernel over an iteration space. In one application, for instance, each processing element accepts an input data item in the iteration space, evaluates the input data item using the kernel, and stores a resultant output data item to a data structure, which may also be indexed by the iteration space.
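By way of illustration only, the evaluation described above may be sketched as follows. This is a minimal sketch, assuming a two-dimensional iteration space and a thread pool standing in for the plurality of processing elements; the names `kernel`, `evaluate_over_iteration_space`, `inputs`, and `workers` are hypothetical and are not drawn from any particular parallel processing system.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def kernel(x):
    # Hypothetical kernel: squares a single input data item.
    return x * x


def evaluate_over_iteration_space(kernel_fn, inputs, shape, workers=4):
    """Apply kernel_fn to every point of a 2D iteration space.

    Each worker thread stands in for a processing element (an assumption
    made for illustration; real systems may use GPU threads or cores).
    """
    rows, cols = shape
    # Output data structure indexed by the same iteration space as the input.
    outputs = [[None] * cols for _ in range(rows)]

    def process(index):
        i, j = index
        # Accept the input item at (i, j), evaluate it with the kernel,
        # and store the resultant output item at the same index.
        outputs[i][j] = kernel_fn(inputs[i][j])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the iterator so any exception raised by a worker surfaces.
        list(pool.map(process, product(range(rows), range(cols))))
    return outputs


inputs = [[1, 2, 3], [4, 5, 6]]
result = evaluate_over_iteration_space(kernel, inputs, (2, 3))
print(result)
```

In this sketch each point in the iteration space is written by exactly one worker, so no synchronization of the output data structure is needed beyond waiting for all workers to finish.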
In one approach, a user can produce a kernel by writing it in an appropriate source language. The user may then statically compile the entire source code file and load the resultant compiled code on the parallel processing system. In another approach, the parallel processing system may perform runtime compilation on an intermediate language version of the kernel. These techniques provide good performance; however, for reasons set forth herein, these techniques may not be entirely satisfactory for all applications of parallel processing systems.