A new generation of massively parallel processors, including graphics processing units (GPUs), the IBM Cell BE processor, and other multi-core or vector processors, can perform computations an order of magnitude faster than traditional processors. Achieving the potential performance of these processors, however, typically requires a detailed understanding of processor hardware and memory architectures, as well as of sophisticated parallel programming techniques. For example, programming applications for GPUs may require programmers to learn a large number of graphics concepts and to understand the various optimizations needed for optimal performance in an environment whose cache, memory, and execution architectures may differ significantly from those of traditional processing units. Additionally, parallel programming itself is not intuitive for many programmers: it requires techniques and algorithms that serial programming does not, and it introduces numerous new development and debugging challenges.