1. Technical Field
The present invention relates to a system and method for compiler optimization, and more particularly, to compiler optimizations for manycore processors.
2. Description of the Related Art
Manycore accelerators (e.g., manycore coprocessors) are increasingly used for high performance computing. For example, on the list released in June 2013, 54 of the top 500 supercomputers are powered by manycore accelerators, a fourfold increase compared with two years earlier. Because the massively parallel architectures of manycore accelerators can run hundreds or thousands of threads in parallel, they can provide an order of magnitude better performance and efficiency for parallel workloads than multicore CPUs.
Although manycore accelerators are capable of high performance, achieving that performance in practice remains challenging. Programmers usually need considerable expertise and effort to understand the underlying architectures and make good use of them. For example, to develop high performance GPU applications, programmers need to be aware of the memory hierarchy and the warp-based thread organization, both of which have a dominant impact on performance. Many static and runtime techniques have been developed to relieve programmers of the optimization burden when developing GPU applications. However, there is still a significant performance gap between compiler-optimized code and highly tuned CUDA code.