Multi-core platforms may make it difficult to achieve good scaling when running multiple copies of memory access intensive applications. Such applications may contend for memory bandwidth; once the memory bandwidth is saturated, throughput performance drops and scaling suffers, especially for processors with hyperthreading enabled. These challenges have been tackled with a set of compiler optimizations, including loop interchange, loop fusion and loop blocking, that are configured to enhance performance by improving data locality and reducing memory access traffic.
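By way of illustration only, the three loop optimizations named above may be sketched in C as follows. The array size N, the tile size TILE, and all function names are hypothetical and chosen only for this sketch; each function shows the loop form after the transformation, with the original form described in the comment.

```c
#include <stddef.h>

#define N    64   /* hypothetical problem size */
#define TILE 16   /* hypothetical tile size; N is a multiple of TILE */

/* Loop interchange: with j as the outer loop and i as the inner loop,
 * a row-major array would be walked column by column. Swapping the
 * loops makes the inner loop access consecutive memory locations. */
void scale_interchanged(double a[N][N], double s)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            a[i][j] *= s;
}

/* Loop fusion: two separate loops (one writing b[], one reading b[] to
 * write c[]) are merged, so b[i] is consumed right after it is produced
 * instead of being re-read from memory in a second pass. */
void add_then_double_fused(const double *a, double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        b[i] = a[i] + 1.0;
        c[i] = b[i] * 2.0;
    }
}

/* Loop blocking: the transpose is carried out one TILE x TILE tile at a
 * time, so the rows of src and dst touched by a tile can stay in cache
 * while the tile is processed. */
void transpose_blocked(double src[N][N], double dst[N][N])
{
    for (size_t ii = 0; ii < N; ii += TILE)
        for (size_t jj = 0; jj < N; jj += TILE)
            for (size_t i = ii; i < ii + TILE; i++)
                for (size_t j = jj; j < jj + TILE; j++)
                    dst[j][i] = src[i][j];
}
```

In each case the result is the same as that of the untransformed loops; only the order of the memory accesses changes.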
A conventional loop optimization technique may be applied to a single loop nest in order to improve memory locality and thus increase throughput performance. In a chain of loop nests in which the loop nests are separated by control code, however, the intervening control code may inhibit application of such optimizing techniques.
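A minimal sketch of such a chain is given below in C, assuming for illustration that the control code is an early exit placed between two loops. The function name, its parameters, and the particular check are hypothetical and are not taken from any specific application.

```c
#include <stddef.h>

/* Returns 0 on success, or -1 if the check between the loops fails. */
int chain_of_loops(const double *a, double *b, double *c,
                   size_t n, double limit)
{
    /* First loop: produces b[]. */
    for (size_t i = 0; i < n; i++)
        b[i] = a[i] * a[i];

    /* Control code separating the loops: an early exit that depends
     * on a value produced by the first loop. */
    if (n > 0 && b[0] > limit)
        return -1;

    /* Second loop: re-reads all of b[] to produce c[]. */
    for (size_t i = 0; i < n; i++)
        c[i] = b[i] + 1.0;

    return 0;
}
```

Both loops traverse b[], so fusing them would avoid a second pass over memory. A naive fusion, however, would write c[] before the check is evaluated and would therefore change what the caller observes on the early exit path, which is one way intervening control code may block a conventional single-nest technique.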
Although the following Detailed Description will proceed with reference being made to illustrative embodiments, many alternatives, modifications, and variations thereof will be apparent to those skilled in the art.