To improve performance and cost efficiency (e.g., energy efficiency), some high-performance computing algorithms and applications execute on a single chip, within a computer or a computing system, that has hundreds or thousands of processing cores. Examples include climate modeling, fluid physics simulations, and heat transfer simulations. One characteristic of these applications is that they operate on large amounts of data. For example, some heat transfer simulations compute the temperature of atoms in a three-dimensional space in every time cycle. Because these algorithms read and write such large amounts of data, they may be constrained by main memory bandwidth and therefore may not make full use of the computing elements.
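
The bandwidth constraint can be illustrated with a rough, roofline-style estimate. The sketch below is an illustration under stated assumptions, not a description of any particular system: it assumes a 7-point heat transfer stencil that performs about 8 floating-point operations per grid point and, in the best case, moves one 8-byte value in and one out of main memory per point. The peak compute rate (1000 GFLOP/s) and memory bandwidth (100 GB/s) are hypothetical figures chosen only to make the arithmetic easy to follow, and the function names are likewise illustrative.

```python
def arithmetic_intensity(flops_per_point: float, bytes_per_point: float) -> float:
    """Floating-point operations performed per byte moved to or from main memory."""
    return flops_per_point / bytes_per_point


def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gb_s: float) -> float:
    """Roofline-style bound: the lower of the compute limit and the memory limit."""
    return min(peak_gflops, intensity * bandwidth_gb_s)


# Assumed workload: a 7-point stencil update per grid point.
FLOPS_PER_POINT = 8.0    # assumption: 6 additions and 2 multiplications
BYTES_PER_POINT = 16.0   # assumption: one 8-byte read and one 8-byte write

# Hypothetical hardware figures, for illustration only.
PEAK_GFLOPS = 1000.0
BANDWIDTH_GB_S = 100.0

intensity = arithmetic_intensity(FLOPS_PER_POINT, BYTES_PER_POINT)
bound = attainable_gflops(intensity, PEAK_GFLOPS, BANDWIDTH_GB_S)
utilization = bound / PEAK_GFLOPS

print(f"arithmetic intensity: {intensity} FLOP/byte")
print(f"attainable rate: {bound} GFLOP/s")
print(f"fraction of peak compute: {utilization:.0%}")
```

Under these assumed numbers, the memory system can supply only enough data to sustain 50 GFLOP/s, so the computing elements would sit idle most of the time regardless of how many cores the chip provides.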