For most programs, understanding and tuning memory system performance is important for achieving reasonable performance on current high-performance systems. Traditionally, performance measurement and visualization tools have been control-centric: they focus on the control structure of the program (e.g., loops and functions). This is also where application programmers have typically concentrated when searching for performance bottlenecks. However, advances in microprocessor and computer system design have shifted the performance characteristics of scientific programs from being computation-bound to being memory- and/or data-access-bound.