Computer architects need a deep understanding of their clients' workloads in order to design and tune an architecture. Unfortunately, many important clients will not share their software with computer architects because of its proprietary or confidential nature. Examples include weapons simulation software, trading algorithm software, and software that handles trade secrets or clients' sensitive data. This problem of proprietary workloads is widespread in the industry. One practice for dealing with proprietary workloads is for computer architects to substitute non-proprietary versions (e.g., open-source software). Another is for clients to provide hints about the algorithms used in their workloads (e.g., a highly connected large-graph algorithm), leaving it to computer architects to manually reconstruct software that fits that description. This process usually requires a substantial investment of time and deep subject-matter expertise. A promising alternative is to clone the workload. Cloning extracts a statistical summary of the client's workload behavior through profiling and then generates a synthetic workload (a clone) that produces the same statistical behavior. Workload cloning can be automated, eliminating the highly manual effort that would otherwise be required.
One particular aspect of workload cloning for which no satisfactory solution has been provided thus far is the cloning of memory access behavior. Most prior cloning solutions have focused on instruction-level parallelism behavior. However, good cloning techniques for memory access behavior are desirable for several reasons. First, the cache hierarchy is becoming larger and more complex, and understanding how workloads perform in such a subsystem is crucial. Second, the growing number of cores puts tremendous pressure on the memory hierarchy, shifting the performance bottleneck from the cores to the memory hierarchy. Third, because cores share parts of the memory hierarchy, proprietary workloads cannot be evaluated in isolation. They must instead be evaluated in combination with other co-running workloads.
For at least the aforementioned reasons, there is a need for improved techniques for generating clones that better model workloads.