The present disclosure relates to the operation of a memory nest, and more specifically, to methods, systems and computer program products for optimizing the performance of a memory nest for a workload.
Computers, such as servers, are typically configured to execute data-intensive workloads efficiently using available hardware resources, e.g., one or more processors and a memory nest. As used herein, the term memory nest refers to the various types of storage that can be used by a processor to store data. The memory nest generally includes a hierarchy of caches and physical memory. As the level of the memory nest increases, the distance from the processor to the data increases, and the access latency for the processor to retrieve the data also increases.
When an instruction executing on a processor requires data and the data exists in the cache of the processor, a cache hit occurs and the processor executes the instruction. However, when the data does not exist in the cache of the processor, a cache miss occurs. The miss is resolved by retrieving the data from the memory nest and placing it in the cache of the processor so that the instruction can execute. The time delay associated with retrieving the data increases as the level at which the data resides in the memory nest increases.
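The hit and miss behavior described above can be illustrated with a minimal sketch. The level names, the latency values, and the function `access` are assumptions made for illustration only; they do not describe any particular processor, and the sketch ignores cache capacity.

```python
# Hypothetical levels and latencies (in cycles); the values are illustrative
# assumptions, not measurements of any real machine.
LEVELS = [("L1", 4), ("L2", 12), ("L3", 40), ("memory", 200)]


def access(nest, address):
    """Return (level, latency) for one access and place the data in L1.

    `nest` maps each cache level name to the set of addresses it holds.
    Physical memory is assumed to hold every address.
    """
    for name, latency in LEVELS:
        if name == "memory" or address in nest[name]:
            nest["L1"].add(address)  # the data now sits in the processor cache
            return name, latency


nest = {"L1": set(), "L2": {0x10}, "L3": {0x20}}
print(access(nest, 0x10))  # miss in L1, resolved from L2
print(access(nest, 0x10))  # hit in L1
print(access(nest, 0x99))  # resolved from physical memory
```

In this sketch, the second access to the same address is a cache hit because the first access placed the data in the lowest-level cache.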
As used herein, the term workload refers to a group of work units that an operating system is executing or waiting to execute on a processor. Each work unit of the workload has an associated working set of data, which is the data that is accessed by the processor during the execution of the work unit. As the processor executes a work unit, the data in the working set is brought into the processor cache from higher levels of the memory nest. The working set data settles in the memory nest such that frequently accessed data tends to be stored in lower level caches that are on or close to the processor, and infrequently accessed data tends to be cached in higher level caches that are further from the processor. As a work unit executes on the processor, new data accesses push the working sets of all other work units of the workload into higher levels of the memory nest. Accordingly, the longer a work unit consecutively executes on a processor, the more efficient the memory nest use becomes for the executing work unit. During execution, when the working set of the work unit changes, the process repeats: the work unit's new working set is brought into the processor cache from the memory nest, frequently accessed data settles in the lower level caches, infrequently accessed data settles in higher level caches further from the processor, and the work unit's previous working set and the working sets of the other work units are pushed to even higher levels of the memory nest.
While the memory nest remains efficient for the currently executing work unit, each change in that work unit's working set further degrades the efficiency of the memory nest for the other work units in the workload. When those other work units resume execution, their working set data resides at a higher level of the memory nest, so they experience high latency in bringing that data back into the processor cache.
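The displacement effect described above can be sketched with a small simulation. Everything in it is an assumption made for illustration: a two-level cache hierarchy with least-recently-used eviction in which data evicted from one level moves to the next higher level, the capacities and latencies, and the names `MemoryNest` and `resume_latency`. Real memory nests differ in inclusion and replacement policy.

```python
from collections import OrderedDict


class MemoryNest:
    """Toy cache hierarchy: each level is an LRU cache, and data evicted
    from one level moves to the next higher level (assumed policy)."""

    def __init__(self, capacities, latencies, memory_latency):
        self.levels = [OrderedDict() for _ in capacities]
        self.capacities = capacities
        self.latencies = latencies
        self.memory_latency = memory_latency

    def access(self, line):
        """Return the latency of one access and move `line` to the lowest level."""
        latency = self.memory_latency
        for level, level_latency in zip(self.levels, self.latencies):
            if line in level:
                del level[line]
                latency = level_latency
                break
        # Insert at the lowest level and cascade evictions upward.
        carry = line
        for level, capacity in zip(self.levels, self.capacities):
            level[carry] = True
            if len(level) <= capacity:
                break
            carry, _ = level.popitem(last=False)  # evict least recently used
        # A line evicted from the highest cache level falls back to memory.
        return latency


def resume_latency(lines_touched_by_a):
    """Total latency for work unit B to re-access its working set after
    work unit A has touched `lines_touched_by_a` distinct lines."""
    nest = MemoryNest(capacities=[4, 8], latencies=[1, 10], memory_latency=100)
    b_working_set = [("B", i) for i in range(4)]
    for _ in range(2):  # warm the nest with B's working set
        for line in b_working_set:
            nest.access(line)
    for i in range(lines_touched_by_a):  # A runs while B waits
        nest.access(("A", i))
    return sum(nest.access(line) for line in b_working_set)


for touched in (0, 4, 12):
    print(touched, resume_latency(touched))
```

Under these assumptions, the more distinct lines work unit A touches while work unit B waits, the higher B's working set is pushed in the hierarchy and the more B pays when it resumes.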