Memory plays a substantial role in computing systems and has become almost ubiquitous in the electronic devices we use today. As more of these devices, such as cell phones, PDAs, watches, and wristbands, incorporate memory, memory design goes beyond simply providing storage and must account for additional factors such as latency, bandwidth, power consumption, weight, size, and form factor. Because these factors vary and often compete, a multitude of memory designs and methods are implemented to leverage particular benefits.
One method of managing memory and overall system performance is caching. A cache is a component in a system that handles data requests and stores data so that future requests can be served faster. A cache may be a standalone component or integrated into another component such as the CPU (Central Processing Unit) or GPU (Graphics Processing Unit). The size, level, hierarchy, design, location, and architecture of caches can vary significantly based on the desired objectives of the cache.
To achieve the desired performance and/or efficiency objectives, cache designs take into consideration many hardware and software related factors. With respect to hardware, considerations include system design, architecture, power, size, speed, and bandwidth. With respect to software, considerations include replacement algorithms, allocation policies, storage distribution, request prioritization, and the spatial and temporal locality of data. This list is not comprehensive, nor are the factors strictly limited to hardware or software categories. Rather, they illustrate the broad range of considerations in cache design and implementation.
Various cache settings and policies can be implemented to manage cache components and requests for data in a system. One aspect of a cache setting is the replacement algorithm, which determines how the cache component manages the information it stores and which items it discards to make room for new ones. Example replacement algorithms include Least Recently Used (LRU), Most Recently Used (MRU), Random Replacement (RR), Pseudo-LRU (PLRU), Least Frequently Used (LFU), and Adaptive Replacement Cache (ARC). Each of these algorithms provides a different method for storing and handling data in the cache, and each has advantages and disadvantages depending on the workload. This is only a brief list of examples in an active and expanding field.
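To make the idea of a replacement algorithm concrete, the following is a minimal software sketch of LRU eviction. It is illustrative only: the class name `LRUCache`, its `get` and `put` methods, and the capacity of two entries are assumptions made for this example, and hardware caches implement replacement very differently.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest entry last

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        """Insert or update a value, evicting the LRU entry if over capacity."""
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used entry
cache.put("c", 3)  # cache is full, so "b" (least recently used) is evicted
```

After this sequence the cache holds "a" and "c". Under an MRU policy the same sequence would instead evict "a", which shows how the choice of algorithm changes what survives in the cache.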
Two key indicators of cache performance that a caching policy generally tries to improve are “hit ratio” (the fraction of data requests served directly from the cache) and “latency” (the time taken to serve a request). The general objective of a caching policy is to maximize the hit ratio and minimize latency. Some caching policies keep track of these indicators, along with other information, to improve their performance.
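The relationship between the two indicators can be sketched with a simple weighted-average model. The function names and the latency figures below (2 ns for a hit, 100 ns for a miss) are assumptions chosen for illustration, not measurements from any particular system.

```python
def hit_ratio(hits, misses):
    """Fraction of requests served from the cache (0.0 if there were none)."""
    total = hits + misses
    if total == 0:
        return 0.0
    return hits / total


def average_latency(ratio, hit_latency_ns, miss_latency_ns):
    """Expected latency per request under a simple two-outcome model."""
    return ratio * hit_latency_ns + (1.0 - ratio) * miss_latency_ns


# Assumed example figures: 75 hits, 25 misses, 2 ns per hit, 100 ns per miss.
ratio = hit_ratio(hits=75, misses=25)                    # 0.75
avg_ns = average_latency(ratio, hit_latency_ns=2.0,
                         miss_latency_ns=100.0)          # 26.5
```

In this model, even a modest increase in hit ratio lowers the average latency noticeably, because the miss penalty dominates the total.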
However, one limitation of caching and of these low-level performance indicators is the limited scope of the data requests a cache observes. The first limitation is in the type of data requests handled by a specific cache component. That is, a Graphics Processing Unit (GPU) cache generally handles only graphics-related data requests, and a CPU cache generally handles only CPU-related data requests, so each lacks a broader view of incoming requests in the overall system. The second limitation is that a cache relies primarily on a relatively small set of previously requested data to predict future requests. Because a cache must keep its storage capacity small in order to be effective, it can generally draw on only a limited number of previous requests when making assumptions about future requests.
Another limitation is that a cache is generally a shared memory component. That is, multiple applications simultaneously use the same cache components within a system and compete for the resources of that system. For example, a word processing application and a calculator application running on a system may both share the same cache components, such as the GPU cache and the CPU cache. Thus, the cache components, operating at a low level, may not be in the best position to determine the optimal memory sub-system settings for both of these applications.
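The effect of sharing can be illustrated with a small simulation. This is a sketch under stated assumptions: the helper `count_hits`, the two synthetic request streams, the LRU policy, and the capacity of four entries are all hypothetical and chosen only to make the contention visible.

```python
from collections import OrderedDict


def count_hits(requests, capacity):
    """Replay requests through an LRU cache and return the number of hits."""
    cache = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # refresh recency on a hit
        else:
            cache[key] = True              # load the entry on a miss
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits


# Two assumed applications, each cycling over a working set of three items.
app_a = ["a0", "a1", "a2"] * 10
app_b = ["b0", "b1", "b2"] * 10

# Running alone, application A fits in the cache: 3 cold misses, then all hits.
alone_hits = count_hits(app_a, capacity=4)         # 27 of 30

# Interleaved on the same cache, the combined working set of six items
# exceeds the capacity, and each entry is evicted just before it is reused.
interleaved = [key for pair in zip(app_a, app_b) for key in pair]
shared_hits = count_hits(interleaved, capacity=4)  # 0 of 60
```

Neither application behaves badly on its own, yet together they defeat the cache entirely. The cache itself sees only a single stream of keys and has no information about which application issued each request.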