Caches are used extensively in modern integrated circuits (ICs), including systems-on-chip (SoCs), to improve the performance of agents such as a processor, a graphics processing unit (GPU), a video decoder, a video encoder, an imaging processor, a digital signal processor (DSP), etc.
A cache allows some data to reside closer to such an agent, leading to lower latency, higher bandwidth, or both, when cached data is accessed. These advantages may be particularly critical to SoC performance because in many cases main memory, e.g., dynamic random access memory (DRAM), does not provide the low latency or high bandwidth required by many agents.
Modern SoCs use a hierarchy of caches, typically comprising three levels, where each higher cache level has a larger capacity at the expense of performance, for example higher latency and lower bandwidth. A first level (L1) cache tends to be relatively small and closely integrated with an agent, whereas a last level cache, for example a level 3 (L3) cache, is relatively large and shared by many or all agents in a SoC.
Many processor architectures, including the ARM® architecture (ARM is a registered trademark of ARM Ltd), define architecturally visible caches, whose behavior is controlled by elements of the architecture. Such controls may relate to whether some data can be cached or not, and can be shared or not. Caches enabled to contain shared data may support a hardware mechanism for cache coherency, so that the most up-to-date version of a piece of data can be used by any agent, regardless of which caches currently contain the data.
Because these architecturally visible caches can be bypassed on purpose, e.g., a request is tagged (marked) as non-cacheable, or by necessity (for agents that do not have access to the cache coherency hardware mechanism), the architecture supports ways to make sure data is flushed from the architecturally visible caches. This is usually done through cache maintenance operations.
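By way of illustration only, the need for such maintenance can be sketched with a toy write-back cache. The class name `WriteBackCache`, the operation name `clean_invalidate`, and the dictionary standing in for main memory are hypothetical and are not taken from any particular architecture; the sketch merely assumes that an agent without access to the coherency mechanism reads main memory directly.

```python
class WriteBackCache:
    """Toy write-back cache in front of a dict standing in for main memory."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # address -> (value, dirty flag)

    def write(self, addr, value):
        # The write stays in the cache; main memory is not updated yet.
        self.lines[addr] = (value, True)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = (self.memory[addr], False)
        return self.lines[addr][0]

    def clean_invalidate(self, addr):
        # Hypothetical maintenance operation: write the line back if it is
        # dirty, then drop it from the cache.
        line = self.lines.pop(addr, None)
        if line is not None and line[1]:
            self.memory[addr] = line[0]


def demo_flush():
    memory = {0x100: 1}
    cache = WriteBackCache(memory)
    cache.write(0x100, 2)          # coherent agent updates the data
    stale = memory[0x100]          # non-coherent agent reads memory directly
    cache.clean_invalidate(0x100)  # maintenance operation flushes the line
    fresh = memory[0x100]          # the same read now observes the update
    return stale, fresh
```

In this sketch, the agent bypassing the cache observes the old value until the maintenance operation has written the dirty line back to memory.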
Architecturally visible caches enabled to contain shared data may be referred to as coherent caches as they support hardware means to share the data. One such example is a third level cache (L3), which in many systems is the largest and last level of cache.
Another type of cache is system cache, or what is sometimes referred to as memory cache or target-side cache. A system cache is not architecturally visible and requires no direct control from agents in the system, such as cache maintenance operations. Instead, a system cache is enabled to see all traffic going to a particular destination (e.g., main memory), so shared data can be cached and looked up without special maintenance requirements from an agent. Because a system cache caches a particular destination, it is architecturally transparent. Agents may give hints to a system cache regarding the desire to allocate, not to allocate or to de-allocate particular data, but such hints are merely performance hints and are not necessary for proper operation of the system cache.
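As a minimal sketch of this behavior, the following model places a small cache in front of a single destination so that every read and write to that destination passes through it. The class `SystemCache`, the `allocate` hint parameter, the write-through policy, and the least-recently-used replacement are assumptions made for illustration; an actual system cache may use different policies.

```python
from collections import OrderedDict


class SystemCache:
    """Toy target-side cache that sees all traffic to one destination."""

    def __init__(self, memory, capacity=2):
        self.memory = memory          # dict standing in for main memory
        self.capacity = capacity
        self.lines = OrderedDict()    # address -> value, in LRU order
        self.hits = 0
        self.misses = 0

    def read(self, addr, allocate=True):
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)
            return self.lines[addr]
        self.misses += 1
        value = self.memory[addr]
        if allocate:                  # the hint affects performance only
            self._fill(addr, value)
        return value

    def write(self, addr, value, allocate=True):
        # Assumed write-through policy: memory is always up to date.
        self.memory[addr] = value
        if addr in self.lines or allocate:
            self._fill(addr, value)

    def _fill(self, addr, value):
        self.lines[addr] = value
        self.lines.move_to_end(addr)
        while len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict least recently used


def demo_system_cache():
    memory = {0x0: 10, 0x40: 20}
    sc = SystemCache(memory)
    sc.write(0x0, 11)                       # one agent writes
    a = sc.read(0x0)                        # another agent reads the update
    b = sc.read(0x40, allocate=False)       # "do not allocate" hint
    c = sc.read(0x40, allocate=False)       # still correct, just not cached
    return a, b, c, sc.hits, sc.misses
```

Because all agents reach the destination through the same cache in this sketch, a second agent reads the updated value without any maintenance operation, and ignoring or honoring the allocation hint changes only the hit count, not the data returned.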
A coherent architecturally visible cache has an advantage in that it may be finely controlled by architecturally defined operations, and a coherent architecturally visible cache may finely interact with a hardware cache coherency mechanism to provide better effective performance or capacity. For example, a coherent architecturally visible cache may be exclusive of other caches, e.g., data may not be both in the coherent architecturally visible cache and in other lower level caches.
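The exclusive arrangement mentioned above can be sketched, purely as an illustration, with two toy caches in which a line lives in at most one of them at a time. The class `ExclusiveHierarchy`, its capacities, and the least-recently-used victim selection are hypothetical; the sketch only assumes that a line hit in the outer cache moves to the lower level cache and that lines evicted from the lower level cache are placed in the outer cache.

```python
from collections import OrderedDict


class ExclusiveHierarchy:
    """Toy lower level cache plus an outer cache kept exclusive of it."""

    def __init__(self, inner_capacity, outer_capacity):
        self.inner = OrderedDict()   # lower level cache, in LRU order
        self.outer = OrderedDict()   # exclusive outer cache, in LRU order
        self.inner_capacity = inner_capacity
        self.outer_capacity = outer_capacity

    def access(self, addr):
        """Return where the line was found: 'inner', 'outer', or 'memory'."""
        if addr in self.inner:
            self.inner.move_to_end(addr)
            return "inner"
        if addr in self.outer:
            del self.outer[addr]     # exclusive: the line leaves the outer cache
            source = "outer"
        else:
            source = "memory"
        self._install_inner(addr)
        return source

    def _install_inner(self, addr):
        self.inner[addr] = True
        if len(self.inner) > self.inner_capacity:
            victim, _ = self.inner.popitem(last=False)
            self.outer[victim] = True          # victim moves to the outer cache
            if len(self.outer) > self.outer_capacity:
                self.outer.popitem(last=False)


def demo_exclusive():
    h = ExclusiveHierarchy(inner_capacity=2, outer_capacity=2)
    sources = [h.access(a) for a in (0, 1, 2, 3, 0)]
    return sources, set(h.inner), set(h.outer)
```

In this sketch, four distinct lines remain cached across two caches of two lines each, which illustrates how exclusivity may increase effective capacity compared with holding the same lines in both caches.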
However, agents not participating in a hardware cache coherency mechanism may not be able to use a coherent architecturally visible cache, and the behavior of a coherent architecturally visible cache is in large part dictated by the processor architecture, with less flexibility to improve performance. On the other hand, a system cache may provide caching service to all agents and is very flexible in its handling of data.
In a conventional SoC, the highest level (e.g., L3) of cache is either a coherent architecturally visible cache or a system cache. If the highest level cache is a system cache, the SoC may also have a large coherent architecturally visible cache. The highest level cache is costly in terms of silicon area, so the choice of spending last level cache area on a coherent cache vs. a system cache should be carefully considered as the resulting system behavior may differ greatly, and there may not be the option to achieve the benefits of both types of cache at no or low additional area cost.