Cache architectures have a long history in the design of computer systems. In many cases, they help to increase the access speed (in other words, to decrease the access time) between a CPU (central processing unit) and main memory. This helps to mitigate the so-called von Neumann bottleneck and may increase processing speed dramatically.
Different cache architectures have been introduced over time, including L1, L2 and L3 caches, organized, e.g., as inclusive or exclusive caches. Today, load and store instructions to and from main memory are performed with a fixed payload size, e.g., 64 bytes or 128 bytes. This may be sub-optimal during various phases of workload execution. For example, during workload phases in which a lot of scattered data with sizes of only a few bytes are loaded or stored, a large payload size (large cache line size) may result in loading a lot of data that are not used by the workload, or storing data that have not been updated. This wastes precious memory bandwidth, potentially increases latencies, and may furthermore result in conflicts for unused data.
On the other hand, during workload phases with accesses to a large amount of contiguous data (e.g., simple one-dimensional arrays) to be loaded or stored, a small payload size may result in many concurrent load or store instructions in flight or even in serialization of request execution. Both of these scenarios result in decreased throughput and add to unnecessary consumption of computing resources, such as the number of gates and the required power.
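The trade-off described above can be illustrated with a little arithmetic. The following is a minimal sketch, not a model of any particular hardware: the function names `useful_fraction` and `requests_needed` are hypothetical, and the sketch assumes that each scattered access touches its own cache line without straddling a line boundary and that contiguous data start at a line-aligned address.

```python
import math


def useful_fraction(access_size: int, line_size: int) -> float:
    """Fraction of transferred bytes actually used by a scattered access.

    Assumption: each access lands in a distinct cache line and does not
    cross a line boundary, so one full line is moved per access.
    """
    if access_size > line_size:
        raise ValueError("access_size must not exceed line_size in this sketch")
    return access_size / line_size


def requests_needed(total_bytes: int, line_size: int) -> int:
    """Number of line-sized requests needed to move contiguous data.

    Assumption: the data start at a line-aligned address.
    """
    return math.ceil(total_bytes / line_size)


if __name__ == "__main__":
    # Scattered 8-byte accesses: larger lines waste more bandwidth.
    for line in (64, 128):
        print(f"line={line:3d} B  useful fraction={useful_fraction(8, line):.4f}")

    # Contiguous 4096-byte array: smaller lines need more requests in flight.
    for line in (16, 64, 128):
        print(f"line={line:3d} B  requests={requests_needed(4096, line)}")
```

Under these assumptions, an 8-byte scattered access uses 12.5% of a 64-byte line and 6.25% of a 128-byte line, whereas a contiguous 4096-byte array needs 256 requests with a 16-byte payload but only 32 with a 128-byte payload.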