A last level cache (LLC) in a central processing unit (CPU) is generally sized to hold a few megabytes of data or instruction lines of recent memory accesses to lower the latency of requests from the CPU as compared to memory accesses of dynamic random access memory (DRAM). A cache that stores megabytes of data requires a large physical area and integrated circuit floor planning resources to provide maximum capacity at minimum response latency. In a multi-CPU and multi-LLC memory bank shared cache system, an interconnect network, sometimes referred to as a network-on-chip (NoC), is capable of high bandwidth between a master CPU and LLC memory banks but exponentially expands wire, power, timing, and area requirements for each instantiation of CPU and LLC memory bank.
An integrated circuit system design may prefer blocks such as CPUs or LLC memory banks to be as uniform and “tile-able” as possible, meaning that one module is designed once and the desired configuration is obtained by repeating instantiations, or tiling several identical copies adjacent to the original. For large LLC memory systems, a tile-able or modular LLC memory bank design may be incorporated in multiple product configurations with minimal redesign if the LLC's memory protocol is also tile-able and modular.
Common solutions may include full ring buses, meshes, and direct end-to-end wires, where each solution balances system complexity, power consumption, die area, wire cost, scalability, and selective memory coherence requirements. While high-end server systems use high-complexity, high-power solutions (e.g., rings, meshes, and switches) and low-end designs use low-complexity, non-scalable solutions (e.g., end-to-end dedicated connections), there is a need for a solution for CPU systems for mobile devices (e.g., in a mid-range of 1 to 8 CPU nodes) with low power consumption, mid-range complexity, and mid-range bandwidth requirements while achieving tile-ability.
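As a rough illustration of the scalability trade-off among these topologies, the following sketch counts point-to-point links for direct end-to-end wiring, a single ring, and a 2D mesh. It is a simplified model, not a description of any particular design: the function names are hypothetical, every CPU and LLC memory bank is assumed to be one node, and link width, power, timing, and coherence traffic are ignored.

```python
import math

def direct_links(cpus, banks):
    # Assumption: one dedicated link from every CPU to every LLC bank.
    return cpus * banks

def ring_links(cpus, banks):
    # Assumption: all CPUs and banks sit on one ring, one link per neighbor pair.
    nodes = cpus + banks
    # Two nodes need only a single link; three or more close the loop.
    return nodes if nodes >= 3 else nodes - 1

def mesh_links(cpus, banks):
    # Assumption: nodes fill a near-square 2D grid. When the node count does
    # not fill the grid exactly, this is an upper bound on the link count.
    nodes = cpus + banks
    rows = math.isqrt(nodes)
    cols = math.ceil(nodes / rows)
    return rows * (cols - 1) + cols * (rows - 1)

# Compare the three models over the 1 to 8 CPU range, one bank per CPU.
for n in (1, 2, 4, 8):
    print(n, direct_links(n, n), ring_links(n, n), mesh_links(n, n))
```

Under this model, the direct-wiring count grows with the product of CPUs and banks, while the ring and mesh counts grow roughly with their sum, which is one way to see why dedicated connections stop scaling as nodes are added.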