Field of the Invention
This invention relates to computing systems, and more particularly, to efficient cache data access in a large row-based memory of a computing system.
Description of the Relevant Art
As semiconductor manufacturing processes advance and on-die geometric dimensions shrink, semiconductor chips provide more functionality and performance. However, design issues still arise with modern techniques in processing and integrated circuit design that may limit potential benefits. One issue is that interconnect delays continue to increase per unit length in successive generations of two-dimensional planar layout chips. Also, high electrical impedance between individual chips increases latency. In addition, signals that traverse off-chip to another die may consume significantly more power (e.g., 10 to 100 times more) due to the increased parasitic capacitance on these longer signal routes.
Another design issue is that most software applications that access large amounts of data are typically memory bound, in that computation time is generally determined by memory bandwidth. A memory access latency for an off-chip dynamic random access memory (DRAM) may be hundreds to over a thousand clock cycles, and an increased number of cores in a processor design has accentuated the memory bandwidth problem. Recently, progress has been made in three-dimensional integrated circuits (3D ICs) that include two or more layers of active electronic components integrated both vertically and horizontally into a single circuit. The 3D packaging, known as System in Package (SiP) or Chip Stack multi-chip module (MCM), saves space by stacking separate chips in a single package. Components within these layers communicate using on-chip signaling, whether vertically or horizontally. This signaling provides reduced interconnect signal delay compared to known two-dimensional planar layout circuits.
The manufacturing trends described above may lead to gigabytes of integrated memory within a microprocessor package. In some cases, additional on-chip storage may be used as a row-based memory, such as a last-level cache (LLC) that is accessed before off-chip memory. A reduced miss rate achieved by the additional memory helps hide the latency gap between a processor and its off-chip memory. However, cache access mechanisms for row-based memories may be inefficient for this additional integrated memory. A large tag data array, such as a few hundred megabytes for a multi-gigabyte cache, may be impractical and expensive to place on the microprocessor die.
Increasing the size of a data cache line for the additional integrated memory, such as growing from a 64-byte line to a 4-kilobyte (KB) line, reduces both the number of cache lines in the integrated memory and the size of the corresponding tag storage. However, dirty bits and coherency information may still be maintained at the granularity of the original cache line size (64-byte line). In addition, data transfers may consume excessive bandwidth, as an entire 4 KB line may be accessed when only a few bytes are targeted.
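As a rough illustration of the tag-storage and bandwidth trade-off described above, the following Python sketch compares a 64-byte line with a 4 KB line. The 4 GiB cache size and the 4 bytes of tag and state per line are assumptions chosen only for illustration and are not taken from this description. The model also ignores any per-64-byte dirty or coherency state that a 4 KB line may still carry.

```python
# Illustrative sketch only; the cache size and per-line tag entry size are assumed values.

GIB = 1 << 30
MIB = 1 << 20


def tag_storage_bytes(cache_bytes: int, line_bytes: int, tag_entry_bytes: int) -> int:
    """Total tag storage: one tag entry per cache line."""
    num_lines = cache_bytes // line_bytes
    return num_lines * tag_entry_bytes


def useful_fraction(requested_bytes: int, line_bytes: int) -> float:
    """Fraction of a transferred line that the requester actually wanted."""
    return requested_bytes / line_bytes


CACHE_BYTES = 4 * GIB      # assumed multi-gigabyte on-package cache
TAG_ENTRY_BYTES = 4        # assumed tag plus state bits per line

tags_64b = tag_storage_bytes(CACHE_BYTES, 64, TAG_ENTRY_BYTES)
tags_4kb = tag_storage_bytes(CACHE_BYTES, 4096, TAG_ENTRY_BYTES)

print(f"64-byte lines: {tags_64b // MIB} MiB of tags")   # 256 MiB
print(f"4 KB lines:    {tags_4kb // MIB} MiB of tags")   # 4 MiB
print(f"Useful share of a 4 KB transfer for an 8-byte access: "
      f"{useful_fraction(8, 4096):.4%}")
```

Under these assumed values, the larger line shrinks tag storage by a factor of 64, while a small access uses only a fraction of a percent of the bytes transferred.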
Utilizing DRAM access mechanisms to store and access the tags and data of the additional cache in the integrated DRAM dissipates considerable power. In addition, these mechanisms consume considerable bandwidth, especially for a highly associative on-package cache, and incur long access latency because the tags and data are read out sequentially. Therefore, although the on-package DRAM provides substantial additional data storage, the cache and DRAM access mechanisms are inefficient.
In view of the above, efficient methods and systems for cache data access in a large row-based memory of a computing system are desired.