The present invention relates to computer architectures and in particular to an adaptive cache suitable for use with high-bandwidth memory in a computer cache system.
The processing speed of modern computers is constrained by the time it takes for data to be transferred between computer memory and the computer processor, referred to as “latency”. Such latency can be reduced through the use of cache memories, which provide small, fast data storage structures close to the processor. When data is required by the processor, the processor looks first in the cache memories to see if the necessary data has been previously loaded from a larger but slower main memory. If the data is found in the cache memory, the need to access the slower main memory can be avoided.
The success of this strategy relies on the ability to anticipate what data will be required by the processor in the future so that this data may be preloaded into cache memory. Such predictions usually rely on the principle of “locality of reference,” meaning that data likely to be used by the processor in the future will be local to the data currently used by the processor. In its simplest form, the strategy is implemented by loading into the cache blocks of data surrounding the data currently being used by the processor.
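The effect of block loading under locality of reference can be illustrated with a minimal sketch. The toy direct-mapped cache below is an assumption made purely for illustration and is not the cache of the present invention; the names `BlockCache`, `run_sequential`, and `run_strided`, and the sizes chosen, are hypothetical.

```python
class BlockCache:
    """Toy direct-mapped cache that loads a whole block on each miss."""

    def __init__(self, num_lines=4, block_size=8):
        self.num_lines = num_lines      # number of cache lines (hypothetical size)
        self.block_size = block_size    # addresses per block (hypothetical size)
        self.tags = [None] * num_lines  # block number currently held by each line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit; on a miss, load the enclosing block."""
        block = address // self.block_size
        line = block % self.num_lines
        if self.tags[line] == block:
            self.hits += 1
            return True
        # Miss: the whole block around the address is brought into the cache.
        self.tags[line] = block
        self.misses += 1
        return False


def run_sequential(n):
    """Access n consecutive addresses (high spatial locality)."""
    cache = BlockCache()
    for address in range(n):
        cache.access(address)
    return cache.hits, cache.misses


def run_strided(n, stride):
    """Access n addresses spaced `stride` apart (low spatial locality)."""
    cache = BlockCache()
    for i in range(n):
        cache.access(i * stride)
    return cache.hits, cache.misses
```

Under these assumptions, 32 sequential accesses miss only once per 8-address block, whereas 32 accesses with a stride equal to the block size each touch a new block and therefore never hit.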
Often the cache memories are constructed of static random-access memory (SRAM), which is generally faster than the dynamic random-access memory (DRAM) used for the larger main memory. The cache is also typically kept relatively small, both to provide fast access and to accommodate the larger memory cell size of SRAM.
High-performance scientific computing (HPC) requires high memory bandwidths, particularly when executed on highly parallel architectures like those found in graphics processing units or multi-core processors. Bandwidth refers to the amount of data transmitted per unit time and is distinct from latency, which indicates how quickly a given piece of data may be accessed. High memory bandwidths may be promoted using special caches constructed of high-bandwidth memory (HBM) technologies, which use DRAM together with three-dimensional die stacking in which semiconductor dies holding the memory circuitry are stacked vertically and joined by vertical interconnections such as through-silicon vias. These HBM devices have sufficient storage capacity to implement high-bandwidth caching but incur a latency penalty because they use DRAM rather than SRAM.
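The distinction between latency and bandwidth can be made concrete with a simple first-order model in which the time to deliver a request is the access latency plus the request size divided by the bandwidth. The sketch below assumes this model; the function `transfer_time` and the two parameter sets are hypothetical illustrations and do not describe any actual SRAM or HBM device.

```python
def transfer_time(latency_s, bandwidth_bytes_per_s, size_bytes):
    """First-order estimate: fixed access latency plus streaming time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Illustrative, made-up parameters (not measurements of real hardware).
LOW_LATENCY = {"latency_s": 10e-9, "bandwidth_bytes_per_s": 50e9}
HIGH_BANDWIDTH = {"latency_s": 50e-9, "bandwidth_bytes_per_s": 500e9}

SMALL_REQUEST = 64            # bytes, e.g. a single cache line
LARGE_REQUEST = 1024 * 1024   # bytes, e.g. a streamed array tile

small_low = transfer_time(size_bytes=SMALL_REQUEST, **LOW_LATENCY)
small_high = transfer_time(size_bytes=SMALL_REQUEST, **HIGH_BANDWIDTH)
large_low = transfer_time(size_bytes=LARGE_REQUEST, **LOW_LATENCY)
large_high = transfer_time(size_bytes=LARGE_REQUEST, **HIGH_BANDWIDTH)
```

With these assumed numbers, the low-latency memory completes the small request sooner, while the high-bandwidth memory completes the large request sooner. The model ignores overlapping of concurrent requests and other effects present in real memory systems.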