The present invention relates to high-performance memory system architectures. More specifically, the present invention relates to address predicting apparatus and methods useful in high-performance computing systems.
The speed at which computer processors can execute instructions continues to outpace the ability of computer memory systems to supply instructions and data to the processors. Consequently, many high-performance computing systems provide a high-speed buffer storage unit, commonly called a cache or cache memory, between the working store or memory of the central processing unit ("CPU") and the main memory.
A cache comprises one or more levels of dedicated high-speed memory holding recently accessed data, designed to speed up subsequent access to the same data. For the purposes of the present specification, unless specified otherwise, "data" refers to any content of memory and may include, for example, instructions, data operated on by instructions, and memory addresses. Cache technology is based on the premise that computer programs frequently reuse the same data. Generally, when data is read from main system memory, a copy of the data is saved in the cache memory, along with an index to the associated main memory address. For subsequent data requests, the cache detects whether the requested data has already been stored in the cache. If the data is stored in the cache (referred to as a "hit"), the data is delivered immediately to the processor, and any fetch of the data from main memory is either not started or, if already started, aborted. If the requested data is not stored in the cache (referred to as a "miss"), it is fetched from main memory and also saved in the cache for future access.
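The hit and miss behavior described above can be illustrated with a minimal sketch. It assumes a toy, fully associative cache with least-recently-used eviction and a Python dict standing in for main memory; the class name `SimpleCache` and its `capacity` parameter are hypothetical, and real caches are typically set-associative and operate on whole cache lines.

```python
from collections import OrderedDict


class SimpleCache:
    """Toy fully associative cache with LRU eviction (illustrative assumption)."""

    def __init__(self, main_memory, capacity=4):
        self.main_memory = main_memory  # backing store: address -> data
        self.capacity = capacity        # number of entries the cache can hold
        self.lines = OrderedDict()      # address -> cached copy, least recent first
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            # Hit: deliver the cached copy without fetching from main memory.
            self.hits += 1
            self.lines.move_to_end(address)
            return self.lines[address]
        # Miss: fetch from main memory and save a copy for future accesses.
        self.misses += 1
        data = self.main_memory[address]
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used entry
        self.lines[address] = data
        return data
```

For example, reading addresses 1, 2, 1, 3, 2 through a two-entry cache produces one hit (the second read of address 1) and four misses, because address 2 is evicted when address 3 is brought in.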
A level 1 cache ("L1") generally refers to a memory bank built closest to the central processing unit ("CPU") chip, typically on the same chip die. A level 2 cache ("L2") is a secondary staging area that feeds the L1 cache. L2 may be built into the CPU chip, reside on a separate chip in a multichip package module, or be a separate bank of chips.
Address predictors are used to anticipate or predict future addresses in applications such as data prefetching or instruction scheduling. Prefetching systems and methods attempt to reduce memory latency by reducing the probability of a cache miss. The probability of a cache miss is reduced by anticipating or predicting what information will be requested before it is actually requested. Address predictors utilizing correlation prediction tables ("CPTs") for predicting both instruction addresses and data addresses are known.
A simple correlation found in a CPT is a pair consisting of a key and a successor value. The key is used to predict the successor value. A correlated address pair ("CAP") is built by associating two addresses that appear in an address stream. The address that appears earlier in the address stream is referred to as the key, and it is paired with the currently referenced address, which is referred to as the successor value. The CAP (that is, a key and its successor value) is then stored in a CPT for later use. When an address previously selected as a key reoccurs in the address stream, it is used to query the CPT to retrieve the corresponding CAP. The successor value in the retrieved CAP is then used to predict the next address in the address stream. In sum, a goal of address predictors is to observe previous address pairs, store them, and use them as predictions in the future.
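The key/successor mechanism can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes the key is the immediately preceding address in the stream, that each key keeps only its most recent successor, and that the table is unbounded. The class and method names are hypothetical.

```python
class CorrelationPredictionTable:
    """Minimal CPT sketch: one successor per key, unbounded size (assumptions)."""

    def __init__(self):
        self.table = {}       # key address -> successor address
        self.previous = None  # most recent address seen in the stream

    def observe(self, address):
        # Build a correlated address pair (CAP): earlier address -> current address.
        if self.previous is not None:
            self.table[self.previous] = address
        self.previous = address

    def predict(self, key):
        # Return the stored successor for this key, or None if no CAP exists.
        return self.table.get(key)
```

After observing the stream 0x100, 0x200, 0x340, 0x100, a reoccurrence of 0x100 predicts 0x200 as the next address, and 0x340 predicts 0x100.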
Correlations in a CPT can be built from cache miss addresses and used to predict future miss addresses. For example, when a cache miss is generated, a correlation is built between the preceding cache miss address and the current cache miss address. The next time that the key address generates a cache miss, the predictor speculates that the successor address will be the next cache miss.
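Building correlations only from cache miss addresses can be sketched by filtering the address stream through a cache model first. The sketch below assumes a toy fully associative LRU cache of `cache_capacity` entries and keeps only the most recent successor per key; the function name and both choices are hypothetical illustrations.

```python
from collections import OrderedDict


def miss_correlations(address_stream, cache_capacity=2):
    """Map each miss address to the miss address that followed it.

    Assumes a toy fully associative LRU cache and keeps only the most
    recent successor per key (both illustrative choices).
    """
    cache = OrderedDict()  # resident addresses, least recently used first
    correlations = {}      # preceding miss address -> current miss address
    previous_miss = None
    for address in address_stream:
        if address in cache:
            cache.move_to_end(address)  # hit: no correlation is built
            continue
        # Miss: correlate the preceding miss with this one, then fill the cache.
        if previous_miss is not None:
            correlations[previous_miss] = address
        previous_miss = address
        if len(cache) >= cache_capacity:
            cache.popitem(last=False)
        cache[address] = True
    return correlations
```

For the stream 1, 2, 1, 3, 2, 4 with a two-entry cache, the second access to address 1 hits and builds nothing, and the result is `{1: 2, 2: 4, 3: 2}`; the next time address 3 misses, the predictor would speculate that address 2 is the next miss.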
One approach to improving the performance of a CPT is to implement the CPT with the ability to correlate a single key with multiple successor values. This is achieved by storing multiple successor values with each entered key. That is, each unique key is stored or entered into the CPT only once. If the key of a subsequent correlation is already present in the CPT, the new successor value is entered into the same line as the existing key, and the key is not entered again. Generally, when a key is correlated with more than one successor value, the successors are predicted using a most recently used ("MRU") priority and replaced using a least recently used ("LRU") priority.
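A CPT line holding multiple successors per key, predicted in MRU order and replaced in LRU order, might look like the following sketch. The fixed `max_successors` parameter and the class name are hypothetical, and the key is again assumed to be the immediately preceding address.

```python
class MultiSuccessorCPT:
    """CPT sketch: each key is entered once and holds up to `max_successors`
    successors, kept most recently used first (hypothetical parameters)."""

    def __init__(self, max_successors=2):
        self.max_successors = max_successors
        self.lines = {}       # key -> list of successors, MRU first
        self.previous = None  # most recent address seen in the stream

    def observe(self, address):
        if self.previous is not None:
            # Reuse the existing line for this key; the key is never entered twice.
            successors = self.lines.setdefault(self.previous, [])
            if address in successors:
                successors.remove(address)  # re-reference: move to MRU slot
            elif len(successors) >= self.max_successors:
                successors.pop()            # line full: replace the LRU successor
            successors.insert(0, address)
        self.previous = address

    def predict(self, key):
        # Predicted successors, in MRU priority order.
        return list(self.lines.get(key, []))
```

With two successor slots, the stream 10, 20, 10, 30, 10, 40, 10, 30 leaves key 10 with successors `[30, 40]`: successor 20 was replaced as least recently used when 40 arrived, and 30 was promoted to the MRU slot on its second use.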
However, this approach to improving the performance of a CPT is not without drawbacks. First, because the total table size is constant, a CPT designer faces an undesirable tradeoff between the maximum number of successors per entered key and the maximum number of entered keys. The maximum number of successors per entered key is static and cannot be adjusted to suit the application. Thus, a CPT designer must decide a priori how many successors can be associated with a given entered key, and this decision can be difficult. Some addresses are highly correlated and need only one successor for their correlations. Other addresses may be followed by several different addresses at various points during program or application execution and benefit from multiple successors. Since the total size of a CPT is constant, increasing the number of successors per key requires decreasing the total number of entered keys.
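The fixed-size tradeoff can be made concrete with a short calculation. This assumes, purely for illustration, that a table has a fixed budget of address slots and that each line stores one key plus a static number of successor slots; the function name and the 3072-slot budget are hypothetical.

```python
def max_keys(total_slots, successors_per_key):
    """Key lines that fit in a table of `total_slots` address slots, assuming
    (hypothetically) each line holds one key plus `successors_per_key` successors."""
    return total_slots // (1 + successors_per_key)


# Fixed table budget of 3072 address slots (an arbitrary illustrative size).
for s in (1, 2, 5):
    print(s, max_keys(3072, s))
```

Under these assumptions, allowing 1, 2, or 5 successors per key leaves room for 1536, 1024, or 512 keys respectively, so every successor slot added to each line costs a proportional share of the keys the table can hold.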
The second drawback to this approach relates to replacing correlations previously entered into a CPT. When a correlation is replaced, all correlations corresponding to the same key are lost, including the key and all of its stored successor values. The removed correlations can be reentered into the CPT only if and when they reappear in the cache miss addresses. Rebuilding previously removed correlations in this way can severely degrade the performance of the CPT.
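The loss described above can be illustrated with a sketch of a table whose lines are replaced as whole units. It assumes, hypothetically, a limit of `max_keys` lines with LRU replacement of lines; the class and method names are illustrative only.

```python
from collections import OrderedDict


class BoundedCPT:
    """CPT sketch with at most `max_keys` lines (hypothetical limit). Lines are
    replaced in LRU order, and replacing a line discards its key together with
    every successor stored for it."""

    def __init__(self, max_keys=2):
        self.max_keys = max_keys
        self.lines = OrderedDict()  # key -> list of successors; LRU line first

    def enter(self, key, successor):
        if key in self.lines:
            self.lines.move_to_end(key)  # existing line becomes most recently used
        else:
            if len(self.lines) >= self.max_keys:
                self.lines.popitem(last=False)  # whole line (key + successors) lost
            self.lines[key] = []
        if successor not in self.lines[key]:
            self.lines[key].append(successor)

    def successors(self, key):
        return list(self.lines.get(key, []))
```

With room for two lines, entering the correlations 1→2, 1→3, 5→6, and then 7→8 replaces the line for key 1, so both of its successors (2 and 3) are lost and can return only when those correlations reappear in the miss stream.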
The speed at which computer processors can execute instructions will likely continue to outpace the ability of computer memory systems to supply instructions and data to the processors. Although address predictors can be utilized to improve the performance of computer memory systems, existing address predictors have some drawbacks. Accordingly, there is still a need in the industry to improve memory system performance in computer systems by improving existing address predictors and address prediction methods.