A cache is a mechanism where the parameters and results of previous queries are kept and may be rapidly checked against a current query to avoid the costs of performing the full query mechanism. If the parameters of the current query are found to match those of a saved query then the answer may be provided out of the cache, saving time.
Each combination of parameters and answer stored in a cache is called an “element,” a “cell,” or a “slot.” Cache elements can be expensive in size and lookup cost. This is because they must not only save the answer but also retain the parameters, so that a new query can be uniquely and safely confirmed as a correct match. In general the size of a cache element includes the size of the arguments, the size of the answer, plus any machinery necessary to confirm the match of the current parameters to the saved parameters.
A cache becomes a “hot spot,” an area of memory and code which is always in use: a lookup is attempted for every query, even one that ultimately fails, and this keeps the cache in constant use. The size and time cost of a cache therefore compete aggressively for the most heavily used resources. This means there is always pressure to minimize the size and time cost of a cache while giving up as little as possible of its effectiveness at retaining useful results and maximizing the percentage of successful matches (the “hit rate”).
Caches need a search mechanism. This is how the incoming parameters are converted to a prescription for which elements of the cache to search. In an ideal cache, the search is “content addressable” or “fully associative.” This means that all cache elements are searched. The search may be parallel, by broadcasting the current parameters to every element and including a copy of the machinery to match against prior parameters in every element. The search may be serial, checking each element against the current parameters using a single copy of the comparison machinery. Or it may be some combination, taking N elements (where N is generally some divisor of the total number of elements) simultaneously through N copies of the comparison machinery (“N-way” lookup).
Since comparison machines are generally large and consume power, completely parallel search prevails only in caches with small numbers of elements, or in moderately sized caches which can somehow optimize the size and power of the many copies of the comparison engine in return for greater specificity (for example by using bit-serial comparison). It is most common for an N-way comparison mechanism to be used (including N=1) since this is a fairly general and flexible way to make the engineering tradeoffs. A cache in which N elements are searched simultaneously is called an N-way set associative cache, and the N elements which are chosen for the comparison are called a “set” of elements.
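As a rough illustration of the serial form of fully associative search, the sketch below checks every element in turn with a single comparison routine. The names `Element` and `serial_lookup` and the sample queries are hypothetical, chosen only for this example. A parallel or N-way search would perform the same comparison on several elements at once.

```python
from dataclasses import dataclass
from typing import Any, Hashable, List, Optional


@dataclass
class Element:
    """One cache element: the saved parameters plus the saved answer."""
    params: Hashable
    answer: Any


def serial_lookup(elements: List[Optional[Element]], params: Hashable) -> Optional[Element]:
    """Fully associative, serial search: compare every element, one at a time."""
    for element in elements:
        # The saved parameters must match exactly before the answer is trusted.
        if element is not None and element.params == params:
            return element
    return None  # no element matched: a cache miss


# Hypothetical contents: two filled elements and one empty element.
cache = [Element(("sqrt", 9), 3.0), Element(("sqrt", 16), 4.0), None]
hit = serial_lookup(cache, ("sqrt", 16))
miss = serial_lookup(cache, ("sqrt", 25))
```

The cost of this form grows with the number of elements, which is one reason a purely serial search over a large cache is unattractive.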
Hashing is a part of a lookup technique commonly used for N-way designs. A hash is a function which seeks to fold and obfuscate a set of parameters to result in a small number, the hash, which is likely to be distinct for distinct sets of parameters. In general if there are M sets of elements, then the hash will be a number from 0 to M−1, thus having M distinct values. The parameters will map down to one of these M values, and we know that the match, if found, will be in the set identified by that value. The cache then contains M*N elements. When N=1 such an arrangement is called “direct mapped” because the hash function maps directly onto the only element which will be considered.
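The arrangement of M sets of N elements can be sketched as follows. This is only an illustration under stated assumptions: the values of `M` and `N`, the use of a CRC-32 checksum as the hash, and the function names are all choices made for this example, not anything prescribed above.

```python
import zlib

M = 4  # number of sets (assumed for illustration)
N = 2  # elements per set, i.e. N-way (assumed for illustration)

# M sets of N elements each; None marks an empty element.
sets = [[None] * N for _ in range(M)]


def set_index(params: tuple) -> int:
    """Fold the parameters down to a number from 0 to M-1."""
    return zlib.crc32(repr(params).encode("utf-8")) % M


def lookup(params: tuple):
    """Search only the one set selected by the hash; None means a miss."""
    for slot in sets[set_index(params)]:
        if slot is not None and slot[0] == params:
            return slot[1]
    return None


def insert(params: tuple, answer) -> bool:
    """Store into the selected set; False means the set is full."""
    ways = sets[set_index(params)]
    for i, slot in enumerate(ways):
        if slot is not None and slot[0] == params:
            ways[i] = (params, answer)  # refresh an existing element
            return True
    for i, slot in enumerate(ways):
        if slot is None:
            ways[i] = (params, answer)  # fill an empty element
            return True
    return False  # a replacement decision would be needed here


insert(("a", 1), 10)
found = lookup(("a", 1))
absent = lookup(("b", 2))
```

Setting `N = 1` in this sketch gives the direct mapped case, where the hash selects the single element that will be considered.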
Replacement is one of the most subtle and important problems in cache design. The issue here is that a cache generally has many fewer elements than there are distinct queries. The cache still has a benefit because in many situations there are some queries which are much more common than others. In general, the distribution of queries changes over time, and so the contents of the cache must also change over time in order to track what is most likely to be asked. This leads to the problem of when to put new values into the elements: when to replace an existing element's content with something which is expected to be a more useful content.
The ideal replacement decisions are usually not practical, since they would involve knowing what is to come in the future. Conversely, the most conservative replacement strategy, no replacement, is practical only in situations where a prior study can choose a good set of cache content which will remain valid forever. Usually, however, a cache does adapt, and it uses recent history to predict the future. In particular, the most common algorithms for replacement all start by putting any newly seen and qualified query (qualified meaning filtered to exclude strange queries, if such a filter is known) into the cache. Since the cache has a fixed number of elements, that implies also choosing one to remove.
Replacement strategy is closely tied to “set” organization. This is because if a hash selects an N-way set then the new content must go into that set, and so also the discard must come from that set. Choice of replacement is usually made by some algorithm that attempts to distill out of recent set history a guide to which element has been least active and is thus deemed least likely to be useful.
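One possible history-based rule is to discard the element of the set that was touched least recently. The sketch below keeps a single set's elements ordered by recency. The class name `RecencySet` and its methods are hypothetical, and a hardware design would encode the ordering far more compactly than a Python list.

```python
class RecencySet:
    """One N-way set whose elements are kept in order of recent use."""

    def __init__(self, ways: int):
        self.ways = ways
        self.entries = []  # (params, answer) pairs, least recently used first

    def lookup(self, params):
        for i, (saved, answer) in enumerate(self.entries):
            if saved == params:
                # A hit makes this element the most recently used.
                self.entries.append(self.entries.pop(i))
                return answer
        return None

    def insert(self, params, answer):
        """Insert new content; return the discarded parameters, if any."""
        for i, (saved, _) in enumerate(self.entries):
            if saved == params:
                self.entries.pop(i)  # updating existing content, no discard
                break
        evicted = None
        if len(self.entries) >= self.ways:
            # The discard must come from this same set.
            evicted = self.entries.pop(0)[0]
        self.entries.append((params, answer))
        return evicted


s = RecencySet(ways=2)
s.insert("a", 1)
s.insert("b", 2)
s.lookup("a")               # "a" is now more recent than "b"
evicted = s.insert("c", 3)  # the set is full, so "b" is discarded
```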
It is important to consider some of the tradeoffs in cache design. Suppose that a cache of size P elements could be factored in various ways, for example as a P*1 direct map, or as an M*N set mapping. The direct map has the advantage that the hash function will have the largest number of distinct values, the comparison mechanism will require only one copy, and only one element needs to move to the comparison mechanism. This makes a direct mapped cache fast, but it has a flaw: replacement. Since the set size is 1, there is no flexibility in what is replaced. The Achilles heel of direct mapped caches is hot conflicts, where two currently active queries happen to map to the same hash number and therefore keep tossing each other out of the cache, leading to very poor worst-case performance. In some applications a few such conflicts involving frequent queries can cause more cache misses than all the other queries put together.
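The hot conflict can be made concrete with a small simulation. The sketch below is a toy model under stated assumptions: queries are plain integers, the hash is simply the query modulo the number of sets, and discard within a set is by least recent use. It compares two factorings of the same four elements.

```python
def count_misses(queries, num_sets, ways):
    """Count misses for a cache of num_sets sets with `ways` elements each."""
    sets = [[] for _ in range(num_sets)]  # each list: keys, least recent first
    misses = 0
    for key in queries:
        current = sets[key % num_sets]  # toy hash: key modulo number of sets
        if key in current:
            current.remove(key)         # hit: re-appended below as most recent
        else:
            misses += 1
            if len(current) >= ways:
                current.pop(0)          # discard the least recently used key
        current.append(key)
    return misses


# Two active queries that collide under both factorings of P = 4 elements.
queries = [0, 4] * 5
direct = count_misses(queries, num_sets=4, ways=1)   # 4*1 direct map
two_way = count_misses(queries, num_sets=2, ways=2)  # 2*2 set mapping
```

In this toy case the direct map misses on all ten queries, because the two keys keep evicting each other, while the two-way organization misses only on the first appearance of each key.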
The choice of discard is usually the biggest part of the design of a replacement strategy. That is because the qualification, hash and N-way choices determine the insertion of new content, but there are many ways of deciding which elements are most worthy to discard. The best known generic algorithm is the Least Recently Used (LRU) mechanism, which research has repeatedly shown to be a good all-around performer, especially if the set size is large. Unfortunately, as the set size (N) becomes large, LRU also becomes increasingly difficult to implement, because the state must distinguish all N! possible recency orderings of the set. A 4-way LRU can be done with a manageable 5 bits of state per set and a 24 element transition lookup table, but 6-way LRU requires 10 bits and 720 entries (rarely practical), and 8-way is simply impractical at 16 bits and 40,320 entries. Yet a set size of 6 is really not large enough to assure that LRU performance will approximate the ideal. Usually set sizes larger than 4 give up on LRU and adopt some other strategy such as circular replacement.
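The figures quoted for LRU follow from counting recency orderings: an N-way set has N! possible orderings, and the state must have enough bits to name each one. The short sketch below reproduces that arithmetic. The function name `lru_state_cost` is hypothetical, and the calculation assumes a full encoding of every ordering rather than any approximate scheme.

```python
import math


def lru_state_cost(ways: int):
    """Return (bits of state per set, number of orderings) for exact LRU."""
    orderings = math.factorial(ways)
    # Smallest number of bits able to give each ordering a distinct code.
    bits = (orderings - 1).bit_length()
    return bits, orderings


costs = {ways: lru_state_cost(ways) for ways in (4, 6, 8)}
# costs == {4: (5, 24), 6: (10, 720), 8: (16, 40320)}
```

The factorial growth in orderings is what makes exact LRU tracking impractical for larger sets.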
There are many other considerations. However, as general background, this is sufficient introduction to the issues of size, speed, complexity, power, organization, and replacement strategies which must be balanced in cache design. It is with respect to these considerations and others that the present invention has been made.