Under ideal conditions, increased processor speed translates into an increased demand on memory per unit time. Processing elements are often capable of operating at rates that exceed those of the dynamic random access memories (DRAMs) most commonly used for primary storage. When system performance is paramount, an additional level of the memory hierarchy, called a "cache", whose performance is matched to that of the associated processor, is incorporated.
Caches are small (compared with the size of primary storage), fast, localized memory arrays that supply data at rates which do not impede the associated processor's performance. The viability of caches rests on two tendencies of programs: they tend to reference a particular piece of data many times before moving on to another (temporal locality), and successive references tend to fall relatively close to the previous reference (spatial locality). Cache designers capitalize on this by using knowledge of the recent past (i.e., previous data reference patterns) to predict the near future: they retain data relating to recent references for subsequent use.
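The effect of both kinds of locality on hit rate can be illustrated with a minimal sketch of a direct-mapped cache. The line size, line count, access patterns, and the helper `hit_rate` below are hypothetical choices made for illustration, not parameters of any particular design.

```python
LINE_SIZE = 16  # hypothetical: bytes per cache line
NUM_LINES = 8   # hypothetical: number of lines in a direct-mapped cache


def hit_rate(addresses: list[int]) -> float:
    """Simulate a direct-mapped cache and return the fraction of references that hit."""
    lines = [None] * NUM_LINES  # each slot holds the block number it caches, or None
    hits = 0
    for addr in addresses:
        block = addr // LINE_SIZE  # which line-sized block of memory is referenced
        index = block % NUM_LINES  # which cache slot that block maps to
        if lines[index] == block:
            hits += 1
        else:
            lines[index] = block   # miss: bring the block into the cache
    return hits / len(addresses)


sequential = list(range(0, 256, 4))                          # walk forward through 4-byte words
reused = list(range(0, 64, 4)) * 4                           # traverse a small array four times
scattered = [i * LINE_SIZE * NUM_LINES for i in range(64)]   # every reference: same slot, new block

print(hit_rate(sequential))  # 0.75
print(hit_rate(reused))      # 0.9375
print(hit_rate(scattered))   # 0.0
```

Under these assumptions the sequential walk hits on three of every four references because neighboring words share a line (spatial locality), the repeated traversal hits on 60 of 64 references because the same lines are reused (temporal locality), and the scattered pattern, which exhibits neither, never hits.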
In processing systems that employ data caches, every cache reference must be validated. Every cache entry must have associated with it a "tag", which identifies the data in main memory that the entry currently represents, and a "valid" bit, which indicates whether the cache entry itself is valid. A "cache hit" occurs when the desired data resides in the cache, while a "cache miss" occurs when it does not. The actual transfer of data between the processor and the cache is often conditioned on the results of the cache tag lookup. This tends to increase effective cache access times, which often leads to proportionally longer processor cycle times or to degraded system performance.
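A minimal sketch of the tag and valid-bit check follows, assuming a direct-mapped organization with hypothetical 16-byte lines and 8 entries; `split`, `Entry`, and `TagArray` are illustrative names, not part of any described design. The address is divided into an offset within the line, an index that selects the entry, and a tag that is compared against the stored one.

```python
from dataclasses import dataclass

OFFSET_BITS = 4  # hypothetical: 16-byte cache lines
INDEX_BITS = 3   # hypothetical: 8 direct-mapped entries


def split(addr: int) -> tuple[int, int, int]:
    """Divide an address into (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


@dataclass
class Entry:
    valid: bool = False  # does this entry currently hold meaningful data?
    tag: int = 0         # which main-memory block the entry represents


class TagArray:
    def __init__(self) -> None:
        self.entries = [Entry() for _ in range(1 << INDEX_BITS)]

    def lookup(self, addr: int) -> bool:
        """A hit requires a set valid bit AND a matching tag."""
        tag, index, _ = split(addr)
        entry = self.entries[index]
        return entry.valid and entry.tag == tag

    def fill(self, addr: int) -> None:
        """Record that the block containing addr now resides in the cache."""
        tag, index, _ = split(addr)
        self.entries[index] = Entry(valid=True, tag=tag)


tags = TagArray()
print(tags.lookup(0x0000))  # False: stored tag 0 matches, but the valid bit is clear
tags.fill(0x1234)
print(tags.lookup(0x1238))  # True: same line as 0x1234
print(tags.lookup(0x12B4))  # False: same index, different tag
```

The first lookup shows why the valid bit is needed: without it, an empty entry whose tag field happens to match would be mistaken for a hit.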
Minimizing cache store latency is particularly challenging because the processor, being the source of the data, must often stall until the cache access is validated. Additionally, every cache data modification must be reflected in the state of the cache tag to maintain cache consistency.
A common cache store policy is to condition the cache update on the results of the cache tag lookup. When a cache hit occurs, the cache modification may then proceed. If a cache miss occurs, one of two actions may take place: cache write bypass or cache write allocation. Cache write bypass is not a general solution, since it is not usable for virtual cache designs, but it does achieve single-cycle cache access, since no cache modification takes place. Simple cache write allocation would take two cache cycles: a cache tag read and a cache data/tag write. Serializing the cache tag lookup ahead of the cache data modification doubles the effective cache store access time, to one cache tag read cycle plus one cache data (and optionally tag) write cycle. Since the processor is the source of the data, it must often stall until the cache tag check is completed.
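The two-cycle cost of the tag-conditioned policy with write allocation can be sketched as a simple cycle count. The sketch assumes a hypothetical direct-mapped cache with 16-byte lines and 8 entries; `ConditionedStoreCache` is an illustrative name. Only tag state is modeled; the data array and the write bypass alternative are omitted.

```python
OFFSET_BITS = 4  # hypothetical: 16-byte cache lines
INDEX_BITS = 3   # hypothetical: 8 direct-mapped entries


def split(addr: int) -> tuple[int, int]:
    """Return (tag, index) for an address; the line offset is discarded."""
    block = addr >> OFFSET_BITS
    return block >> INDEX_BITS, block & ((1 << INDEX_BITS) - 1)


class ConditionedStoreCache:
    """Stores are conditioned on a tag lookup; misses use write allocation."""

    def __init__(self) -> None:
        self.valid = [False] * (1 << INDEX_BITS)
        self.tags = [0] * (1 << INDEX_BITS)

    def store(self, addr: int) -> int:
        """Perform a store and return the number of cache cycles it occupied."""
        tag, index = split(addr)
        cycles = 1  # cycle 1: tag read; the processor stalls holding its data
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            # Write allocation: the new tag is written alongside the data.
            self.valid[index] = True
            self.tags[index] = tag
        cycles += 1  # cycle 2: data write (plus the tag write on a miss)
        return cycles


cache = ConditionedStoreCache()
addresses = [0x40, 0x44, 0xC0, 0x40]  # miss, hit, miss (conflict), miss
total = sum(cache.store(a) for a in addresses)
print(total)  # 8 cycles, versus 4 if each store took a single cycle
```

In this sketch hits and misses alike cost two cycles, because the data write cannot begin until the tag read has resolved; that serialization is the source of the factor-of-two penalty.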
Another approach, which avoids this serialization penalty, completely ignores the state of the cache tag. With this approach, every cache store operation bypasses the cache. To maintain cache consistency, the addressed cache entry is invalidated without regard to its tag. The processor need only supply data for one cache (data) write cycle. This approach reduces the effective cache store access time, but at the cost of a significant loss in cache performance, since these invalidations may purge useful data from the cache.
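A minimal sketch of the invalidate-on-store approach shows both its single-cycle store and its cost. It assumes a hypothetical direct-mapped cache with 16-byte lines and 8 entries; `InvalidatingCache` and its methods are illustrative names. Loads allocate on a miss, and stores are assumed to be written to main memory, which is not modeled.

```python
OFFSET_BITS = 4  # hypothetical: 16-byte cache lines
INDEX_BITS = 3   # hypothetical: 8 direct-mapped entries


def split(addr: int) -> tuple[int, int]:
    """Return (tag, index) for an address; the line offset is discarded."""
    block = addr >> OFFSET_BITS
    return block >> INDEX_BITS, block & ((1 << INDEX_BITS) - 1)


class InvalidatingCache:
    def __init__(self) -> None:
        self.valid = [False] * (1 << INDEX_BITS)
        self.tags = [0] * (1 << INDEX_BITS)

    def load(self, addr: int) -> bool:
        """Return True on a hit; on a miss, allocate the line."""
        tag, index = split(addr)
        if self.valid[index] and self.tags[index] == tag:
            return True
        self.valid[index] = True
        self.tags[index] = tag
        return False

    def store(self, addr: int) -> int:
        """Invalidate the indexed entry without reading its tag; return cycles used."""
        _, index = split(addr)
        self.valid[index] = False  # whatever block the entry held is discarded
        return 1                   # a single cache write cycle


cache = InvalidatingCache()
print(cache.load(0x40))   # False: cold miss, line allocated
print(cache.load(0x44))   # True: same line
print(cache.store(0xC0))  # 1: different block, but it maps to the same entry
print(cache.load(0x40))   # False: the useful line for 0x40 was purged
```

The store to 0xC0 never touched the block holding 0x40, yet the unconditional invalidation forces the next load of 0x40 to miss; this is the kind of lost useful data that offsets the shorter store time.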