A data processing system may duplicate the contents of main memory so that one or more processors have fast private access to those contents. This technique is commonly known as memory caching. A cache-based system in which there are multiple sources of data modification must handle the problem of cached data becoming inconsistent with the original in main store. Sources of data modification are typically processors or DMA-based I/O devices. The problem, also known as cache coherency enforcement, is that every cache must ultimately see the consequences of all main memory modifications, regardless of their origin. There are software and hardware solutions to this problem. A popular hardware approach is for every cache to watch the actions of every other cache and to invalidate or update itself as appropriate. This so-called snoopy cache approach may work with either write-through or write-back cache strategies. In the former case, all snooping activity follows stores to main memory. In the latter case, activity is typically triggered by reads from main memory. The activity each cache pursues usually involves first checking whether it holds the data being stored or read. The check is accomplished by examining either the cache tag directory, also known as the tag store, or a duplicated copy of that directory. If there is a match in the examined directory, a variety of actions ensue, depending upon the particular cache coherency strategy followed by the system.
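The snoop check described above can be illustrated with a minimal sketch of a write-through cache that invalidates on a match. The line size, the number of lines, the direct-mapped organization, and all names (`SnoopyCache`, `fill`, `snoop_write`) are assumptions chosen for illustration and are not taken from the system described here.

```python
LINE_SIZE = 16    # bytes per cache line (assumed)
NUM_LINES = 256   # direct-mapped, physically indexed cache (assumed)


class SnoopyCache:
    """Hypothetical write-through cache that invalidates on snooped stores."""

    def __init__(self):
        # Tag store used by the processor; None marks an invalid line.
        self.tags = [None] * NUM_LINES
        # Duplicated copy of the tag directory, examined by the snooper.
        self.dup_tags = [None] * NUM_LINES

    @staticmethod
    def _split(addr):
        """Split an address into (line index, tag)."""
        line = addr // LINE_SIZE
        return line % NUM_LINES, line // NUM_LINES

    def fill(self, addr):
        """Record that the line containing addr is now cached."""
        index, tag = self._split(addr)
        self.tags[index] = tag
        self.dup_tags[index] = tag

    def holds(self, addr):
        """Processor-side lookup in the primary tag store."""
        index, tag = self._split(addr)
        return self.tags[index] == tag

    def snoop_write(self, addr):
        """Handle a store to main memory observed on the bus.

        Returns True if the line was held and has been invalidated.
        """
        index, tag = self._split(addr)
        if self.dup_tags[index] != tag:
            return False          # no match: nothing to do
        self.tags[index] = None   # match: invalidate both copies
        self.dup_tags[index] = None
        return True
```

In this sketch only a match in the duplicated directory touches the primary tag store, which is one reason a duplicated directory is attractive: stores by other agents that miss never disturb the processor's own lookups.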
There are complications to this approach that are handled in the processor. While the overall approach of store-through snoopy caches is a well-known technique, there are a number of problems, and solutions to those problems, that are peculiar to this implementation.
The first problem is that continually accessing the operand and instruction cache tag stores to determine address collisions would cause heavy processor performance losses.
A second complication is that the caches in the example system are virtually indexed. As a result, the physical address on the bus is insufficient to index into the cache tag stores to make the address collision decision.
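The following sketch illustrates why a physical address alone may not locate a line in a virtually indexed cache. The page size, line size, cache size, and the single page table entry are hypothetical values chosen so that the cache index extends above the page offset; the source does not state the actual parameters.

```python
PAGE_SIZE = 4096   # bytes per page (assumed)
LINE_SIZE = 16     # bytes per cache line (assumed)
NUM_LINES = 1024   # 16 KB direct-mapped cache, so the index spans address bits 4..13


def cache_index(addr):
    """Line index selected by an address, virtual or physical."""
    return (addr // LINE_SIZE) % NUM_LINES


def translate(vaddr, page_table):
    """Translate a virtual address using a virtual-page -> physical-frame map."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + offset


# Hypothetical mapping: virtual page 5 resides in physical frame 2.
page_table = {0x5: 0x2}

vaddr = 0x5 * PAGE_SIZE + 0x040
paddr = translate(vaddr, page_table)

index_used_by_cpu = cache_index(vaddr)   # where the processor placed the line
index_from_bus = cache_index(paddr)      # where a naive snooper would look
```

Under these assumptions the two indices agree only in the bits that fall within the page offset. The upper index bits come from the virtual page number, which does not appear on the bus, so the snooper needs additional information to find the line.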
A third, and very major, complication is that the sequence of bus transfer, duplicate tag store lookup, and cache invalidation takes many more than one cycle to complete.
A fourth complication is that it is undesirable to require RAMs in the duplicated tag stores that are multiported or significantly faster than the cycle time. Such RAMs are undesirable because of availability, cost, or both.
A fifth source of complication is the pended read bus protocol. This bus protocol allows writes to occupy the bus between the time of a read address transfer and the return of read data. The concern is that the write could collide with the returning read data, rendering the data stale before it is even received.
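The hazard can be made concrete with a small sketch that remembers the line of an outstanding read and flags any snooped write to that line before the data returns. This is one possible way to detect the collision, offered as an assumption for illustration; it is not presented as the mechanism used by the system described here, and the class name, method names, and line size are hypothetical.

```python
LINE_SIZE = 16   # bytes per cache line (assumed)


class PendedReadTracker:
    """Hypothetical tracker for one outstanding pended read."""

    def __init__(self):
        self.pending_line = None   # line number of the outstanding read, if any
        self.collided = False      # set when a write hits the pending line

    def read_address_sent(self, addr):
        """The read address has been transferred; data will return later."""
        self.pending_line = addr // LINE_SIZE
        self.collided = False

    def snoop_write(self, addr):
        """A write occupied the bus while the read was still pending."""
        if self.pending_line is not None and addr // LINE_SIZE == self.pending_line:
            self.collided = True

    def read_data_returned(self):
        """Return True if the returning data may be treated as current."""
        usable = not self.collided
        self.pending_line = None
        self.collided = False
        return usable
```

A write to an unrelated line leaves the returning data usable, while a write to the pending line marks it as potentially stale, so the cache can avoid installing it as valid.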