In a typical bus-based multiprocessor system, the bus bandwidth is exceeded with only a small number of processors because each processor must access memory frequently for instructions and data. To overcome this problem, a cache memory may be associated with each processor to act as a buffer between that processor and the system bus. Ideally, the cache is "hit" for a majority of the random access memory (RAM) accesses, and the bus is used only a small percentage of the time.
A problem associated with using cache memories in a multiprocessor system is data consistency. If a processor writes to its cache without notifying the other processors' caches, data will become inconsistent. For example, assume the following: processor "1" reads data block "A" from global RAM and then caches it; then that processor writes to that block, updating only its cache; then processor "2" reads that same block "A" out of global RAM and caches it; processor "2" ends up with a different version of "A" than processor "1". Their caches are inconsistent.
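The sequence in the example above can be traced with a minimal sketch. The names `global_ram`, `cache1`, `cache2`, `read`, and `write_local`, as well as the values 0 and 42, are hypothetical and chosen only for illustration; each cache is modeled as a plain dictionary with no consistency mechanism at all.

```python
# Hypothetical model: global RAM and two private caches, no coherence protocol.
global_ram = {"A": 0}
cache1 = {}
cache2 = {}


def read(cache, block):
    """Return the block, filling the cache from global RAM on a miss."""
    if block not in cache:
        cache[block] = global_ram[block]
    return cache[block]


def write_local(cache, block, value):
    """Update only the local cache; nothing is sent to RAM or other caches."""
    cache[block] = value


read(cache1, "A")             # processor "1" reads block "A" and caches it
write_local(cache1, "A", 42)  # processor "1" updates only its own cache
read(cache2, "A")             # processor "2" reads the stale block from RAM

# The two caches now hold different versions of block "A".
inconsistent = cache1["A"] != cache2["A"]
```

After the last step, `cache1` holds the updated value while `cache2` and `global_ram` still hold the original one.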
Caches could also become inconsistent if two caches held the same data block and only one of the processors updated that block in its cache. Again, a cache consistency problem would arise.
There exist several prior art solutions to the problem of cache consistency. Each of these attempts to solve the problem by writing all or some of the "writes" through to global memory. These write-throughs are used to notify the other caches to invalidate like blocks. However, each of these attempted solutions has one or more problems affecting the cache hit ratio and the bus utilization.
The prior art solutions can be grouped into three categories: (1) write-through, (2) global directory, and (3) write-invalidate.
The write-through scheme is the simplest to implement, but it is also the least effective. With this scheme, all "writes" are directed through the cache to global RAM. All other cache controllers monitor the bus (i.e. "snoop") and invalidate their corresponding entry if any "write" hits in their cache. Any data that would become inconsistent with the resulting "write" is invalidated. Hence, the consistency problem is solved. However, since all "writes" are passed through to global RAM, the bus utilization is substantially increased. Since about 15% of all accesses are "writes", and since all "writes" write-through on the bus as misses, the maximum hit ratio for a cache with this type of scheme is around 85%. Assuming hit ratios of around 75% to 85%, and bus access times twice the cache access times, the bus utilizations for this type of scheme would be around 30% to 40% for each processor. Thus, only two to three processors could use this bus before the bandwidth of the bus would be exceeded. This write-through scheme yields low performance due to high bus contention and is not acceptable in a multiprocessor system having four or more processors.
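The write-through behavior described above can be sketched as follows. This is an illustrative model only, not an implementation taken from any particular system: the class names `Bus` and `WriteThroughCache`, the method names, and the transaction counter are all assumptions, and blocks are treated as single values.

```python
class Bus:
    """Hypothetical shared bus in front of global RAM; counts every transaction."""

    def __init__(self):
        self.ram = {}
        self.caches = []
        self.transactions = 0

    def read(self, block):
        self.transactions += 1
        return self.ram.get(block, 0)

    def write(self, writer, block, value):
        self.transactions += 1
        self.ram[block] = value
        # Every other cache controller snoops the write.
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(block)


class WriteThroughCache:
    """Hypothetical cache in which every write goes through to global RAM."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def read(self, block):
        if block not in self.lines:      # miss: fetch over the bus
            self.lines[block] = self.bus.read(block)
        return self.lines[block]

    def write(self, block, value):
        self.lines[block] = value
        self.bus.write(self, block, value)  # always uses the bus

    def snoop_write(self, block):
        self.lines.pop(block, None)      # invalidate a matching entry


bus = Bus()
p1, p2 = WriteThroughCache(bus), WriteThroughCache(bus)
p1.read("A")
p2.read("A")
p1.write("A", 7)                  # invalidates the copy held by p2
value_seen_by_p2 = p2.read("A")   # misses and refetches the new value
```

In this sketch the caches stay consistent, but each write costs one bus transaction regardless of whether the block is cached, which is the source of the high bus utilization noted above.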
The global directory scheme uses extra bits for each block in main memory and a memory controller to maintain information on how many processors have cached each block. The major problem with this scheme is that, for large global RAMs, it is expensive. Two extra bits are required for each 32-bit block in global RAM (a storage overhead of 2/32, or about 6%), and memory controller hardware is also required. For this extra expense, there is little, if any, performance increase over the "write-through" scheme.
The "write-invalidate" scheme writes through only to invalidate those blocks which may reside in other caches. One possible implementation of this scheme would be to perform a write-through on the first "write" to invalidate existing like copies, with subsequent "writes" written only to the cache. A new problem arises with this variation. When the "write" is only to the cache, that block becomes "dirty" (it is different in the cache than in global RAM). If another processor requests that block, the owning processor must either inhibit the RAM and supply that block itself, or it must halt the requesting processor, write the block back to global RAM, and let the requesting processor request that block again. This snoop requirement is tedious and uses multiple bus accesses, but it is necessary to maintain cache consistency.
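The write-invalidate variation described above can be sketched as follows, under several stated assumptions. The state names `VALID`, `RESERVED`, and `DIRTY`, the classes `Bus` and `WriteInvalidateCache`, and the transaction counter are hypothetical. The sketch models the second option for a dirty block (the owner writes the block back to global RAM before the requester's read completes), and it treats a write miss as a write-through that also allocates the block.

```python
VALID, RESERVED, DIRTY = "valid", "reserved", "dirty"  # hypothetical state names


class Bus:
    """Hypothetical shared bus in front of global RAM; counts every transaction."""

    def __init__(self):
        self.ram = {}
        self.caches = []
        self.transactions = 0

    def read(self, requester, block):
        # Other caches snoop the read; a dirty owner writes back first.
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_read(block)
        self.transactions += 1
        return self.ram.get(block, 0)

    def write_through(self, writer, block, value):
        self.transactions += 1
        self.ram[block] = value
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_invalidate(block)

    def write_back(self, block, value):
        self.transactions += 1
        self.ram[block] = value


class WriteInvalidateCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}  # block -> [value, state]
        bus.caches.append(self)

    def read(self, block):
        if block not in self.lines:
            self.lines[block] = [self.bus.read(self, block), VALID]
        return self.lines[block][0]

    def write(self, block, value):
        line = self.lines.get(block)
        if line is None or line[1] == VALID:
            # First write: write through so that other copies are invalidated.
            self.lines[block] = [value, RESERVED]
            self.bus.write_through(self, block, value)
        else:
            # Subsequent writes stay in the cache; the block becomes dirty.
            self.lines[block] = [value, DIRTY]

    def snoop_read(self, block):
        line = self.lines.get(block)
        if line is not None:
            if line[1] == DIRTY:
                self.bus.write_back(block, line[0])
            line[1] = VALID  # the block may now be shared

    def snoop_invalidate(self, block):
        self.lines.pop(block, None)


bus = Bus()
p1, p2 = WriteInvalidateCache(bus), WriteInvalidateCache(bus)
p1.read("A")
p2.read("A")
p1.write("A", 1)                  # first write: goes through, invalidates p2
p1.write("A", 2)                  # cache only
p1.write("A", 3)                  # cache only
value_seen_by_p2 = p2.read("A")   # forces a write-back by p1, then reads RAM
```

In this trace only the first of the three writes uses the bus, but the later read by the second processor costs two bus transactions (the write-back and the read), which illustrates the multiple bus accesses mentioned above.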
Other "write-invalidate" variations have evolved to reduce the number of times the first write is required. For example, a routine developed at the University of California at Berkeley introduces the concept of "ownership". The Berkeley scheme includes two types of reads ("read-shared" and "read-for-ownership") and two types of writes ("write-for-invalidation" and "write-without-invalidation") to reduce the bus utilization. The problem with the Berkeley routine is that the first "write" will only be eliminated if data is read and cached as private. If the block could possibly be shared, the first write would still be required to invalidate potential like blocks in other caches. Furthermore, a custom compiler which can determine when data will be shared or when it will be private is required.