Computer systems typically include a processor, a random access memory (RAM), and a number of peripheral devices. The processor, RAM, and peripheral devices communicate with each other using one or more busses.
Significant latencies may occur when a processor accesses a memory across the bus. The processor must contend with other devices sharing the bus, and transport over the bus is relatively slow. In order to improve computer system performance, the processor is provided with on-chip cache memories that store local copies of data or instructions. Such on-chip memories greatly improve processor execution times; however, the state of the cache must constantly be monitored to determine if external memory access is required. For example, significant delays for external memory access may be incurred in the event of a cache miss or when flushing the cache.
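The cost of a cache miss relative to a cache hit can be illustrated with a minimal sketch. The sketch assumes a toy direct-mapped cache holding one word per line. The latencies `CACHE_HIT_CYCLES` and `BUS_ACCESS_CYCLES`, the class `DirectMappedCache`, and the helper `total_cycles` are hypothetical names and values chosen for illustration only; they do not describe any particular processor.

```python
# Hypothetical latencies in cycles; illustrative values only.
CACHE_HIT_CYCLES = 1
BUS_ACCESS_CYCLES = 50


class DirectMappedCache:
    """Toy direct-mapped cache with one word per line (an assumption)."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # None marks an empty line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return the cycle cost of reading `address`."""
        index = address % self.num_lines
        tag = address // self.num_lines
        if self.tags[index] == tag:
            # Hit: the local copy is valid, so no bus access is needed.
            self.hits += 1
            return CACHE_HIT_CYCLES
        # Miss: check the cache, then fetch over the bus and fill the line.
        self.misses += 1
        self.tags[index] = tag
        return CACHE_HIT_CYCLES + BUS_ACCESS_CYCLES


def total_cycles(cache, addresses):
    """Sum the cycle cost of an address trace."""
    return sum(cache.access(a) for a in addresses)


cache = DirectMappedCache(num_lines=4)
trace = [0, 1, 2, 3] * 4  # a small loop whose working set fits in the cache
cycles = total_cycles(cache, trace)
uncached_cycles = len(trace) * BUS_ACCESS_CYCLES  # every access over the bus
print(cache.hits, cache.misses, cycles, uncached_cycles)
```

In this sketch only the first pass over the loop goes to external memory. A trace such as `[0, 4] * 4`, whose two addresses map to the same line, would miss on every access, which shows how strongly the benefit depends on the state of the cache.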
In a multiprocessor environment, significant overhead can be required to ensure synchronization of the on-chip cache memory of each processor with external memory. Beyond the undesirable amount of semiconductor die space consumed by the caches and this synchronization overhead, some applications simply cannot tolerate the external memory access latencies incurred in flushing or filling the individual caches of the plurality of processors.