1. Field of the Invention
The present invention relates to the design of computer systems. More specifically, the present invention relates to a method and an apparatus for pipelining cache coherence operations in a shared memory multiprocessor system.
2. Related Art
Modern computing systems often include multiple processors to provide increased throughput. Individual processors that are part of these multiprocessor systems can synchronize their operation and can share data through read and write operations to memory locations in a common address space that is shared between all of the processors. To provide low-latency and high-bandwidth data accesses, the processors make use of caches associated with individual processors to keep local copies of data in the common address space.
Because the caches of different processors may hold copies of the same memory locations, it is important to keep the caches coherent, so that when a processor modifies an item in its cache, the caches of the other processors are automatically updated to reflect the same shared state.
Cache coherence can be facilitated by “snooping” across a global bus that connects the caches of the processors to main memory through a bridge chip. FIG. 1 illustrates such a multiprocessor computing system including a bus 120 that facilitates cache coherence operations. As illustrated in FIG. 1, processors 102 and 104 include CPUs 106 and 110 and caches 108 and 112, respectively. Processors 102 and 104 and bridge 114 are coupled together through bus 120. Bridge 114 is itself coupled to main memory 116 and I/O channel 118.
Bridge 114 includes the memory controller, as well as logic required to present I/O transactions on bus 120 for the purpose of snooping. Each of processors 102 and 104, together with its respective cache 108 or 112, is responsible for snooping memory transactions that occur on bus 120. Processors 102 and 104 use the snooped data to keep their caches coherent with the caches of the other processors.
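The snooping arrangement described above can be illustrated with a minimal sketch. The sketch assumes a write-through, write-invalidate policy, which the description above does not specify; the class and method names (`Bus`, `SnoopingCache`, `snoop_write`) are hypothetical and chosen only for illustration. Each cache observes every write that another cache places on the shared bus and discards its own stale copy of the written location.

```python
class Bus:
    """Shared bus (hypothetical model): broadcasts each write to all attached caches."""

    def __init__(self):
        self.memory = {}   # stands in for main memory behind the bridge
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def read(self, address):
        return self.memory.get(address, 0)

    def write(self, source, address, value):
        # Assumption: write-through, so main memory always holds the latest value.
        self.memory[address] = value
        # Every other cache snoops the transaction.
        for cache in self.caches:
            if cache is not source:
                cache.snoop_write(address)


class SnoopingCache:
    """Per-processor cache that snoops writes seen on the bus."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}    # address -> locally cached value
        bus.attach(self)

    def read(self, address):
        if address not in self.lines:          # miss: fetch from memory
            self.lines[address] = self.bus.read(address)
        return self.lines[address]

    def write(self, address, value):
        self.lines[address] = value
        self.bus.write(self, address, value)

    def snoop_write(self, address):
        # Assumption: write-invalidate, so a stale local copy is simply dropped.
        self.lines.pop(address, None)


bus = Bus()
cache_a = SnoopingCache(bus)
cache_b = SnoopingCache(bus)

cache_a.write(0x40, 1)
first = cache_b.read(0x40)    # cache_b misses and fetches the value from memory
cache_a.write(0x40, 2)        # cache_b snoops this write and invalidates its copy
second = cache_b.read(0x40)   # cache_b misses again and fetches the new value
```

Without the `snoop_write` step, the second read by `cache_b` would return the stale value held in its local copy, which is the incoherence that snooping is intended to prevent.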
Although this method of keeping caches coherent has historically been effective in many situations, global bus 120 is rapidly becoming a bottleneck to system performance as processor speeds continue to increase. To remedy this problem, the single global coherence bus 120 can be replaced with unidirectional point-to-point communication paths from each processor to the bridge chip. Coupling the processors to the bridge chip in this way can potentially increase the access speed to main memory. However, it does not eliminate the bottleneck associated with snooping, because each snoop transaction must run to completion before the next snoop transaction can start.
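The cost of running each snoop transaction to completion before starting the next can be sketched with simple arithmetic. The cycle counts below are hypothetical values chosen for illustration and are not taken from any particular system; the function names are likewise assumptions. The second function models an idealized case in which a new snoop can be issued every `issue_interval` cycles while earlier snoops are still in flight, and is included only for comparison.

```python
def serialized_cycles(num_snoops, snoop_latency):
    """Total cycles when each snoop must finish before the next one starts."""
    return num_snoops * snoop_latency


def overlapped_cycles(num_snoops, snoop_latency, issue_interval=1):
    """Total cycles in an idealized model where snoops overlap in flight."""
    if num_snoops == 0:
        return 0
    # The first snoop takes the full latency; each later snoop completes
    # issue_interval cycles after the one before it.
    return snoop_latency + (num_snoops - 1) * issue_interval


# Hypothetical workload: 100 snoops, each taking 8 cycles end to end.
serial_total = serialized_cycles(100, 8)      # 100 * 8 = 800 cycles
overlap_total = overlapped_cycles(100, 8)     # 8 + 99 * 1 = 107 cycles
```

Under these assumed numbers, the serialized case is bounded by the full snoop latency per transaction, regardless of how fast the links between the processors and the bridge chip are.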
What is needed is a method and an apparatus that facilitates snooping in a shared memory multiprocessor system without the problems described above.