1. Field of the Invention
The present invention generally relates to computer systems having multiprocessor architectures and, more particularly, to a novel multiprocessor computer system for processing memory access requests.
2. Description of the Prior Art
To achieve high performance computing, multiple individual processors have been interconnected to form multiprocessor computer systems capable of parallel processing. Multiple processors can be placed on a single chip, or several chips, each containing one or several processors, can be interconnected into a multiprocessor computer system.
Processors in a multiprocessor computer system use private cache memories because of their short access time (a cache is local to a processor and provides fast access to data) and to reduce the number of memory requests to the main memory. However, managing caches in a multiprocessor system is complex. Multiple private caches introduce the multi-cache coherency problem (or stale data problem) due to multiple copies of main memory data that can concurrently exist in the multiprocessor system.
Small-scale shared memory multiprocessing systems have processors (or groups thereof) interconnected by a single bus. However, as processor speeds increase, the number of processors that can effectively share the bus decreases.
The protocols that maintain the coherence between multiple processors are called cache coherence protocols. Cache coherence protocols track any sharing of data blocks between the processors. Depending upon how data sharing is tracked, cache coherence protocols can be grouped into two classes: directory based and snooping.
In a multiprocessor system with coherent cache memory, consistency is maintained by a coherence protocol that generally relies on coherence events sent between caches. A common hardware coherence protocol is based on invalidations. In this protocol, any number of caches can include a read-only line, but these copies must be destroyed when any processor stores to the line. To do this, the cache corresponding to the storing processor sends invalidations to all the other caches before storing the new data into the line. If the caches are write-through, then the store also goes to main memory where all caches can see the new data. Otherwise, a more complicated protocol is required when some other cache reads the line with the new data.
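The invalidation scheme described above can be illustrated with a minimal software sketch, assuming write-through caches. The names `Memory`, `Cache`, and `make_system` are hypothetical and chosen only for illustration; real coherence logic is implemented in hardware and handles many cases omitted here.

```python
class Memory:
    """Main memory shared by all caches (hypothetical model)."""
    def __init__(self):
        self.data = {}


class Cache:
    """A private write-through cache holding read-only copies of lines."""
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}   # address -> cached copy of the line
        self.peers = []   # all other caches in the system

    def load(self, addr):
        # On a miss, fetch the line from main memory (default value 0).
        if addr not in self.lines:
            self.lines[addr] = self.memory.data.get(addr, 0)
        return self.lines[addr]

    def invalidate(self, addr):
        # Destroy the local copy, if any.
        self.lines.pop(addr, None)

    def store(self, addr, value):
        # Send invalidations to all other caches before storing.
        for peer in self.peers:
            peer.invalidate(addr)
        self.lines[addr] = value
        # Write-through: the store also goes to main memory.
        self.memory.data[addr] = value


def make_system(n):
    """Build n caches over one memory and connect each cache to its peers."""
    memory = Memory()
    caches = [Cache(memory) for _ in range(n)]
    for cache in caches:
        cache.peers = [p for p in caches if p is not cache]
    return memory, caches
```

In this sketch, a cache that lost its copy to an invalidation simply misses on its next load and re-reads the new data from main memory, which is the behavior the write-through case relies on.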
In a cache-coherent multiprocessor system, there may be bursts of activity that cause coherence actions, such as invalidations, to arrive at a cache faster than the cache can process them. In this case, they are generally stored in first-in, first-out (FIFO) queues, thereby absorbing the burst of activity. As is known, FIFO queues are a very common structure used in computer systems. They are used to store information that must wait, commonly because the destination of the information is busy. For example, requests to utilize a shared resource often wait in FIFO queues until the resource becomes available. Another example is packet-switched networks, where packets often wait in FIFO queues until a link they need becomes available.
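As a rough illustration of how a FIFO queue absorbs a burst of invalidations, consider the following sketch. The class `InvalidationQueue` and its method names are assumptions made for this example only, and the cache is modeled as a plain dictionary of lines.

```python
from collections import deque


class InvalidationQueue:
    """FIFO of invalidations waiting for a busy cache (hypothetical model)."""
    def __init__(self):
        self.pending = deque()

    def arrive(self, addr):
        # A coherence event arrives; it waits at the tail of the queue.
        self.pending.append(addr)

    def process_one(self, cache_lines):
        # The cache services the oldest event: it drops the line if present.
        if not self.pending:
            return None
        addr = self.pending.popleft()
        cache_lines.pop(addr, None)
        return addr
```

If several invalidations arrive in one cycle while the cache can service only one per cycle, the excess events remain in `pending` and are handled in arrival order in later cycles.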
A common operation in a multiprocessor is memory synchronization, which ensures that all memory accesses and their related coherence protocol events started before some point in time have completed. For example, memory synchronization can be used before initiating a DMA transfer of data prepared in memory. The synchronization ensures that the memory is completely consistent before the DMA transfer begins.
Before a multiprocessor memory synchronization can complete, all coherence protocol events that were initiated prior to the synchronization must be processed. Some of these events could be stored in FIFO queues in the coherence logic of the multiprocessor. One way to make sure all such events have been processed is to drain all of the FIFO queues before completing the memory synchronization. However, this is inefficient because coherence events that arrived after the memory synchronization began are unnecessarily processed, causing a delay in the completion of the synchronization. A second problem with this approach is that processors must be prevented from generating new coherence actions or else the queues will continue to fill, potentially causing a livelock. Stopping all of the processors is necessary for the complete draining approach, but inefficient.
What is needed is a mechanism for tracking queue entries that existed prior to the memory synchronization, and completing the synchronization when those entries have been processed. Ideally, the memory system should be allowed to continue generating new coherence protocol events while the events prior to the synchronization are draining.
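One conceivable way to track only the entries that existed before a synchronization is to snapshot the queue occupancy when the synchronization begins and count down as those entries are dequeued. The sketch below is an illustrative assumption, not a description of the invention's mechanism; `TrackedFifo` and its members are hypothetical names.

```python
from collections import deque


class TrackedFifo:
    """FIFO that tracks how many pre-synchronization entries remain."""
    def __init__(self):
        self.entries = deque()
        self.sync_remaining = None   # None means no synchronization was started

    def enqueue(self, event):
        # New coherence events may keep arriving during a synchronization.
        self.entries.append(event)

    def begin_sync(self):
        # Snapshot: only entries already queued belong to this synchronization.
        self.sync_remaining = len(self.entries)

    def dequeue(self):
        event = self.entries.popleft()
        # FIFO order guarantees the oldest (pre-sync) entries leave first.
        if self.sync_remaining:
            self.sync_remaining -= 1
        return event

    def sync_complete(self):
        # True once every entry present at begin_sync() has been processed.
        return self.sync_remaining == 0
```

Unlike a full drain, this approach lets the synchronization complete while later entries are still queued, and the processors need not be stopped, because entries added after `begin_sync()` are never counted.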
It would thus be highly desirable to provide a system and method for tracking queue entries that existed prior to the memory synchronization, and completing the synchronization when those entries have been processed.
Further, it would be desirable to provide a system and method for tracking queue entries wherein the memory system is allowed to continue generating new coherence protocol events while the events prior to the synchronization are draining.