1. Field of the Invention
The present invention relates to memory models for microprocessor control. More particularly, the present invention relates to relaxing the first-in-first-out (FIFO) constraint on the processing of all snoop requests received from the system bus. Some concurrent processing of snoop requests is possible through the separation of locally generated requests from remotely generated requests, allowing selected requests to proceed without strict order constraints.
2. The Prior Art
In computer systems, memory access times often limit the throughput of the system. Relative to current processor speeds, data access times for the main memory can be quite long. One scheme to minimize this data access time limitation is to store some of the more frequently-used data in a location that is more quickly accessible to the processor than is the main memory. For example, in systems with multiple processors, a cache memory associated with each of the processors is used to store copies of certain data so that the data can be accessed more quickly than from the main memory. Cache memory generally has faster access times than does the main memory.
Unfortunately, cache memory systems are not without their own problems. Because the use of cache memory involves creating a copy of some of the main memory data, multiple copies of the same data may exist in different locations within the multi-processor computer system. When one copy of the data is changed, multiple copies of the same data held in other locations must also be updated. Data errors will occur within the multi-processor system if different processors within the system are operating (performing reads and writes) on such inconsistent copies of data. This problem is known as a cache consistency or cache coherence problem. To avoid this problem, a common solution is to maintain a total order for all memory accesses with the help of cache consistency protocols and hardware.
Hardware-based solutions to the cache coherence problem generally follow either a centralized or distributed approach. In a centralized approach, directory protocols maintain information about where copies of information reside in a centralized directory. The directory contains information about the contents of local caches for the entire multi-processor system. A centralized controller keeps this information up to date and interacts with all of the local caches to ensure that data consistency is maintained.
In a distributed approach, "snoopy" protocols distribute the responsibility for maintaining cache coherence among all of the processors. The updates each processor makes to a shared memory block must be broadcast to all other processors. Each cache controller "snoops", or reads, these broadcast messages and updates its own cache accordingly.
In the "snoopy" system, each individual processor and its cache is connected to a shared system bus that is connected to the shared main memory. As data operations are performed in each processor, the processor will broadcast these operations onto the shared system bus. For example, as a first processor performs read and write operations on shared data copies located in its cache, it broadcasts this information to the system bus to alert other processors to update the status of their data copies. By "snooping" the system bus, a second processor knows that it must invalidate its copy of a piece of data after it receives the broadcast that the first processor has operated on that same piece of data. Other examples of the messages broadcast by processors onto the shared system bus are well known to those of ordinary skill in the art.
In the asynchronous cache system 10 shown in FIG. 1, each processor 12-1 through 12-n has an associated cache 14-1 through 14-n and an associated first-in-first-out (FIFO) snoop buffer 16-1 through 16-n. Each snoop buffer 16-1 through 16-n is responsible for storing the snoop broadcasts received from the system bus 18 until they can be processed by each individual processor 12-1 through 12-n.
When a cache coherence transaction such as an invalidation is broadcast on the system bus 18, the invalidation request is buffered in each individual snoop buffer 16-1 through 16-n. The data selected for invalidation located in a cache 14 is not invalidated immediately. Instead, the broadcast messages are propagated separately following a FIFO order through each snoop buffer 16 to its associated cache 14. Accordingly, a copy of the data found in a cache 14 is invalidated independently or "asynchronously" of the invalidations occurring in other processor caches.
Total Store Ordering (TSO) and Sequential Consistency (SC) are two well-known techniques for specifying the order in which memory accesses are performed. SC and TSO do not allow any processors in the system to read two different write updates as having occurred in a different order. SC is a stronger model than TSO because SC also does not allow a processor to read and return the value of a write while the write is still pending in the buffer.
As an example of the type of errors TSO and SC prevent, suppose that the initial value of location X in the cache of a first processor is 0 and the initial value of location Y in the cache of a second processor is also 0. The first processor attempts to update location X to a value of 1 and the second processor attempts to update location Y to a value of 1. An illegal result occurs if a third processor reads X as 1 and Y as 0, while a fourth processor reads X as 0 and Y as 1. The third and fourth processors must read these two operations as having occurred in the same order.
It was normally believed that requests received from the system bus needed to be processed by the system processors in the order in which they were received according to the prior art constraints. If the processor has a hierarchical cache system with multiple levels of cache associated with each processor, a FIFO path from the system bus through each level of cache must be maintained. Thus, the system bus behaves as a reference point at which memory accesses are ordered. It provides the necessary information to guarantee that all processors will observe the same sequence of events, and it is impossible to observe two writes, for example, in different orders.
Maintaining this strict FIFO order does create a memory access time penalty. If a strict FIFO order is maintained on all coherence transactions, a pending request that may have a long memory access latency, for example a request to access the shared main memory which will take longer than a cache access, will penalize other transactions further down in the FIFO store path. The processor will pause and wait for the pending request to complete, wasting processor time that could be used to execute other requests if the FIFO restraint could be relaxed. Accordingly, a method for relaxing this FIFO constraint would be desirable. Such a method could reduce the amount of time an individual processor is idle while waiting for a request to complete, and thus reduce overall execution time for the multi-processor computer system.