A typical uniprocessor computer system includes a processor and an associated cache memory that stores a subset of the information stored in system memory. The cache memory acts as a high-speed source of information for the instructions executed by the processor. When the processor requests to read information that is not stored in the cache memory, a "cache miss" occurs, and the cache must be refilled with information fetched from system memory. The processor is typically stalled while this information is fetched, so the time required to fill the cache after a cache miss strongly affects the system latency of a uniprocessor computer system.
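The effect of a cache miss on processor stall time can be illustrated with a minimal sketch of a direct-mapped cache. The class name `DirectMappedCache` and the latencies `CACHE_HIT_CYCLES` and `MEMORY_FILL_CYCLES` are hypothetical values chosen for illustration only; the source does not specify a cache organization or timing. A hit costs only the cache access, while a miss adds the full fill time from system memory.

```python
from dataclasses import dataclass, field

# Hypothetical latencies in processor cycles (illustrative assumptions only).
CACHE_HIT_CYCLES = 1
MEMORY_FILL_CYCLES = 100


@dataclass
class DirectMappedCache:
    num_lines: int
    line_size: int
    tags: list = field(init=False)

    def __post_init__(self):
        # Every line starts out empty (no valid tag).
        self.tags = [None] * self.num_lines

    def read(self, address: int) -> int:
        """Return the number of cycles the processor waits for this read."""
        line_number = address // self.line_size
        index = line_number % self.num_lines
        tag = line_number // self.num_lines
        if self.tags[index] == tag:
            return CACHE_HIT_CYCLES
        # Cache miss: the processor stalls while the line is fetched from
        # system memory and written into the cache, replacing any old line.
        self.tags[index] = tag
        return MEMORY_FILL_CYCLES + CACHE_HIT_CYCLES


cache = DirectMappedCache(num_lines=4, line_size=64)
print(cache.read(0x100))  # miss: 101
print(cache.read(0x104))  # hit in the same line: 1
```

Under these assumed numbers, nearly all of the stall on a miss comes from the fill, which is why fill time dominates latency in this simple model.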
Typical multiprocessor computer systems include multiple processors, each having an associated cache memory. Cache misses in a multiprocessor system are complicated by the fact that the most recent copy of the requested data may reside in another processor's cache rather than in system memory. A cache coherence protocol is often implemented to track where the most recent copy of cached information is located. Typically, each processor independently maintains a state for each of its cache entries, and when another processor requests data from system memory to fill its cache, each of the other processors determines whether it, rather than system memory, should source the data.
A typical prior mechanism for maintaining cache coherence in a multiprocessor computer system is a globally shared address bus to which the processors and the memory subsystem are coupled. Each processor "snoops" the memory address driven on the address bus to determine whether its cache should source the requested data, while the memory subsystem typically queues the request. A processor indicates that its cache will source the requested data by asserting a shared "ownership" line; if any processor asserts the ownership line, the memory subsystem flushes the request from its queue before initiating the memory access. Common interconnects that include a globally shared address bus are typically optimized for high bandwidth and throughput at the expense of increased latency.
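The snooping sequence described above can be modeled roughly as follows. This is a simplified sketch, not a description of any particular bus: the state names `INVALID`, `SHARED`, and `MODIFIED` are borrowed from MESI-style protocols as an assumption (the source does not name states), and the classes `SnoopingCache` and `MemorySubsystem` and the function `broadcast_read` are hypothetical. State transitions after a transfer (for example, a modified line becoming shared) are deliberately not modeled.

```python
from collections import deque
from enum import Enum


class LineState(Enum):
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"  # this cache holds the most recent copy


class SnoopingCache:
    def __init__(self, name: str):
        self.name = name
        self.lines = {}  # address -> LineState

    def snoop(self, address: int) -> bool:
        """Return True (assert the ownership line) if this cache must source the data."""
        return self.lines.get(address, LineState.INVALID) is LineState.MODIFIED


class MemorySubsystem:
    def __init__(self):
        self.queue = deque()
        self.accesses_started = []

    def enqueue(self, address: int) -> None:
        # The memory subsystem queues every request seen on the shared bus.
        self.queue.append(address)

    def resolve(self, address: int, ownership_asserted: bool) -> None:
        # The request leaves the queue either way; memory is accessed
        # only if no processor asserted the ownership line.
        self.queue.remove(address)
        if not ownership_asserted:
            self.accesses_started.append(address)


def broadcast_read(address, requester, caches, memory) -> str:
    """Drive an address on the shared bus and report who sources the data."""
    memory.enqueue(address)
    owners = [c for c in caches if c is not requester and c.snoop(address)]
    memory.resolve(address, ownership_asserted=bool(owners))
    return owners[0].name if owners else "memory"


p0, p1 = SnoopingCache("P0"), SnoopingCache("P1")
p1.lines[0x40] = LineState.MODIFIED
mem = MemorySubsystem()
print(broadcast_read(0x40, p0, [p0, p1], mem))  # P1
print(broadcast_read(0x80, p0, [p0, p1], mem))  # memory
print(mem.accesses_started)                     # [128]
```

In this model, the memory subsystem cannot know whether to start the access until every snooper has responded, which is one way to see where the latency cost of a globally shared, snooped bus comes from.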
As computer systems and their components become faster and more complex, increasing the efficiency of the common interconnect, in terms of both physical implementation and resource allocation, becomes a paramount concern for system designers. Increasing the efficiency of the common interconnect in a cache coherent multiprocessor computer system may require architectural changes that make the time required to fill a cache after a cache miss an important contributor to system latency. This fill time is particularly critical to the unloaded system latency, that is, the latency observed when no other memory access requests are queued ahead of the cache fill request.
Ideally, the unloaded system latency should be on the order of the latency of the Dynamic Random Access Memory ("DRAM") devices that make up the system memory. Cache coherence operations and memory access requests should therefore be completed within the time allotted for servicing a memory access request, and memory accesses should be initiated as quickly as possible. However, the physical implementation of the common interconnect may make it difficult to initiate system memory accesses quickly. For example, an address bus may be multiplexed such that two or more bus cycles are required to convey an entire transaction request packet, which includes the memory address of the location to be accessed. A mechanism that quickly initiates memory accesses when a memory address is conveyed over multiple bus cycles is therefore needed to reduce unloaded system latency.
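The multiplexing problem can be illustrated by splitting an address into per-cycle "beats" on a narrow bus. This is a minimal sketch under assumed widths: `ADDRESS_BITS` and `BUS_WIDTH` are hypothetical values, and the helper names `split_address` and `reassemble` are illustrative, not taken from the source. It shows only why the complete address is not available to the receiver until the last bus cycle; it does not model any particular early-initiation mechanism.

```python
ADDRESS_BITS = 40  # hypothetical physical address width
BUS_WIDTH = 20     # hypothetical address bits conveyed per bus cycle


def split_address(address: int, bus_width: int = BUS_WIDTH,
                  address_bits: int = ADDRESS_BITS) -> list:
    """Split an address into per-cycle bus beats, most-significant beat first."""
    if not 0 <= address < (1 << address_bits):
        raise ValueError("address out of range")
    beats = -(-address_bits // bus_width)  # ceiling division
    mask = (1 << bus_width) - 1
    return [(address >> (bus_width * (beats - 1 - i))) & mask
            for i in range(beats)]


def reassemble(beats: list, bus_width: int = BUS_WIDTH) -> int:
    """Rebuild the address once every beat has arrived."""
    address = 0
    for beat in beats:
        address = (address << bus_width) | beat
    return address


addr = 0x123456789A
beats = split_address(addr)
print([hex(b) for b in beats])     # ['0x12345', '0x6789a']
print(len(beats))                  # 2 bus cycles before the full address is known
print(reassemble(beats) == addr)   # True
```

With these assumed widths, a receiver that waits for the complete address cannot begin a memory access until the second cycle, which is the delay the paragraph above identifies.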