1. Field of the Invention
This invention relates to the field of multiprocessor computer systems and, more particularly, to coherency protocols employed within multiprocessor computer systems having shared memory architectures.
2. Description of the Related Art
Multiprocessing computer systems include two or more processors that may be employed to perform computing tasks. A particular computing task may be performed upon one processor while other processors perform unrelated computing tasks. Alternatively, components of a particular computing task may be distributed among multiple processors to decrease the time required to perform the computing task as a whole.
A popular architecture in commercial multiprocessing computer systems is a shared memory architecture in which multiple processors share a common memory. In shared memory multiprocessing systems, a cache hierarchy is typically implemented between the processors and the shared memory. In order to maintain the shared memory model, in which a particular address stores exactly one data value at any given time, shared memory multiprocessing systems employ cache coherency. Generally speaking, an operation is coherent if the effects of the operation upon data stored at a particular memory address are reflected in each copy of the data within the cache hierarchy. For example, when data stored at a particular memory address is updated, the update may be supplied to the caches that are storing copies of the previous data. Alternatively, the copies of the previous data may be invalidated in the caches such that a subsequent access to the particular memory address causes the updated copy to be transferred from main memory.
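The two alternatives above, supplying the update to every cached copy or invalidating those copies, can be contrasted with a small Python model. This is only a sketch under stated assumptions. The class `SharedMemorySystem` and its `policy` flag are hypothetical names. Writing through to main memory on every store is a simplification chosen so that a later miss retrieves the updated copy from main memory.

```python
from dataclasses import dataclass, field

# Illustrative sketch: class name, `policy` flag and write-through are assumptions.


@dataclass
class SharedMemorySystem:
    memory: dict = field(default_factory=dict)  # address -> value (main memory)
    caches: list = field(default_factory=list)  # one dict (address -> value) per processor
    policy: str = "invalidate"                  # "invalidate" or "update"

    def read(self, cpu: int, addr: int) -> int:
        cache = self.caches[cpu]
        if addr not in cache:
            # Miss: transfer the current copy from main memory.
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, cpu: int, addr: int, value: int) -> None:
        # Write through so that main memory always holds the latest value.
        self.memory[addr] = value
        self.caches[cpu][addr] = value
        for other, cache in enumerate(self.caches):
            if other == cpu or addr not in cache:
                continue
            if self.policy == "update":
                cache[addr] = value  # supply the update to each cached copy
            else:
                del cache[addr]      # invalidate the stale copy
```

With `policy="invalidate"`, the stale copy held by another processor is dropped, and that processor's next `read` misses and fetches the updated value from main memory. With `policy="update"`, the copy is overwritten in place.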
Shared memory multiprocessing systems generally employ either a snooping cache coherency protocol or a directory based cache coherency protocol. In a system employing a snooping protocol, coherence requests are broadcast to all processors (or cache subsystems) and memory through a totally ordered address network. Each processor “snoops” the requests from other processors and responds accordingly by updating its cache tags and/or providing the data to another processor. For example, when a subsystem having a shared copy of data observes a coherence request for exclusive access to the block, its copy is typically invalidated. Likewise, when a subsystem that currently owns a block of data observes a coherence request to that block, the owning subsystem typically responds by providing the data to the requestor and invalidating its copy, if necessary. By delivering coherence requests in a total order, correct coherence protocol behavior is maintained since all processors and memories observe requests in the same order.
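The snooping behavior described above can be illustrated with a toy Python model. Here a sequential broadcast loop stands in for the totally ordered address network: because requests are handled one at a time, every cache observes them in the same order. The names `State`, `SnoopingCache`, `OrderedBus` and the request kinds `"shared"` and `"exclusive"` are hypothetical. Writing an owner's copy back to memory when it supplies data is a simplifying assumption, not a feature of any particular protocol.

```python
from enum import Enum

# Illustrative sketch: all names and request kinds are hypothetical.


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    OWNED = "M"  # exclusive, possibly modified, copy held by the owner


class SnoopingCache:
    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.state = {}  # block address -> State
        self.data = {}   # block address -> value

    def write(self, addr, value):
        # A processor may only modify a block it owns.
        if self.state.get(addr) is not State.OWNED:
            raise PermissionError("write requires ownership of the block")
        self.data[addr] = value

    def snoop(self, kind, addr):
        """Observe another processor's request; return data if this cache supplies it."""
        st = self.state.get(addr, State.INVALID)
        if st is State.OWNED:
            supplied = self.data[addr]
            if kind == "exclusive":
                self._invalidate(addr)           # requestor becomes the new owner
            else:
                self.state[addr] = State.SHARED  # keep a read-only copy
            return supplied
        if st is State.SHARED and kind == "exclusive":
            self._invalidate(addr)               # shared copy is invalidated
        return None

    def _invalidate(self, addr):
        self.state[addr] = State.INVALID
        self.data.pop(addr, None)


class OrderedBus:
    """Stand-in for a totally ordered address network."""

    def __init__(self, caches, memory):
        self.caches = caches  # indexed by cpu_id
        self.memory = memory
        self.observed = []    # the single global order seen by every cache

    def broadcast(self, requestor, kind, addr):
        req = self.caches[requestor]
        current = req.state.get(addr, State.INVALID)
        if current is State.OWNED or (current is State.SHARED and kind == "shared"):
            return req.data[addr]  # hit: no coherence request is needed
        self.observed.append((requestor, kind, addr))
        data = None
        for cache in self.caches:
            if cache.cpu_id == requestor:
                continue
            supplied = cache.snoop(kind, addr)
            if supplied is not None:
                data = supplied
                self.memory[addr] = supplied  # simplification: write owner's copy back
        if data is None:
            data = self.memory.get(addr, 0)   # no owner: memory supplies the block
        req.data[addr] = data
        req.state[addr] = State.OWNED if kind == "exclusive" else State.SHARED
        return data
```

In this model, an `"exclusive"` request invalidates every shared copy, and the current owner, if any, supplies the data before giving up its copy.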
The other standard approach to cache coherency uses a directory-based protocol. In systems that implement a directory-based protocol, both the address network and the data network are typically point-to-point, switched networks. When a processor requests a cache block, the request is sent to a directory which maintains information regarding the processors that have copies of the cache block and their access rights. The directory then forwards the request to those processors which must change their access rights and/or provide data for the request (or, if needed, the directory accesses the copy of the cache block in memory and provides the data to the requestor). Since there is no way of knowing when the request arrives at each processor to which it is sent, all processors that receive the request must typically acknowledge reception by providing data or sending an acknowledge (ACK) message to either the requestor or the directory, depending on the protocol.
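The directory-based flow described above can be sketched in Python. The directory tracks an owner and a set of sharers for each block. It forwards a request only to the processors that must change their access rights or supply data, and each of those processors replies with either DATA or an ACK. This is a sketch under stated assumptions. The names `DirectoryEntry`, `Processor`, `Directory` and the request kinds are hypothetical. Replies are collected at the directory, which is one of the two options the text mentions, and the directory returns the ACK count to the requestor. Network delivery is reduced to direct method calls.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative sketch: all names and request kinds are hypothetical.


@dataclass
class DirectoryEntry:
    owner: Optional[int] = None                # processor with exclusive access, if any
    sharers: set = field(default_factory=set)  # processors with read-only copies


class Processor:
    def __init__(self):
        self.lines = {}  # block address -> [state, value]; state is "S" or "M"

    def write(self, addr, value):
        if self.lines.get(addr, [None])[0] != "M":
            raise PermissionError("write requires exclusive access")
        self.lines[addr][1] = value

    def handle_forward(self, kind, addr):
        """Respond to a request forwarded by the directory with DATA or an ACK."""
        value = self.lines[addr][1]
        if kind == "invalidate":   # sharer gives up its read-only copy
            del self.lines[addr]
            return ("ACK", None)
        if kind == "exclusive":    # owner gives up the block entirely
            del self.lines[addr]
        else:                      # "shared": owner keeps a read-only copy
            self.lines[addr] = ["S", value]
        return ("DATA", value)


class Directory:
    def __init__(self, memory, processors):
        self.memory = memory
        self.processors = processors
        self.entries = {}  # block address -> DirectoryEntry

    def request(self, requestor, kind, addr):
        entry = self.entries.setdefault(addr, DirectoryEntry())
        if entry.owner == requestor:
            return self.processors[requestor].lines[addr][1], 0  # already exclusive
        # Forward only to processors that must change rights or supply data.
        targets = [entry.owner] if entry.owner is not None else []
        if kind == "exclusive":
            targets += sorted(entry.sharers - {requestor})
        # Every processor that receives a forwarded request replies.
        replies = []
        for pid in targets:
            fwd = kind if pid == entry.owner else "invalidate"
            replies.append(self.processors[pid].handle_forward(fwd, addr))
        data = next((v for tag, v in replies if tag == "DATA"), None)
        if data is None:
            data = self.memory.get(addr, 0)  # no owner: directory reads memory
        else:
            self.memory[addr] = data         # simplification: keep memory current
        if kind == "exclusive":
            entry.owner, entry.sharers = requestor, set()
            self.processors[requestor].lines[addr] = ["M", data]
        else:
            if entry.owner is not None:
                entry.sharers.add(entry.owner)  # former owner now shares the block
                entry.owner = None
            entry.sharers.add(requestor)
            self.processors[requestor].lines[addr] = ["S", data]
        acks = sum(1 for tag, _ in replies if tag == "ACK")
        return data, acks
```

Unlike the snooping model, only the processors recorded in the directory entry ever see a given request. This is why the design relies on explicit replies rather than on a global ordering.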
Shared memory multiprocessing systems often include I/O (input/output) devices or other device types that do not cache data. Because such devices do not cache data, their interfaces may be simplified: they need not respond to the various coherence transactions that may be generated under the particular coherence protocol. To allow an I/O device to perform a read operation to a cache block, some systems support read stream transactions in which an entire cache block is conveyed from a caching device to the I/O device. Similarly, to allow an I/O device to perform a write operation to a cache block, such systems may further support write stream transactions in which an entire cache block is sent from the I/O device and written within the caching device.
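The read stream and write stream transactions described above can be sketched as whole-block operations serviced by a caching device. This is a sketch under stated assumptions. The 64-byte `BLOCK_SIZE` and the names `CachingDevice`, `read_stream` and `write_stream` are hypothetical. The point illustrated is that the non-caching device only ever receives a snapshot of a full block or supplies a full block, so it retains no coherence state.

```python
# Illustrative sketch: BLOCK_SIZE and the class/method names are hypothetical.
BLOCK_SIZE = 64  # bytes per cache block (assumed)


class CachingDevice:
    """Holds cache blocks and services stream transactions for non-caching devices."""

    def __init__(self):
        self.blocks = {}  # block-aligned address -> bytearray(BLOCK_SIZE)

    def _block(self, addr):
        if addr % BLOCK_SIZE:
            raise ValueError("stream transactions use block-aligned addresses")
        return self.blocks.setdefault(addr, bytearray(BLOCK_SIZE))

    def read_stream(self, addr):
        # Convey an immutable snapshot of the whole block; the I/O device
        # retains no coherence state, so it never needs to answer snoops.
        return bytes(self._block(addr))

    def write_stream(self, addr, payload):
        # Only a full block is accepted; partial writes are not supported.
        if len(payload) != BLOCK_SIZE:
            raise ValueError("write stream must carry an entire cache block")
        self._block(addr)[:] = payload
```

For example, an I/O device holding a 64-byte buffer could call `write_stream(0x1000, buffer)`, and a later `read_stream(0x1000)` would return the same 64 bytes. An 8-byte payload is rejected, which mirrors the absence of partial-write support.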
Although a system may allow I/O devices or other non-caching devices to read or write entire cache blocks, such systems typically do not allow an I/O device to obtain a cache block, make partial writes to the block, and subsequently write the modified cache block back to memory. While such functionality could yield various performance improvements, it could also add significant complexity to the I/O device, since the I/O device may be required to respond to foreign coherence transactions while it owns the cache block.