1. Field of the Invention
This invention relates to the field of multiprocessor computer systems and, more particularly, to coherency protocols employed within multiprocessor computer systems having shared memory architectures.
2. Description of the Related Art
Multiprocessing computer systems include two or more processors that may be employed to perform computing tasks. A particular computing task may be performed upon one processor while other processors perform unrelated computing tasks. Alternatively, components of a particular computing task may be distributed among multiple processors to decrease the time required to perform the computing task as a whole.
One popular architecture in multiprocessing computer systems is a shared memory architecture in which multiple processors share a common memory. In shared memory multiprocessing systems, a cache hierarchy is typically implemented between the processors and the shared memory. In order to maintain the shared memory model in which a particular address stores exactly one data value at any given time, shared memory multiprocessing systems employ cache coherency protocols. Generally speaking, an operation is coherent if the effects of the operation upon data stored at a particular memory address are reflected in each copy of the data within the cache hierarchy. For example, when data stored at a particular memory address is updated, the update may be supplied to the caches that are storing copies of the previous data. Alternatively, the copies of the previous data may be invalidated in the caches such that a subsequent access to the particular memory address causes the updated copy to be transferred from main memory or from a cache.
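The two alternatives described above, supplying the update to other caches or invalidating their copies, can be illustrated with a minimal sketch. The names `ToySystem`, `Cache`, and `demo_policies`, and the write-through memory, are hypothetical simplifications chosen for illustration and are not taken from any particular system.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    """A toy cache: maps address -> value; a missing key means no valid copy."""
    lines: dict = field(default_factory=dict)


class ToySystem:
    """Hypothetical model of several caches in front of one shared memory."""

    def __init__(self, num_caches, policy):
        if policy not in ("update", "invalidate"):
            raise ValueError("policy must be 'update' or 'invalidate'")
        self.policy = policy
        self.memory = {}
        self.caches = [Cache() for _ in range(num_caches)]

    def read(self, cache_id, addr):
        cache = self.caches[cache_id]
        if addr not in cache.lines:
            # Miss: fetch the current value (write-through keeps memory current).
            cache.lines[addr] = self.memory.get(addr, 0)
        return cache.lines[addr]

    def write(self, cache_id, addr, value):
        # Assumption: write-through, so memory always holds the latest value.
        self.memory[addr] = value
        self.caches[cache_id].lines[addr] = value
        for i, other in enumerate(self.caches):
            if i == cache_id or addr not in other.lines:
                continue
            if self.policy == "update":
                other.lines[addr] = value   # supply the update to the copy
            else:
                del other.lines[addr]       # invalidate the stale copy


def demo_policies(policy):
    system = ToySystem(num_caches=2, policy=policy)
    system.read(1, 0x40)            # cache 1 obtains a copy of address 0x40
    system.write(0, 0x40, 7)        # cache 0 updates the data
    still_cached = 0x40 in system.caches[1].lines
    return still_cached, system.read(1, 0x40)
```

Under either policy, the later read by cache 1 observes the updated value; the policies differ only in whether cache 1 still holds a copy immediately after the write.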
Shared memory multiprocessing systems may generally employ either a broadcast snooping cache coherency protocol or a directory-based cache coherency protocol. In a system employing a broadcast snooping protocol (referred to herein as a “broadcast” protocol), coherence requests are broadcast to all processors (or cache subsystems) and memory through a totally ordered address network. Each processor “snoops” the requests from other processors and responds accordingly by updating its cache tags and/or providing the data to another processor. For example, when a subsystem having a shared copy observes a coherence request for exclusive access to the coherency unit, its copy is typically invalidated. Likewise, when a subsystem that currently owns a coherency unit observes a coherence request for that coherency unit, the owning subsystem typically responds by providing the data to the requestor and invalidating its copy, if necessary. By delivering coherence requests in a total order, correct coherence protocol behavior is maintained since all processors and memories observe requests in the same order.
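The snooping behavior described above can be sketched as follows. This is a simplified illustration only: the class names `Subsystem` and `AddressNetwork`, the two request kinds (`"shared"` and `"exclusive"`), and the two-state (`"shared"`/`"owned"`) model are assumptions made for the sketch, and the total order is modeled simply as the order of calls to `broadcast`.

```python
class Subsystem:
    """Hypothetical cache subsystem that snoops every broadcast request."""

    def __init__(self, name):
        self.name = name
        self.state = {}   # unit -> "shared" or "owned"; absent means invalid
        self.data = {}    # unit -> cached value

    def snoop(self, requester, kind, unit):
        """Observe one request; return the data if this subsystem supplies it."""
        if requester is self:
            return None
        state = self.state.get(unit)
        supplied = self.data[unit] if state == "owned" else None
        if kind == "exclusive" and state is not None:
            # A request for exclusive access invalidates every other copy.
            del self.state[unit]
            self.data.pop(unit, None)
        return supplied


class AddressNetwork:
    """Delivers each request to all subsystems, one request at a time."""

    def __init__(self, subsystems, memory):
        self.subsystems = subsystems
        self.memory = memory
        self.log = []     # the single total order seen by every subsystem

    def broadcast(self, requester, kind, unit):
        self.log.append((requester.name, kind, unit))
        supplied = None
        for subsystem in self.subsystems:
            data = subsystem.snoop(requester, kind, unit)
            if data is not None:
                supplied = data
        if supplied is None:
            supplied = self.memory.get(unit, 0)   # no cache owner: memory responds
        requester.data[unit] = supplied
        requester.state[unit] = "owned" if kind == "exclusive" else "shared"
        return supplied


def demo_snooping():
    a, b, c = Subsystem("A"), Subsystem("B"), Subsystem("C")
    net = AddressNetwork([a, b, c], memory={0x80: 1})
    net.broadcast(a, "shared", 0x80)      # A obtains a shared copy from memory
    net.broadcast(b, "shared", 0x80)      # B obtains a shared copy from memory
    net.broadcast(c, "exclusive", 0x80)   # C becomes owner; A and B are invalidated
    c.data[0x80] = 2                      # C modifies its owned copy
    value = net.broadcast(a, "shared", 0x80)   # owner C supplies the data
    return a.state.get(0x80), b.state.get(0x80), c.state.get(0x80), value
```

In this sketch the owner keeps its copy when it answers a request for shared access and gives it up only on a request for exclusive access, which is one way to read "invalidating its copy, if necessary."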
In contrast, systems employing directory-based protocols maintain a directory containing information indicating the existence of cached copies of data. Rather than unconditionally broadcasting coherence requests, a coherence request is typically conveyed through a point-to-point network to the directory and, depending upon the information contained in the directory, subsequent coherence requests are sent to those subsystems that may contain cached copies of the data in order to cause specific coherency actions. For example, the directory may contain information indicating that various subsystems contain shared copies of the data. In response to a coherence request for exclusive access to a coherency unit, invalidation requests may be conveyed to the sharing subsystems. The directory may also contain information indicating subsystems that currently own particular coherency units. Accordingly, the subsequent coherence requests may additionally include requests that cause an owning subsystem to convey data to a requesting subsystem. In some directory-based coherency protocols, specifically sequenced invalidation and/or acknowledgment messages may be required. Numerous variations of directory-based cache coherency protocols are well known.
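By way of illustration, the directory lookup described above might be sketched as follows. The `Directory` and `Entry` classes, the message tuples, and the action names (`"invalidate"`, `"send_data"`, `"send_data_and_invalidate"`) are hypothetical and stand in for whatever messages a given protocol defines. The sketch shows only which subsystems receive subsequent requests; it omits data transfer and acknowledgments.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    """Directory state for one coherency unit."""
    owner: Optional[str] = None
    sharers: set = field(default_factory=set)


class Directory:
    """Hypothetical directory that turns one request into targeted messages."""

    def __init__(self):
        self.entries = {}

    def request(self, requester, kind, unit):
        """Return the (destination, action, unit) messages the directory sends."""
        entry = self.entries.setdefault(unit, Entry())
        messages = []
        if entry.owner is not None and entry.owner != requester:
            # Only the recorded owner is asked to convey the data.
            action = "send_data_and_invalidate" if kind == "exclusive" else "send_data"
            messages.append((entry.owner, action, unit))
        if kind == "exclusive":
            # Only the recorded sharers receive invalidation requests.
            for node in sorted(entry.sharers - {requester}):
                messages.append((node, "invalidate", unit))
            entry.sharers = set()
            entry.owner = requester
        else:
            entry.sharers.add(requester)
        return messages


def demo_directory():
    directory = Directory()
    first = directory.request("A", "shared", 0x80)       # no cached copies yet
    second = directory.request("B", "shared", 0x80)      # still no owner to contact
    third = directory.request("C", "exclusive", 0x80)    # invalidate sharers A and B
    fourth = directory.request("A", "shared", 0x80)      # owner C conveys the data
    return first, second, third, fourth
```

A subsystem that never requested the unit (for example, a fourth node "D") receives no messages in any of these steps, which is the contrast with the broadcast approach.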
One type of shared memory system which utilizes directories is a distributed shared memory architecture. A distributed shared memory architecture includes multiple nodes within which processors and memory reside. Each of the multiple nodes is coupled to a network through which the nodes communicate. When considered as a whole, the memory included within the multiple nodes forms the shared memory for the computer system. Typically, directories are used to identify which nodes have cached copies of data corresponding to a particular address, and coherency activities may be generated via examination of the directories. Unfortunately, processor access to memory located in a remote node (i.e., a node other than the node containing the processor) is generally significantly slower than access to memory within the node. In particular, write operations may suffer from severe performance degradation in a distributed shared memory system. If a write operation is performed by a processor in a particular node and the particular node does not have write permission to the coherency unit affected by the write operation, then the write operation is typically stalled until write permission is acquired from the remainder of the system.
In view of the above, some protocols include a transaction that allows a processor to write an entire coherency unit to memory without receiving the previous contents of the coherency unit or retaining a copy of the coherency unit in its cache (e.g., a “writestream” transaction). Because the previous contents of the coherency unit are not needed, they are discarded. Consequently, when a processor initiates such a write transaction, the processor must commit to carrying through with the transaction and writing the entire coherency unit. However, many processing systems are configured to perform speculative transactions, and some systems may be configured to pipeline requests with no guarantee that transactions will be handled in the order in which they are requested. Because processors must commit to performing these types of transactions once initiated, deadlock situations may arise where multiple processors are contending for the same resources.
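The difference between an ordinary write and a writestream-style write can be sketched as follows. This is an illustrative model only: `UNIT_SIZE`, `Memory`, `ordinary_write`, and `writestream` are hypothetical names, the 64-byte unit size is an assumed value, and the sketch models only the data movement, not the commitment or deadlock behavior discussed above.

```python
UNIT_SIZE = 64  # bytes per coherency unit; an assumed, illustrative size


class Memory:
    """Hypothetical shared memory that counts bytes sent to processors."""

    def __init__(self):
        self.units = {}
        self.bytes_sent_to_processors = 0

    def read_unit(self, addr):
        self.bytes_sent_to_processors += UNIT_SIZE
        return bytearray(self.units.get(addr, bytes(UNIT_SIZE)))

    def write_unit(self, addr, data):
        if len(data) != UNIT_SIZE:
            raise ValueError("a full coherency unit is required")
        self.units[addr] = bytes(data)


def ordinary_write(memory, cache, addr, offset, payload):
    """Fetch the previous contents, modify part of the unit, keep it cached."""
    unit = memory.read_unit(addr)
    unit[offset:offset + len(payload)] = payload
    cache[addr] = unit


def writestream(memory, cache, addr, data):
    """Overwrite the whole unit without fetching it or retaining a copy."""
    if len(data) != UNIT_SIZE:
        raise ValueError("writestream must supply the entire coherency unit")
    cache.pop(addr, None)            # no copy is retained in the cache
    memory.write_unit(addr, data)    # previous contents are simply replaced


def demo_writestream():
    memory, cache = Memory(), {}
    ordinary_write(memory, cache, 0, 0, b"ab")              # transfers 64 bytes in
    writestream(memory, cache, 64, bytes([1]) * UNIT_SIZE)  # transfers none in
    return (memory.bytes_sent_to_processors, 0 in cache, 64 in cache,
            memory.units[64][0])
```

In this model, once `writestream` has replaced the unit, the previous contents no longer exist anywhere, which is consistent with the requirement that the writer supply the entire coherency unit.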
Accordingly, an effective method and mechanism for supporting speculative writestream transactions in a shared memory computing system is desired.