1. Field of the Invention
The present invention relates to techniques for improving the performance of computer systems. More specifically, the present invention relates to a method and an apparatus for performing speculative writestream transactions in a computer system.
2. Related Art
Modern multiprocessing computer systems often include two or more processors (or processor cores) that are used to perform computing tasks. One common architecture in multiprocessing systems is a shared memory architecture in which multiple processors share a common memory. In shared memory systems, a cache hierarchy is typically implemented between the processors and a shared memory, wherein each processor can hold a cached copy of a given cache line. Because the cached copies of cache lines may be modified by the caching processor, shared memory multiprocessing systems use cache coherence protocols to ensure that any copies of the cache line in the cache hierarchy contain the same data value.
One common variant of shared memory systems is a distributed shared memory architecture, which includes multiple distributed “nodes” within which separate processors and memory reside. Each of the nodes is coupled to a network that is used to communicate with the other nodes. When considered as a whole, the memory included within each of the multiple nodes forms the shared memory for the computer system. Unfortunately, an access to memory in a remote node is significantly slower than an access to memory in a local node. As a consequence, cache line write operations may suffer from severe performance degradation in a distributed shared memory system. This performance degradation occurs because, when a cache line write operation is performed by a processor in a node that does not have write permission for the cache line, the write operation is stalled until write permission can be acquired for the cache line.
To address the above-described problem, some coherence protocols include a “writestream” transaction that enables a processor to write an entire cache line to memory without receiving the previous contents of the cache line or retaining a copy of the cache line in the processor's cache. In these systems, because the previous contents of the cache line are not needed, the previous contents of the cache line are discarded. Consequently, when a processor initiates such a transaction, the processor must commit to carrying through with the transaction and writing the entire cache line to memory. In addition, for some writestream transactions, the system requires strong ordering semantics, meaning that the writestream transactions must complete in order. Because writestream transactions must occur in order, the possibility of deadlocks arises where multiple processors are initiating writestream transactions involving the same set of cache lines. Specifically, a first processor's writestream transaction for cache line A can be blocked by a second processor's writestream transaction for cache line A while the second processor's writestream transaction for cache line B is blocked by the first processor's writestream transaction for cache line B. Because the processors prevent each other from continuing with their transactions, neither processor makes forward progress and deadlock occurs.
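The deadlock scenario described above can be illustrated with a wait-for graph, in which an edge from one processor to another means that the first processor is blocked by a transaction of the second. The following Python sketch is illustrative only: the names `build_wait_graph` and `find_deadlock`, the processor labels P1 and P2, and the tuple layout are assumptions introduced for this example and are not part of any coherence protocol.

```python
from typing import Dict, List, Optional, Set, Tuple

# (blocked processor, blocking processor, cache line); all names are hypothetical.
Blocked = Tuple[str, str, str]


def build_wait_graph(blocks: List[Blocked]) -> Dict[str, Set[str]]:
    """Map each processor to the set of processors it is waiting on."""
    graph: Dict[str, Set[str]] = {}
    for waiter, holder, _line in blocks:
        graph.setdefault(waiter, set()).add(holder)
        graph.setdefault(holder, set())
    return graph


def find_deadlock(graph: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return the processors on one wait cycle, or None if no cycle exists."""
    path: List[str] = []    # processors on the current depth-first path
    done: Set[str] = set()  # processors already shown to be cycle-free

    def dfs(node: str) -> Optional[List[str]]:
        if node in path:
            # Reaching a processor already on the path closes a cycle.
            return path[path.index(node):]
        if node in done:
            return None
        path.append(node)
        for nxt in sorted(graph[node]):
            cycle = dfs(nxt)
            if cycle is not None:
                return cycle
        path.pop()
        done.add(node)
        return None

    for start in sorted(graph):
        cycle = dfs(start)
        if cycle is not None:
            return cycle
    return None


# The scenario from the text: P1 is blocked by P2 on cache line A,
# while P2 is blocked by P1 on cache line B.
blocks = [("P1", "P2", "A"), ("P2", "P1", "B")]
print(find_deadlock(build_wait_graph(blocks)))
```

In this sketch, a cycle in the graph corresponds to the condition in which neither processor can make forward progress. If only one of the two blocking relationships is present, no cycle is found and `find_deadlock` returns `None`.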
Moreover, many multiprocessing systems support pipelining for writes to memory. However, unlike writestream transactions that use weakly ordered semantics, strongly-ordered writestream transactions must be completed in order. Therefore, strongly-ordered writestream transactions cannot be pipelined, which means that these transactions cannot benefit from the performance advantage of pipelining.
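The cost of forgoing pipelining can be approximated with a simple, idealized latency model. The following Python sketch assumes a fixed per-write latency, a fixed interval between pipelined issues, and no contention. The function names and the cycle counts used are hypothetical values chosen for illustration, not measurements of any system.

```python
def serialized_cycles(n_writes: int, latency: int) -> int:
    """Total cycles when each write must complete before the next one starts."""
    return n_writes * latency


def pipelined_cycles(n_writes: int, latency: int, issue_interval: int = 1) -> int:
    """Total cycles when a new write can be issued every issue_interval cycles.

    Idealized model: the first write takes the full latency, and each later
    write completes issue_interval cycles after the one before it.
    """
    if n_writes == 0:
        return 0
    return latency + (n_writes - 1) * issue_interval


# Hypothetical figures: 8 writes, 100 cycles of latency per remote write.
print(serialized_cycles(8, 100))  # 800
print(pipelined_cycles(8, 100))   # 107
```

Under these assumptions, the serialized case grows linearly with the per-write latency, whereas the pipelined case pays the full latency only once.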