The present invention relates generally to multi-processor computer systems and more particularly to a cache-flushing system.
High-performance multi-processor computer systems with a large number of microprocessors are built by interconnecting a number of node structures, each node containing a subset of the processors and memory in the system. While the memory in the system is distributed, several of these systems support a shared memory abstraction where all the memory in the system appears as a large memory common to all processors in the system. To support high performance, these systems typically allow processors to maintain copies of memory data in their local caches. Since multiple processors can cache the same data, these systems must incorporate a cache coherence mechanism to keep the copies coherent.
In some cache-coherent systems, each memory block (typically a portion of memory tens of bytes in size) is assigned a “home node”, which maintains all necessary global information for that memory block, manages the sharing of that memory block, and guarantees its coherence. The home node maintains a directory, which identifies the nodes that possess a copy of the memory block. When a node requires a copy of the memory block, it first requests the memory block from its local, private cache. If the data is found, the memory access is resolved locally. Otherwise, a remote memory access is performed to the home node. The home node supplies the data from memory if its memory has the latest data. If another node has the latest copy of the data, the home node directs this node to forward the data to the requesting node. The data is then stored in the local cache of the requesting node or returned to the home memory and then sent to the requesting node.
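The read path described above can be illustrated with a minimal sketch. The function name `resolve_read` and the dictionary-based model of caches, home memory, and directory are assumptions made for illustration only; they are not taken from the described system, and the sketch covers only the case where the owner forwards data directly to the requester.

```python
from typing import Dict, Tuple


def resolve_read(
    block: int,
    local_cache: Dict[int, int],
    home_memory: Dict[int, int],
    owner_caches: Dict[int, Dict[int, int]],
    directory_owner: Dict[int, int],
) -> Tuple[int, str]:
    """Return (value, source) for a read of `block` by the requesting node.

    Hypothetical model: `directory_owner` maps a block to the node that holds
    the latest copy, and is empty for blocks whose home memory is up to date.
    """
    # 1. Try the local, private cache first.
    if block in local_cache:
        return local_cache[block], "local"

    # 2. Miss: perform a remote access to the home node.
    owner = directory_owner.get(block)
    if owner is None:
        # Home memory has the latest data and supplies it directly.
        value, source = home_memory[block], "home"
    else:
        # Another node has the latest copy; the home node directs it to forward.
        value, source = owner_caches[owner][block], "forwarded"

    # 3. The data is stored in the requesting node's local cache.
    local_cache[block] = value
    return value, source
```

In this sketch a second read of the same block is resolved locally, because the first read fills the requesting node's cache.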
In cache-coherent systems, multiple copies of the same memory block can exist in different nodes. These copies must be read-only and identical to the home memory copy. They are called “clean” copies in a “shared” state.
When a processor updates its local cache copy, it must ensure that all other copies are invalidated. The processor sends a request to the home memory for the memory block to be owned only by the processor. In response, other processors, which have clean shared copies of the memory block in their caches, must be sent a memory block recall command. Once all processors have responded that the memory block is no longer contained in their caches, the home memory sends a message back to the updating processor that it is now the sole “owner” of the memory block. Consequently, the processor has an “exclusive” and “modified” data copy, which holds the most recent value of the data. The other copies of the memory block are invalid and the copy in the home memory is “stale”.
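The ownership transfer described above can be sketched as directory bookkeeping at the home node. The class and method names (`DirectoryEntry`, `HomeDirectory`, `request_exclusive`, `home_copy_is_stale`) are hypothetical, and the sketch assumes that every recalled node responds immediately; it is an illustration, not the protocol of any particular system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class DirectoryEntry:
    """Per-block state kept by the home node (hypothetical layout)."""
    sharers: Set[int] = field(default_factory=set)  # nodes with clean shared copies
    owner: Optional[int] = None                     # node with the exclusive copy


class HomeDirectory:
    def __init__(self) -> None:
        self.entries: Dict[int, DirectoryEntry] = {}

    def add_sharer(self, block: int, node: int) -> None:
        """Record that `node` holds a clean, shared copy of `block`."""
        entry = self.entries.setdefault(block, DirectoryEntry())
        if entry.owner is not None:
            raise RuntimeError("block is exclusively owned; recall it first")
        entry.sharers.add(node)

    def request_exclusive(self, block: int, node: int) -> List[int]:
        """Make `node` the sole owner; return the nodes sent a recall command."""
        entry = self.entries.setdefault(block, DirectoryEntry())
        to_recall = set(entry.sharers)
        if entry.owner is not None:
            to_recall.add(entry.owner)
        to_recall.discard(node)  # the requester keeps its own copy
        # Assumption: all recalled nodes acknowledge before ownership is granted.
        entry.sharers.clear()
        entry.owner = node
        return sorted(to_recall)

    def home_copy_is_stale(self, block: int) -> bool:
        """The home copy may be stale whenever some node owns the block."""
        entry = self.entries.get(block)
        return entry is not None and entry.owner is not None
```

For example, if nodes 1, 2, and 3 share a block and node 2 requests ownership, the sketch recalls the copies at nodes 1 and 3 and then marks the home copy as potentially stale.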
The home node employs a coherence protocol to ensure that when a node writes a new value to the memory block, all other nodes see this latest value. Coherence controllers implement this coherence functionality. First, a coherence controller is provided for each memory unit and maintains coherence of all memory blocks in that memory unit. Second, the functionality of the coherence controller is integrated with the functionality of the System Control Unit (SCU) of the associated memory unit.
The SCU provides the control and the path for data movement for the following sources and destinations within the node: the processors within the node; the local (node) portion of the memory system; the network connecting all of the nodes of the multi-processor computer system; and the input/output (I/O) system of the local node.
However, a serious problem in state-of-the-art cache-coherent shared-memory multiprocessor system designs is that the memory copy is stale after the crash of the owner node. In other words, the most recent value of a memory block is lost when the cache content is irretrievable at a failed owner node.
In many situations, the software may demand a selective cache-flushing scheme in order to define a synchronization point, at which the most recent value of a memory block is reflected at the home memory by flushing the owner cache.
In today's processor designs, cache flushing is normally implemented as an expensive operation, which may result in wiping out the entire cache rather than the desired cache blocks alone. Although some processors provide selective cache-flushing instructions, there is no guarantee of the correctness unless the cache-flushing instruction has system-wide semantics, which are prohibitively expensive.
Thus, a system that would provide an efficient implementation of transactional memory has long been sought by, and has long eluded, those skilled in the art.
The present invention provides a cache coherent distributed shared memory multiprocessor computer system with programmable selective cache flushing.
The present invention further provides a cache coherent distributed shared memory multi-processor computer system which allows programmers to selectively force write-backs of dirty cache lines to home memory.
The present invention provides a multi-processor computer system which includes a processor with a cache connected thereto, a memory operatively connected to the processor, and a memory controller operatively connected to the memory for controlling access to the memory. The memory controller includes a recall unit operatively connected to the cache. The recall unit includes a triggering mechanism for providing a trigger signal to start a memory recall operation, a recall unit queue mechanism operatively connected to the triggering mechanism, and a control mechanism operatively connected to the recall unit queue mechanism for controlling the recall unit. The memory controller further includes a state machine operatively connected to the recall unit queue mechanism, the cache, and the memory for recalling information from the cache to the memory.
The present invention further provides a method for recalling memory within a cache for use in a multi-processor computer system. The multi-processor computer system includes a processor with the cache connected thereto, a memory operatively connected to the processor, and a memory controller operatively connected to the memory for controlling access to the memory. The memory controller includes a recall unit. The method includes the steps of: (a) providing to the recall unit addresses of memory locations within the cache that are to be recalled; (b) generating a trigger signal in the recall unit to start memory recall operations; (c) providing to the cache the memory locations within the cache that are to be recalled; (d) providing a response signal to the recall unit as each memory recall operation is completed; and (e) providing an interrupt signal to the processor when all memory recall operations are completed.
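Steps (a) through (e) of the method can be modeled in software as a minimal sketch. The classes `Cache` and `RecallUnit`, their method names, and the callback standing in for the interrupt signal are all assumptions made for illustration; the recall unit described above is hardware within the memory controller, and this sequential model ignores concurrency and the state machine.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional


class Cache:
    """Toy processor cache mapping address -> modified value (hypothetical)."""

    def __init__(self) -> None:
        self.lines: Dict[int, int] = {}

    def write(self, addr: int, value: int) -> None:
        self.lines[addr] = value

    def recall(self, addr: int) -> Optional[int]:
        # Give up the line and return its value, or None if it is not cached.
        return self.lines.pop(addr, None)


class RecallUnit:
    """Software model of a recall unit with a queue of addresses to recall."""

    def __init__(self, cache: Cache, memory: Dict[int, int],
                 interrupt: Callable[[int], None]) -> None:
        self.cache = cache
        self.memory = memory
        self.interrupt = interrupt          # stands in for the interrupt signal
        self.queue: Deque[int] = deque()
        self.completed = 0

    def enqueue(self, addr: int) -> None:
        # Step (a): provide the address of a location to be recalled.
        self.queue.append(addr)

    def trigger(self) -> None:
        # Step (b): the trigger starts the memory recall operations.
        self.completed = 0
        while self.queue:
            addr = self.queue.popleft()
            value = self.cache.recall(addr)  # Step (c): send the recall to the cache.
            if value is not None:
                self.memory[addr] = value    # Write the recalled data to memory.
            self.completed += 1              # Step (d): one response per recall.
        # Step (e): signal the processor once all recalls have completed.
        self.interrupt(self.completed)
```

Only the queued addresses are written back in this model; other cache lines stay in the cache, which is the selective behavior the method is aiming for.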