1. Field of the Invention
This invention relates to multiprocessor computer architectures and, more specifically, to the sharing or exchanging of information among partitions of a multiprocessor computer system.
2. Background Information
Symmetrical multiprocessor (SMP) computer systems support high performance application processing. Conventional SMP systems include a plurality of interconnected nodes. Each node typically includes one or more processors as well as a portion of system memory. The nodes may be coupled together by a bus or by some other data transfer mechanism. One characteristic of an SMP computer system is that all or substantially all of the system's memory space is shared among all nodes. That is, the processors of one node can access programs and data stored in the memory portion of another node. The processors of different nodes can also use system memory to communicate with each other by leaving messages and status information in shared memory space.
When a processor accesses (loads or stores to) a shared memory block from its own home node, the reference is referred to as a "local" memory reference. When the reference is to a memory block from a node other than the requesting processor's own home node, the reference is referred to as a "remote" memory reference. Because the latency of a local memory access differs from that of a remote memory access, the SMP system is said to have a Non-Uniform Memory Access (NUMA) architecture. Furthermore, if the memory blocks of the memory system are maintained in a coherent state, the system is called a cache coherent NUMA architecture.
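The distinction between local and remote references can be illustrated with a minimal sketch. The `Node` class, its address ranges, and the `classify_reference` helper are hypothetical names introduced only for illustration; the sketch assumes each node is the home of one contiguous address range, which the description above does not require.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical node: a processor plus the memory range it is home to."""
    node_id: int
    mem_base: int
    mem_size: int

    def is_home_of(self, address: int) -> bool:
        # Assumption: each node's memory is one contiguous address range.
        return self.mem_base <= address < self.mem_base + self.mem_size


def classify_reference(requester: Node, nodes: list, address: int) -> str:
    """Return "local" if the address is homed at the requester, else "remote"."""
    for node in nodes:
        if node.is_home_of(address):
            return "local" if node.node_id == requester.node_id else "remote"
    raise ValueError(f"address {address:#x} is not homed at any node")


# Four nodes, each home to 0x1000 bytes of the shared address space.
nodes = [Node(node_id=i, mem_base=i * 0x1000, mem_size=0x1000) for i in range(4)]
```

In this model, a reference by node 0 to address 0x0800 is local, while a reference by node 0 to address 0x1800 (homed at node 1) is remote and would incur the longer latency.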
Partitions
The nodes or processors of an SMP computer system can also be divided among a plurality of partitions, increasing the operating flexibility of the SMP system. FIG. 1, for example, is a schematic block diagram of an SMP computer system 100 comprising a plurality of interconnected nodes 102. Each node 102, moreover, includes a processor unit (P) 104 and a corresponding memory unit (MEM) 106. The nodes 102 have been divided into a plurality of, e.g., four, partitions 108a-d, each comprising four nodes 102. A separate operating system or a separate instance of the same operating system runs on each partition 108a-d. In a partitioned system it is often desirable to permit the processors 104 located in different partitions, e.g., partitions 108a and 108d, to exchange information, e.g., to communicate with each other. To this end, a portion of memory 106 at one or more nodes 102, such as memory portions 110 at each node 102, may be designated as global shared memory. Information or data stored at a global shared memory portion 110 of a first partition, e.g., partition 108a, may be accessed by the processors 104 located within a second partition, e.g., partition 108d.
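The arrangement of FIG. 1 can be modeled with a short sketch. The mapping of node indices to partitions and the helper names `partition_of` and `may_access` are assumptions made for illustration; FIG. 1 specifies only that sixteen nodes are divided into four partitions of four nodes each, with a global shared memory portion 110 at each node.

```python
NUM_PARTITIONS = 4
NODES_PER_PARTITION = 4
PARTITION_LABELS = ["108a", "108b", "108c", "108d"]


def partition_of(node_index: int) -> str:
    """Map a node index (0..15) to its partition label.

    Assumption: consecutive node indices are grouped into the same partition.
    """
    if not 0 <= node_index < NUM_PARTITIONS * NODES_PER_PARTITION:
        raise ValueError(f"no such node: {node_index}")
    return PARTITION_LABELS[node_index // NODES_PER_PARTITION]


def may_access(requesting_node: int, home_node: int, in_global_shared: bool) -> bool:
    """Decide whether a processor may access memory homed at another node.

    Within a partition all memory is reachable; across partitions only the
    portion designated as global shared memory (portion 110) is reachable.
    """
    same_partition = partition_of(requesting_node) == partition_of(home_node)
    return same_partition or in_global_shared
```

Under these assumptions, a processor at node 15 (partition 108d) may read global shared memory homed at node 0 (partition 108a), but not the private memory of that node.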
Although the use of global shared memory in a partitioned computer system allows the processors to share information across partition boundaries, it can result in errors or faults occurring in one partition causing errors or faults in other partitions. For example, in a cache coherent system, the state, e.g., the ownership, of memory blocks changes in response to reads or writes to those memory blocks. Two processors each located in a different partition and thus each running a different operating system may nonetheless share ownership of a memory block from some portion of global shared memory. A fault or failure in one partition that affects the shared memory block may cause a corresponding fault or failure to occur in the other partition.
To prevent such faults from crossing partition boundaries, the global shared memory can be made non-coherent. However, this approach may result in a partition obtaining stale information from the global shared memory. Specifically, the processor of a first partition may obtain a copy of a memory block from some portion of global shared memory before that memory block has been updated by some other processor. Use of such stale information within the first partition can introduce errors. Another approach to prevent faults from crossing partition boundaries is to move data between partitions through one or more input/output (I/O) devices. With this approach, data from a first partition is read from system memory by an I/O device within the first partition. The I/O device then transfers that data to an I/O device coupled to a second partition, thereby making the data available to the processors of the second partition. This approach also suffers from one or more drawbacks. In particular, the busses coupled to the I/O devices nearly always run at a fraction of the speed of the processor or memory busses. Accordingly, transferring data through multiple I/O devices takes substantial time and may introduce significant latencies.
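The stale-data hazard of non-coherent global shared memory can be shown with a toy model. The `NonCoherentCache` class and the addresses used are hypothetical; the sketch assumes only that a cached copy is never invalidated when another processor updates the underlying memory block.

```python
class NonCoherentCache:
    """Toy cache that keeps a private copy and never receives invalidations."""

    def __init__(self, memory: dict):
        self.memory = memory  # backing global shared memory: address -> value
        self.lines = {}       # private cached copies: address -> value

    def read(self, address: int):
        # On a miss, fetch from memory; on a hit, return the cached copy
        # even if memory has since changed (no coherence traffic).
        if address not in self.lines:
            self.lines[address] = self.memory[address]
        return self.lines[address]


shared_memory = {0x100: "v1"}
first_partition_cache = NonCoherentCache(shared_memory)

first_read = first_partition_cache.read(0x100)   # caches "v1"
shared_memory[0x100] = "v2"                      # another processor updates the block
second_read = first_partition_cache.read(0x100)  # stale copy is returned
```

Here the second read still returns the old value although memory holds the new one, which is the kind of stale information the paragraph above warns about.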
Accordingly, a need exists for a system that efficiently transfers information between the partitions of a multiprocessor computer system, yet nonetheless prevents faults in one partition from affecting other partitions.
Briefly, the invention relates to a system and method for moving information between cache coherent memory subsystems of a partitioned multiprocessor computer system that prevents faults in one partition from affecting other partitions. The multiprocessor computer system includes a plurality of processors, memory subsystems and input/output (I/O) subsystems that can be segregated into a plurality of partitions. Each processor may have one or more processor caches for storing information, and each I/O subsystem includes at least one I/O bridge that interfaces between one or more I/O devices and the multiprocessor system. To maintain the coherence of information stored at the memory subsystems and the processor caches, the multiprocessor system may employ a directory based cache coherency protocol. According to the present invention, the I/O bridge has a data mover configured to retrieve information from a "source" partition and store it within the cache coherent system of its own "destination" partition.
Specifically, when an initiating processor in the source partition wishes to make information, e.g., one or more memory blocks, from a region of global shared memory available to a target processor of a destination partition, the initiating processor preferably issues a write transaction to its I/O bridge. The I/O bridge then notifies the target processor that information in the source partition's region of global shared memory is ready for copying, preferably by sending the target processor a Message Signaled Interrupt (MSI) containing an encoded message from the initiating processor. The target processor then configures or sets up the data mover in its I/O bridge to perform the transfer. In particular, the target processor provides the data mover with the memory address of the information in the source partition's global shared memory. The target processor also provides the data mover with the memory address within the destination partition to which the information is to be stored. Once the setup phase is complete, the target processor issues a start command to the data mover. In response, the data mover issues a request to the source partition for a non-coherent copy of the specified information. The home memory subsystem of the source partition preferably responds to the request by sending a "valid" but non-coherent copy of the specified information, e.g., a "snapshot" of the information as of the time of the request, to the data mover in the destination partition. By requesting a non-coherent copy of the information, the data mover in the destination partition does not cause a change of ownership of the respective information to be recorded at the source partition.
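The setup, start, and snapshot-read steps described above can be sketched as follows. This is a simplified model under stated assumptions, not the actual hardware interface: the classes `DirectoryEntry`, `HomeMemory`, and `DataMover`, and their method names, are hypothetical, and the MSI notification step is omitted. The point illustrated is that a coherent read records the requester in the source directory, whereas the snapshot read leaves the directory unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Hypothetical directory state for one memory block at its home."""
    data: bytes
    owner: str = "memory"
    sharers: set = field(default_factory=set)


class HomeMemory:
    """Toy home memory subsystem of the source partition."""

    def __init__(self):
        self.blocks = {}  # address -> DirectoryEntry

    def read_coherent(self, address: int, requester: str) -> bytes:
        # An ordinary coherent read records the requester as a sharer.
        entry = self.blocks[address]
        entry.sharers.add(requester)
        return entry.data

    def read_snapshot(self, address: int) -> bytes:
        # Non-coherent read: return a valid copy, record nothing.
        return bytes(self.blocks[address].data)


class DataMover:
    """Toy data mover residing in the destination partition's I/O bridge."""

    def __init__(self, source_home: HomeMemory):
        self.source_home = source_home
        self.src_addr = None
        self.dst_addr = None

    def setup(self, src_addr: int, dst_addr: int) -> None:
        # Setup phase: the target processor supplies both addresses.
        self.src_addr, self.dst_addr = src_addr, dst_addr

    def start(self):
        # Start command: fetch a snapshot from the source partition.
        if self.src_addr is None or self.dst_addr is None:
            raise RuntimeError("data mover has not been set up")
        return self.dst_addr, self.source_home.read_snapshot(self.src_addr)


home = HomeMemory()
home.blocks[0x2000] = DirectoryEntry(data=b"payload")
mover = DataMover(home)
mover.setup(src_addr=0x2000, dst_addr=0x8000)
dst_addr, snapshot = mover.start()
```

After `start()` returns, the source directory entry for address 0x2000 still shows no sharers and an unchanged owner, which models the statement that no change of ownership is recorded at the source partition.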
The data mover in the destination partition also requests exclusive ownership over the memory block(s) within the destination partition to which the transferred information is to be written. Upon obtaining exclusive ownership, the data mover writes the information received from the source partition to the specified memory block(s) of the destination partition. The data mover may also provide an acknowledgement to the initiating processor at the remote partition. As shown, the specified information is copied from the source partition and entered into the cache coherent domain of the destination partition. Nonetheless, because the transfer was effected without the data mover in the destination partition becoming an owner of the information from the point of view of the source partition, a failure in either the source or destination partition will not affect the other partition.
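The destination-side half of the transfer can be sketched in the same spirit. The `DestinationDirectory` class, the `complete_transfer` helper, and the agent names are hypothetical and stand in for whatever directory based coherency protocol the destination partition actually employs. The sketch assumes that gaining exclusive ownership invalidates any other cached copies of the destination block before the write is performed.

```python
class DestinationDirectory:
    """Toy coherence directory for memory blocks in the destination partition."""

    def __init__(self):
        self.owner = {}    # address -> agent holding exclusive ownership
        self.sharers = {}  # address -> set of agents holding shared copies
        self.data = {}     # address -> block contents

    def request_exclusive(self, address: int, agent: str) -> set:
        # Invalidate every other copy, then record the new exclusive owner.
        invalidated = self.sharers.pop(address, set()) - {agent}
        self.owner[address] = agent
        return invalidated

    def write(self, address: int, agent: str, payload: bytes) -> None:
        # Only the exclusive owner may write the block.
        if self.owner.get(address) != agent:
            raise PermissionError(f"{agent} does not own block {address:#x}")
        self.data[address] = payload


def complete_transfer(directory, dst_addr: int, payload: bytes,
                      agent: str = "data_mover") -> set:
    """Obtain exclusive ownership of the destination block, then write it."""
    invalidated = directory.request_exclusive(dst_addr, agent)
    directory.write(dst_addr, agent, payload)
    return invalidated


directory = DestinationDirectory()
directory.sharers[0x8000] = {"P7"}  # a destination processor holds a shared copy
invalidated = complete_transfer(directory, 0x8000, b"payload")
```

All state touched here belongs to the destination partition. No entry of the source partition's directory is read or modified, which is why a later failure on either side leaves the other partition's coherence state intact.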