1. Field of the Invention
This invention relates to the field of multiprocessing computer systems and, more particularly, to performing coherent memory replication within multiprocessing computer systems.
2. Description of the Related Art
Multiprocessing computer systems include two or more processors that may be employed to perform computing tasks. A particular computing task may be performed on one processor while other processors perform unrelated computing tasks. Alternatively, components of a particular computing task may be distributed among multiple processors to decrease the time required to perform the computing task as a whole.
A popular architecture in commercial multiprocessing computer systems is the symmetric multiprocessor (SMP) architecture. Typically, an SMP computer system includes multiple processors connected through a cache hierarchy to a shared bus. The bus provides the processors access to a shared memory. Access to any particular memory location within the memory occurs in a similar amount of time as access to any other particular memory location. Since each location in the memory may be accessed in a uniform manner, this structure is often referred to as a uniform memory architecture (UMA).
Processors are often configured with internal caches, and one or more caches are typically included in the cache hierarchy between the processors and the shared bus in an SMP computer system. Multiple copies of data residing at a particular main memory address may be stored in these caches. In order to maintain the shared memory model, in which a particular address stores exactly one data value at any given time, shared bus computer systems employ cache coherency. An operation is coherent if the effects of the operation upon data stored at a particular memory address are reflected in each copy of the data within the cache hierarchy. For example, when data stored at a particular memory address is updated, the update may be supplied to the caches that are storing copies of the previous data. Alternatively, the copies of the previous data may be invalidated in the caches such that a subsequent access to the particular memory address causes the updated copy to be transferred from main memory. For shared bus systems, a snoop bus protocol is typically employed. Each coherent transaction performed upon the shared bus is examined (or “snooped”) against data in the caches. If a copy of the affected data is found, the state of the cache line containing the data may be updated in response to the coherent transaction.
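The two snooping responses described above can be illustrated with a minimal Python sketch: a snooped write either updates the other cached copies or invalidates them. The sketch assumes a write-through bus on which every write is broadcast to all other caches. The names `Cache`, `SharedBus`, and `Policy` are hypothetical and do not correspond to any particular hardware protocol.

```python
from enum import Enum


class Policy(Enum):
    UPDATE = "update"          # push the new value into every cached copy
    INVALIDATE = "invalidate"  # discard stale copies; reload from memory later


class Cache:
    """Toy cache: holds address -> value for valid lines only."""

    def __init__(self) -> None:
        self.lines: dict[int, int] = {}

    def snoop_write(self, addr: int, value: int, policy: Policy) -> None:
        # Examine ("snoop") the bus transaction against this cache's contents.
        if addr in self.lines:
            if policy is Policy.UPDATE:
                self.lines[addr] = value
            else:
                del self.lines[addr]


class SharedBus:
    """Toy shared bus connecting several caches to one shared memory."""

    def __init__(self, caches: list[Cache], policy: Policy) -> None:
        self.memory: dict[int, int] = {}
        self.caches = caches
        self.policy = policy

    def write(self, writer: Cache, addr: int, value: int) -> None:
        self.memory[addr] = value  # write-through to shared memory (simplification)
        writer.lines[addr] = value
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(addr, value, self.policy)

    def read(self, reader: Cache, addr: int) -> int:
        if addr not in reader.lines:  # miss: fetch the current value from memory
            reader.lines[addr] = self.memory.get(addr, 0)
        return reader.lines[addr]
```

With `Policy.INVALIDATE`, a later read by another cache misses and reloads the updated value from memory; with `Policy.UPDATE`, the other copies receive the new value directly.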
Unfortunately, shared bus architectures suffer from several drawbacks that limit their usefulness in multiprocessing computer systems. A bus has a peak bandwidth (i.e., a maximum number of bytes per second that may be transferred across the bus). As additional processors are attached to the bus, the bandwidth required to fully supply the processors with data and instructions may exceed the peak bus bandwidth. Since some processors are then forced to wait for available bus bandwidth, performance of the computer system suffers whenever the bandwidth requirements of the processors exceed the available bus bandwidth. Performance may also be adversely affected by capacitive loading on the shared bus, which increases as more processors are added to the system. Furthermore, as processor performance increases, buses that previously provided sufficient bandwidth for a multiprocessing computer system may be insufficient for a similar computer system employing higher performance processors.
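The saturation argument above reduces to simple arithmetic: a bus can fully supply at most floor(peak bandwidth / per-processor demand) processors. The function names below are hypothetical, and the figures in the usage note are illustrative assumptions chosen only to make the arithmetic concrete.

```python
def max_processors(peak_bus_bw: int, demand_per_proc: int) -> int:
    """Largest number of processors whose combined demand (bytes/s) fits within the bus peak."""
    return peak_bus_bw // demand_per_proc


def bus_is_saturated(processors: int, demand_per_proc: int, peak_bus_bw: int) -> bool:
    """True when the processors together need more bandwidth than the bus can deliver."""
    return processors * demand_per_proc > peak_bus_bw
```

For example, with a hypothetical peak of 3.2 GB/s and 800 MB/s of demand per processor, `max_processors` returns 4, and a fifth processor saturates the bus.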
Another structure for multiprocessing computer systems is a distributed shared memory architecture. A distributed shared memory architecture includes multiple nodes, each of which includes one or more processors and one or more memory devices. The multiple nodes communicate via a network. When considered as a whole, the memory included within the multiple nodes forms the shared memory for the computer system. Typically, directories are used to identify which nodes have cached copies of data corresponding to a particular address. Coherency activities may be generated via examination of the directories.
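A directory of the kind described above can be sketched as a map from each block address to the set of nodes holding a cached copy. In this hypothetical Python sketch, a write returns the invalidation messages the directory would need to send across the network; the `Directory` class, the message format, and the block granularity are all assumptions.

```python
from collections import defaultdict


class Directory:
    """Toy directory: for each block address, the set of nodes holding a cached copy."""

    def __init__(self) -> None:
        self.sharers: dict[int, set[int]] = defaultdict(set)

    def record_read(self, node: int, block: int) -> None:
        # The reading node now holds a copy of the block.
        self.sharers[block].add(node)

    def record_write(self, node: int, block: int) -> list[tuple[int, int]]:
        """Return (target_node, block) invalidations that must cross the network."""
        targets = sorted(self.sharers[block] - {node})
        self.sharers[block] = {node}  # the writer is now the sole holder
        return [(t, block) for t in targets]
```

Only the nodes listed in the directory receive coherency traffic, rather than every node snooping every transaction.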
Distributed shared memory systems are scalable, overcoming the limitations of the shared bus architecture. Since many processor accesses are completed within a node, the nodes typically place much lower bandwidth requirements upon the network than a shared bus architecture must provide upon its shared bus. The nodes may operate at high clock frequency and bandwidth, accessing the network only when needed. Additional nodes may be added to the network without affecting the local bandwidth of the existing nodes; only the network bandwidth is affected.
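The bandwidth advantage follows from the same arithmetic used for the shared bus: only the fraction of accesses that leave a node reaches the network. In this sketch the function name and the `remote_percent` parameter are hypothetical, and integer bytes per second are used to keep the arithmetic exact.

```python
def network_demand(nodes: int, procs_per_node: int, demand_per_proc: int, remote_percent: int) -> int:
    """Bytes/s placed on the interconnect; accesses completed within a node never reach it."""
    total_demand = nodes * procs_per_node * demand_per_proc
    return total_demand * remote_percent // 100
```

For instance, 4 nodes of 4 processors each, at 800 MB/s per processor with 10 percent remote accesses, place 1.28 GB/s on the network, whereas the same 16 processors on a single shared bus would demand 12.8 GB/s.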
Distributed shared memory systems may employ local and global address spaces. The global address space encompasses memory in more than one node. In contrast, a local physical address space may describe only the memory within a single node. Accesses to the address space within a node (i.e., accesses to local physical address space) are typically local transactions, which may not involve activity on the network that couples the nodes. Accesses to portions of the address space not assigned to the requesting node are typically global transactions and may involve activity on the network.
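The local/global distinction can be sketched as an address check. Purely for illustration, this sketch assumes that the home node of a global address is encoded in the bits above bit 32 (`NODE_SHIFT`, `home_node`, and `classify` are hypothetical); real systems may partition the global address space differently.

```python
NODE_SHIFT = 32  # hypothetical: address bits above bit 32 select the home node


def home_node(global_addr: int) -> int:
    """Node to which this portion of the global address space is assigned."""
    return global_addr >> NODE_SHIFT


def classify(requesting_node: int, global_addr: int) -> str:
    """'local' if the address is assigned to the requester, else 'global' (uses the network)."""
    return "local" if home_node(global_addr) == requesting_node else "global"
```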
In some distributed shared memory systems, data corresponding to addresses of remote nodes may be copied to a requesting node's shared memory such that future accesses to that data may be performed via local transactions rather than global transactions. In such systems, processors local to the node may access the data using the local physical address assigned to the copied data. Remote processors external to that node may use the global address to access the data. Address translation tables are provided to translate between the global address and the local physical address. Improved systems for implementing address translations between global and local physical addresses are desired.
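The translation tables mentioned above can be sketched as a pair of per-node maps between global pages and the local physical pages that hold their copies. The page size, the `ReplicationTables` class, and its methods are hypothetical; the sketch shows only the two translation directions, not any particular hardware table format.

```python
from typing import Optional

PAGE_SIZE = 4096  # hypothetical translation granularity


class ReplicationTables:
    """Per-node tables mapping copied global pages <-> local physical pages."""

    def __init__(self) -> None:
        self.global_to_local: dict[int, int] = {}
        self.local_to_global: dict[int, int] = {}

    def map_page(self, global_page: int, local_page: int) -> None:
        # Record a remote page that has been copied into this node's memory.
        self.global_to_local[global_page] = local_page
        self.local_to_global[local_page] = global_page

    def to_local(self, global_addr: int) -> Optional[int]:
        """Global address -> local physical address, or None if not copied here."""
        page, offset = divmod(global_addr, PAGE_SIZE)
        local_page = self.global_to_local.get(page)
        return None if local_page is None else local_page * PAGE_SIZE + offset

    def to_global(self, local_addr: int) -> Optional[int]:
        """Local physical address -> global address, or None if not a copied page."""
        page, offset = divmod(local_addr, PAGE_SIZE)
        global_page = self.local_to_global.get(page)
        return None if global_page is None else global_page * PAGE_SIZE + offset
```

Local processors would use the local physical address of the copied data, while a request carrying the global address would be translated with `to_local`; `to_global` covers the reverse direction.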