Most general-purpose computer systems are built around a general-purpose processor, typically an integrated circuit that can perform a wide variety of operations useful for executing a wide variety of software. The processor executes a fixed set of instructions, collectively known as the processor's instruction set. Instructions and data are stored in memory, which the processor can selectively read and write.
In more sophisticated computer systems, multiple processors are used, and one or more processors run software that is operable to assign tasks to other processors or to split up a task so that it can be worked on by multiple processors at the same time. In such systems, the data being worked on is typically stored in a volatile memory that can be centralized or split up among the different processors working on a task.
Volatile memory, such as the dynamic random access memory (DRAM) most commonly found in personal computers, stores data such that it can be read or written much more quickly than the same data could be accessed using nonvolatile storage such as a hard disk drive or flash memory. Volatile memory loses its content when power is cut off, so while it is generally not useful for long-term storage, it is widely used for temporary storage of data while a computer is running.
A typical random-access memory consists of an array of transistors or switches coupled to capacitors, where the transistors are used to switch a capacitor into or out of a circuit for reading or writing a value stored in the capacitive element. These storage bits are typically arranged in an array of rows and columns, and are accessed by specifying a memory address that contains or is decoded to find the row and column of the memory bit to be accessed.
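The decoding of a memory address into a row and a column can be sketched as follows. This is a minimal illustration only, assuming the high-order address bits select the row and the low-order bits select the column; the function name `decode_address` and the parameters `row_bits` and `col_bits` are hypothetical and do not describe any particular memory device.

```python
def decode_address(address: int, row_bits: int, col_bits: int) -> tuple[int, int]:
    """Split a flat bit address into (row, column) indices.

    Assumption for illustration: the upper `row_bits` bits of the address
    select the row and the lower `col_bits` bits select the column.
    """
    if not 0 <= address < (1 << (row_bits + col_bits)):
        raise ValueError("address out of range for this array")
    row = address >> col_bits               # high-order bits pick the row
    col = address & ((1 << col_bits) - 1)   # low-order bits pick the column
    return row, col


# A 16 x 16 array: 4 row bits and 4 column bits.
print(decode_address(0b10110110, row_bits=4, col_bits=4))  # -> (11, 6)
```

Real devices may order or multiplex the row and column bits differently; the split shown here is only one possible arrangement.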
The memory in a computer usually takes the form of a network of such circuit elements formed on an integrated circuit, or chip. Several integrated circuits are typically mounted to a single small printed circuit board to form a memory module, and the modules in multiprocessor computers can be either centralized such that the various processors or nodes in the system have relatively uniform access to the memory, or can be distributed among the nodes.
Even when the memory is local to the processor or node that is accessing it, the delay in accessing the memory is a significant performance limitation, as it can take tens or even hundreds of processor clock cycles to retrieve data. When the memory is distributed among nodes, the time needed to access memory on another node is often orders of magnitude longer, as messages must be passed between nodes on an interconnect network linking the nodes. Management of memory requests to other nodes in a multiprocessor computer system is therefore a significant consideration in designing a fast and efficient multiprocessor computer system.
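The effect of remote accesses on overall memory performance can be illustrated with a simple weighted-average model. The sketch below is an assumption-laden illustration rather than a measurement: the function name `average_access_cycles` is hypothetical, and the cycle counts in the usage example are made-up values chosen only to reflect the "hundreds of cycles" and "orders of magnitude longer" ranges mentioned above.

```python
def average_access_cycles(local_fraction: float,
                          local_cycles: float,
                          remote_cycles: float) -> float:
    """Expected cycles per memory access in a simple two-level model.

    Assumption for illustration: every access is either local or remote,
    and each kind has a single fixed latency.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    remote_fraction = 1.0 - local_fraction
    return local_fraction * local_cycles + remote_fraction * remote_cycles


# Hypothetical latencies: 100 cycles local, 10,000 cycles remote.
for fraction in (1.0, 0.9, 0.5):
    cycles = average_access_cycles(fraction, 100, 10_000)
    print(f"{fraction:.0%} local -> about {cycles:.0f} cycles per access")
```

Under these assumed figures, even a small share of remote accesses dominates the average, which is why the handling of remote memory requests matters so much to overall system speed.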