Multiprocessor computers by definition contain multiple processors that can execute multiple parts of a computer program and/or multiple distinct programs simultaneously, in a manner known as parallel computing. In general, multiprocessor computers execute multithreaded programs and/or single-threaded programs faster than conventional single-processor computers, such as personal computers (PCs), that must execute programs sequentially. The actual performance advantage is a function of a number of factors, including the degree to which parts of a multithreaded program and/or multiple distinct programs can be executed in parallel and the architecture of the particular multiprocessor computer at hand.
Multiprocessor computers may be classified by how they share information among the processors. Shared-memory multiprocessor computers offer a common physical memory address space that all processors can access. Multiple processes and/or multiple threads within the same process can communicate through shared variables in memory, which allow them to read from or write to the same memory location in the computer. Message-passing multiprocessor computers, in contrast, have a separate memory space for each processor, requiring processes in such a system to communicate by sending explicit messages to each other.
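The two communication styles can be contrasted in a short sketch. The function names shared_memory_sum and message_passing_sum are hypothetical and appear nowhere in the source. Python threads always share one address space, so the queue in the second function only models the message-passing discipline; it does not reproduce separate physical memories.

```python
import queue
import threading


def shared_memory_sum(values, workers=4):
    """Workers communicate by writing to one shared memory location."""
    total = {"value": 0}      # shared variable visible to every thread
    lock = threading.Lock()   # serializes updates to the shared location

    def work(chunk):
        partial = sum(chunk)
        with lock:
            total["value"] += partial

    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


def message_passing_sum(values, workers=4):
    """Workers keep private state and report results as explicit messages."""
    mailbox = queue.Queue()   # stands in for the message channel

    def work(chunk):
        mailbox.put(sum(chunk))   # send one message holding the partial sum

    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The receiver combines exactly one message per worker.
    return sum(mailbox.get() for _ in range(workers))
```

In the first function the workers never address each other; they only touch the same location. In the second, no worker touches another's data, and all coordination passes through the channel.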
Shared-memory multiprocessor computers may further be classified by how the memory is physically organized. In distributed shared-memory computers, the memory is divided into modules physically placed near each processor. Although all of the memory modules are globally accessible, a processor can access memory placed nearby faster than memory placed remotely. Because the memory access time differs based on memory location, distributed shared-memory systems are often called non-uniform memory access (NUMA) machines. By contrast, in centralized shared-memory computers, the memory is physically in one location. Centralized shared-memory computers are called uniform memory access (UMA) machines because the memory is equidistant in time from each of the processors. Both forms of memory organization typically use high-speed cache memory in conjunction with main memory to reduce execution time.
Multiprocessor computers with distributed shared memory are often organized into multiple nodes with one or more processors per node. The nodes interface with each other through a memory-interconnect network by using a protocol, such as the protocol described in the Scalable Coherent Interface (SCI) (IEEE 1596). UMA machines typically use a bus for interconnecting all of the processors.
Further information on multiprocessor computer systems in general and NUMA machines in particular can be found in a number of works including Computer Architecture: A Quantitative Approach (2nd Ed. 1996), by J. Hennessy and D. Patterson, which is hereby incorporated by reference.
NUMA machines offer significant advantages over UMA machines in terms of bandwidth, but they have the drawback of increased delay when a processor on one node, in executing a process (a part of a computer program), must access memory on a remote node. This situation may arise when data such as program text (the actual machine instructions being executed) required by the process is stored in memory on the remote node. Although accessing the instructions from the remote memory is expensive, it is still far faster than re-reading the required text from the file system on secondary storage. Even so, remote memory access to the required instructions is far from ideal, because accessing remote memory is considerably slower than accessing memory locally on the node. Such remote memory references significantly reduce the speed of the process's execution.
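The trade-off described above can be made concrete with a small cost model. Every figure below is a hypothetical placeholder chosen only to reflect the ordering stated in the text (a local access is cheaper than a remote access); none is a measurement from the source, and the one-time page copy cost is likewise an assumption.

```python
# Hypothetical costs in nanoseconds; illustrative only, not measured values.
LOCAL_ACCESS_NS = 100      # reference to memory on the process's own node
REMOTE_ACCESS_NS = 1_000   # reference to memory on another node
PAGE_COPY_NS = 50_000      # assumed one-time cost to copy a page between nodes


def remote_only_ns(references):
    """Total cost when every reference goes to the remote node."""
    return references * REMOTE_ACCESS_NS


def copy_then_local_ns(references):
    """Total cost when the page is copied once and then referenced locally."""
    return PAGE_COPY_NS + references * LOCAL_ACCESS_NS


def worth_copying(references):
    """True when a local copy is cheaper than repeated remote references."""
    return copy_then_local_ns(references) < remote_only_ns(references)
```

Under these assumed figures a page referenced only a handful of times is cheaper to leave remote, while a frequently referenced page, such as heavily executed program text, is cheaper to copy locally.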
A simple solution to this problem is to copy all of the required data (such as all of the executable code of a program) into the memory on each node in advance of execution. But this approach is impractical in most circumstances for two reasons: the file may consume too much of each node's memory, and much of the data, if it is program text, may never be executed on one or more of the system nodes.
An objective of the invention, therefore, is to provide a method and system for dynamically copying a file part or portion (the terms considered equivalent herein) such as program text stored in memory on a first node to memory on a second node for use by a process running on the second node. Another objective is to provide such a method and system that copies upon demand only the portions of the file needed by the process, thus avoiding the unnecessary displacement of other data present in memory on the second node.
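The idea of copying only the needed portions on demand can be illustrated with a minimal sketch. It is not the method of the invention, which this section does not detail. The names Node, load_file, and read_page and the 4096-byte page size are assumptions introduced here, and a dictionary assignment stands in for a transfer across the memory-interconnect network.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; not specified in the source


class Node:
    """Hypothetical node holding file pages in its local memory."""

    def __init__(self, name):
        self.name = name
        self.pages = {}  # page index -> bytes resident in this node's memory


def load_file(node, data):
    """Place every page of a file into one node's memory."""
    for offset in range(0, len(data), PAGE_SIZE):
        node.pages[offset // PAGE_SIZE] = data[offset:offset + PAGE_SIZE]


def read_page(local, remote, index):
    """Return a page, copying it from the remote node on first use only."""
    if index not in local.pages:
        # Stands in for copying the page across the interconnect.
        local.pages[index] = remote.pages[index]
    return local.pages[index]


# Usage: a process on the second node touches two of the file's ten pages.
first, second = Node("first"), Node("second")
load_file(first, bytes(10 * PAGE_SIZE))
for page_index in (3, 3, 7):
    read_page(second, first, page_index)
```

After the loop the second node holds only the two pages actually referenced, so the other eight pages of the file displace nothing in its memory. Copying the whole file in advance would have occupied ten pages on every node.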