The present invention concerns memory management. It particularly concerns ways of facilitating the use of different memory-allocation policies for different parts of a virtual-memory space.
Computer programs stored in a persistent memory device such as a computer disk include references, which specify the memory locations of data objects, functions, etc. The compiling and linking that result in such an executable program are typically performed independently of the compiling and linking that result in other executable programs. So nothing prevents different executables from using identical references to refer to different data objects. This would present no problem if the different programs were never to be present in physical memory simultaneously. But the reality is that a computer typically runs many different programs concurrently. Such different programs are typically referred to as "processes" in this context, and different processes using some identical references would interfere with each other in the absence of what is known as "virtual addressing."
FIG. 1 illustrates how one example of virtual addressing can be implemented. A typical computer system 10 includes, among other things, a central processing unit 12, a memory 14, and a bus 16 by which the central processing unit 12 and memory 14 communicate with each other. As is well known, a central processing unit's execution unit 18 operates by repeatedly fetching instructions from the memory 14 and executing those instructions. Often, such an instruction's execution includes reading data from or storing data to memory locations that a memory reference specifies.
But the memory reference in a given process is considered a virtual address, i.e., a value that is not itself the actual, physical address of the referred-to data object's location in memory 14, but rather a value from which that location's address can be determined. The translation from virtual address to physical address is the job of a memory-management unit 20, in cooperation with the operating system's memory-allocation software. Together, they treat a given process's virtual-address space as divided into "pages." When execution of a step in a process involves a reference, the memory-management unit 20 consults a translation buffer 22, which caches associations between various of the process's virtual-address pages and the physical-memory "page frames" in which those pages have been stored. A "translation-buffer hit" is said to occur if the translation buffer 22 contains the required association. On a translation-buffer hit, the translation is completed, and the location specified by the resulting physical address is then accessed.
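By way of illustration only, the hit path described above can be sketched in Python as follows. The 4096-byte page size, the dictionary standing in for translation buffer 22, and the function names are assumptions made for the example; none of them is taken from the system of FIG. 1.

```python
PAGE_SIZE = 4096  # assumed page size; the description does not specify one


def split_virtual_address(vaddr, page_size=PAGE_SIZE):
    """Return (virtual page number, offset within the page)."""
    return divmod(vaddr, page_size)


def translate(vaddr, translation_buffer, page_size=PAGE_SIZE):
    """Return the physical address on a translation-buffer hit, or None on a miss.

    translation_buffer is a hypothetical dict that maps virtual page numbers
    to page-frame numbers.
    """
    vpn, offset = split_virtual_address(vaddr, page_size)
    frame = translation_buffer.get(vpn)
    if frame is None:
        return None  # translation-buffer miss
    # The offset within the page is unchanged by the translation.
    return frame * page_size + offset
```

In this sketch only the page number is translated; the offset within the page carries over unchanged into the physical address.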
If translation buffer 22 does not contain the required association, a "translation-buffer miss" is said to occur, and operation is interrupted so that the required association can be fetched from the process's page table stored in memory. FIG. 1 depicts the memory 14 as containing page tables 24, 26, and 28 that are specific to the three currently running processes that the drawing represents by physical-memory regions 30, 32, and 34. Of course, this representation diverges somewhat from reality, since the addresses of physical-memory segments actually allocated to a given process are not in general contiguous, and they may well be interspersed with the memory segments allocated to other processes.
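The miss path can be sketched in the same illustrative manner. The dictionaries standing in for the translation buffer and for a process's page table, and the name lookup_frame, are hypothetical. The case in which the page table itself has no entry for the page is not handled in this sketch.

```python
def lookup_frame(vpn, translation_buffer, page_table):
    """Return (frame, outcome), where outcome is "hit" or "miss".

    Both tables are hypothetical dicts that map virtual page numbers to
    page-frame numbers. A KeyError is raised if the page table has no entry.
    """
    if vpn in translation_buffer:
        return translation_buffer[vpn], "hit"
    frame = page_table[vpn]          # fetch the association from the page table
    translation_buffer[vpn] = frame  # cache it so that the next reference hits
    return frame, "miss"
```

A second reference to the same page then finds the association in the translation buffer and avoids the page-table access.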
Although a number of processes are concurrently active, the central processing unit can at a given instant be executing only one process's instructions. A typical approach to concurrent processing is therefore to divide each process's execution into time slices interspersed with other processes' time slices. When the central processing unit 12 turns to a new process after having completed a previous process's time slice, it performs a "context switch." A context switch is the processor's replacement of the contents of the program counter and various other registers 36 with the values that those registers contained at the end of the new process's last time slice. As part of the context switch, the contents of a page-table-base register 38 are replaced with a value from which, as will be explained below, a physical address in the new process's page table can be determined.
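A context switch of the kind just described can be sketched as follows, purely for illustration. The Context and Cpu classes, their field names, and the dictionary of saved contexts are assumptions of the example rather than features of any particular processor.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Context:
    program_counter: int = 0
    registers: dict = field(default_factory=dict)
    page_table_base: int = 0  # stands in for page-table-base register 38


class Cpu:
    def __init__(self):
        self.current_pid = None
        self.context = Context()

    def context_switch(self, new_pid, saved_contexts):
        """Save the outgoing process's registers and load the incoming process's."""
        if self.current_pid is not None:
            saved_contexts[self.current_pid] = self.context
        # Restore the values held at the end of the new process's last time slice.
        self.context = copy.deepcopy(saved_contexts[new_pid])
        self.current_pid = new_pid
```

Replacing page_table_base along with the other registers is what causes subsequent references to be translated through the new process's page table.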
Now, the code for most processes includes calls to functions within the operating system. Because the different processes share the same operating-system code and data, the virtual addresses that refer to such code and data should map to physical addresses containing the same contents in all processes. So it is typical for a computer system's memory management to reserve a certain range of virtual addresses for shared code and data. A virtual address that falls within this range is always mapped to the same physical address, independently of the process in whose context it occurs.
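The reserved shared range can be illustrated with the following sketch. The boundary value SHARED_START_VPN and the use of a single shared table are assumptions made for the example; actual systems may define and implement the shared range differently.

```python
SHARED_START_VPN = 0x80000  # hypothetical first page of the shared range


def frame_for(vpn, process_page_table, shared_page_table):
    """Return the page frame for vpn in the context of one process.

    Pages at or above SHARED_START_VPN are resolved through one table that
    is common to all processes; other pages use the per-process table.
    """
    if vpn >= SHARED_START_VPN:
        return shared_page_table[vpn]
    return process_page_table[vpn]
```

Two processes that present the same shared-range page number thereby obtain the same page frame, whereas the same private page number may resolve to different frames in each.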
In the description so far, a tacit assumption has been that the page tables always contain all of the required associations between a given process's virtual-address pages and the physical page frames that contain those pages. But this is true only insofar as there actually is a page frame within main memory 14 that contains the requested virtual-address page. If there is none, then the routine employed by the memory-management unit 20 to obtain page-table entries from a page table will terminate in favor of an operating-system routine whose purpose is to allocate memory to various processes. Specifically, the operating system is requested to allocate a page frame to the current process and, if necessary, place into that page frame virtual-address-page contents typically held in persistent storage.
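The allocation step can be sketched as follows, for illustration only. The free-frame list, the dictionaries standing in for physical memory and persistent storage, and the choice to zero-fill a page that has no stored contents are all assumptions of the example. Page replacement when no frame is free is outside the sketch.

```python
PAGE_SIZE = 4096  # assumed page size


def handle_page_fault(vpn, page_table, free_frames, physical_memory,
                      persistent_storage):
    """Allocate a page frame for vpn, fill it, and record the association."""
    if not free_frames:
        raise MemoryError("no free page frame; replacement is not sketched here")
    frame = free_frames.pop()
    contents = persistent_storage.get(vpn)
    if contents is None:
        contents = bytes(PAGE_SIZE)  # nothing stored for this page: zero-fill
    physical_memory[frame] = contents
    page_table[vpn] = frame  # the page table now holds the required association
    return frame
```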
From a performance viewpoint, the particular choice of page frame thus allocated is not very important in the typical system of the type that FIG. 1 depicts. This is because the main memory 14 is almost invariably "random-access" memory in the sense that the cost of a given access is largely independent of its particular memory location. But this position independence does not hold in FIG. 2's multiprocessor system, in which various memory modules 44 and 46 impose different access costs. For one central processing unit 48, the cost of accessing a location in memory 44 is relatively low. Central processing unit 48 and memory 44 may share a common bus, for instance. There may additionally be further central processing units, not shown, for which the cost of access to memory 44 is similarly low. Memory 44 can be thought of as "local" to such processors.
For another central processing unit 50, though, access to memory module 44 is more costly: with respect to processor 50, module 44 is "remote." In the example, the reason for the extra cost is that the distance from central processing unit 50 to module 44 is relatively great and therefore requires port circuitry 52 and 54 to communicate at the lower speed to which the intervening channel 56 is limited. So it may cost central processing unit 50 considerably less to access memory module 46 than to access memory module 44. Similarly, central processing unit 48 may find accesses of memory module 44 less expensive than accesses of memory module 46.
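The non-uniform costs of FIG. 2 can be expressed as a small table, as in the following sketch. The numeric costs are invented for illustration; the description states only that remote accesses are more costly than local ones. The reference numerals are used as keys merely for convenience.

```python
# Hypothetical relative costs, keyed by (central processing unit, memory module).
ACCESS_COST = {
    (48, 44): 1,  # module 44 is local to central processing unit 48
    (48, 46): 4,  # module 46 is remote from central processing unit 48
    (50, 44): 4,  # module 44 is remote from central processing unit 50
    (50, 46): 1,  # module 46 is local to central processing unit 50
}


def access_cost(cpu, module):
    return ACCESS_COST[(cpu, module)]


def local_module(cpu, modules=(44, 46)):
    """Return the module that is cheapest for cpu to access."""
    return min(modules, key=lambda module: access_cost(cpu, module))
```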
Memory-allocation schemes may take these different costs into account. Allocation routines in the operating-system software executing on central processing unit 48 may preferentially allocate page frames in memory module 44 rather than in memory module 46, to which central processing unit 48 also has access. But the division of labor among processors may render the resultant cost advantage illusory. To maximize processor utilization, a multiprocessor system may assign a given process's execution to different processors during different time slices. So memory preferentially allocated in memory module 44 by central processing unit 48 may subsequently need to be accessed by central processing unit 50 executing the same process in a different time slice. Also, different "threads" of a single process may run simultaneously on different processors. (Different threads of a given process employ the same memory space and programming but operate in the context of program counters and call stacks that do not in general contain the same values.) For example, processors 48 and 50 may simultaneously run threads that both reference memory module 44.
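The following sketch illustrates both the preferential allocation and the way in which its advantage can prove illusory. The preference lists, free-frame lists, and cost figures are hypothetical values chosen for the example.

```python
def allocate_frame(cpu, free_frames, preference):
    """Allocate from the first module in cpu's preference order that has a free frame.

    Returns (module, frame). free_frames maps each module to a list of free
    frame numbers; preference maps each cpu to modules, most local first.
    """
    for module in preference[cpu]:
        if free_frames[module]:
            return module, free_frames[module].pop()
    raise MemoryError("no free page frame in any module")


def total_access_cost(cpu, pages, cost):
    """Sum the cost of one access by cpu to each (module, frame) in pages."""
    return sum(cost[(cpu, module)] for module, _frame in pages)
```

If, say, central processing unit 48 allocates a process's pages in module 44 and the process is later scheduled on central processing unit 50, total_access_cost evaluated for unit 50 reflects the remote cost for every one of those pages.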
Of course, one may propose to restrict a given process's execution to the processor or processors to which a given memory module is most local. But such an approach could reduce processor utilization and compromise one of a multiprocessing system's strengths, which is that it can execute multiple threads of the same process simultaneously in some or all processors in a given system.
One way to retain multiprocessor-system advantages but reduce the cost that such non-uniform access times can exact is to replicate a process's code and data in the different locales in which the process may be executed. Now, replicating all such code and data is clearly impractical. Even if the memory-capacity impact could be ignored, such a general policy would in most systems impose an intolerable synchronization overhead for read/write data. Other overhead elements would compromise the intended performance gain, too.
Still, significant benefits can result if a replication policy is implemented selectively. The operating system's code and read-only data are attractive candidates for replication, for instance, since the resultant execution-speed advantage benefits essentially all processes. But it remains important to contain the resultant overhead so as not to compromise the intended execution-speed benefit excessively.
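A selective replication policy of the general kind discussed above might be sketched as follows. The rule used here (replicate a page in every locale only if it is read-only, otherwise keep a single copy in the first listed locale) is an assumption made for illustration, as are the function names and the allocate_in callback.

```python
def place_page(read_only, locales, allocate_in):
    """Return a dict {locale: frame} describing where copies of a page are kept.

    allocate_in is a hypothetical callback that allocates one page frame in
    the given locale and returns its frame number.
    """
    targets = locales if read_only else locales[:1]
    return {locale: allocate_in(locale) for locale in targets}


def copy_for(copies, locale):
    """Return (locale, frame) of the local copy if one exists, else the sole copy."""
    if locale in copies:
        return locale, copies[locale]
    home = next(iter(copies))
    return home, copies[home]
```

Because read/write pages keep a single copy in this sketch, no synchronization among replicas is needed for them; only read-only pages incur the additional memory cost.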
One element of overhead attends the page-table change that a change in the location of a process's execution can cause. It will rarely be acceptable to require that a process's code adjust its references in accordance with the particular locale in which it is being executed. So a locale change should be accompanied by a change in the correspondence between virtual and physical addresses. But changing a process's page tables as part of a context switch whenever a processor change occurs can have a significant adverse performance impact.
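The overhead in question can be made concrete with a sketch of the straightforward approach, in which page-table entries are rewritten whenever the locale changes. The data layout (page-table entries as (module, frame) pairs and a replicas dictionary listing each replicated page's copies by module) is hypothetical.

```python
def remap_for_locale(page_table, replicas, new_locale):
    """Point each replicated page at its copy in new_locale.

    page_table maps virtual page numbers to (module, frame) pairs; replicas
    maps a replicated page's number to {module: (module, frame)}.
    Returns the number of page-table entries that had to be rewritten.
    """
    rewritten = 0
    for vpn, copies in replicas.items():
        target = copies.get(new_locale)
        if target is not None and page_table.get(vpn) != target:
            page_table[vpn] = target
            rewritten += 1
    return rewritten
```

In this sketch the work done at each locale change grows with the number of replicated pages, which suggests why performing it as part of every such context switch can be costly.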