1. Field of the Invention
This invention relates to the field of memory management in computer systems.
2. Description of the Related Art
Most modern computers include at least one form of data storage that has programmable address translation or mapping. In most computers, this storage will be provided by a relatively high-speed system memory, which is usually implemented using solid-state random-access memory (RAM) components.
Although system memory is usually fast, it does have its weaknesses. First, it is usually volatile. Second, for a given amount of data to be stored, system memory takes up more physical space within the computer, is more expensive, and requires more support in terms of cooling, component sockets, etc., than does a conventional non-volatile storage device such as a disk. Thus, whereas many gigabytes of disk storage are commonly included in even computers in the relatively unsophisticated consumer market, such computers seldom come with more than 128 or perhaps 256 megabytes of system RAM.
Because higher speed access to stored data and code usually translates into faster performance, it is generally preferable to run as much of an active application from system memory as possible. Indeed, many applications requiring real-time processing of complex calculations such as voice-recognition software, interactive graphics, etc., will not run properly at all unless a certain amount of RAM is reserved for their use while running.
High-speed system memory is a limited resource and, as with most limited resources, there is often competition for it. This has become an even greater problem in modern multi-tasked systems, in which several applications may be running, or at least resident in memory, at the same time. Even where there is enough memory in a given system for all the applications that need it, it is still often advantageous to conserve memory use: RAM costs money, and consumes both energy and physical space. More efficient management of RAM can reduce the cost, energy, or physical space required to support a given workload. Alternatively, more efficient management of RAM can allow a system to support a larger number of applications with good performance, given a fixed monetary, energy, or physical space budget.
Applications may be defined broadly as any body of code that is loaded and executes substantially as a unit. Applications include, among countless other examples, common consumer programs such as word processors, spreadsheets and games; communications software such as Internet browsers and e-mail programs; software that functions as an aid or interface with the OS itself, such as drivers; server-oriented software and systems such as a web server, a transactional database, and scientific simulations; and even entire software implementations of whole computers, commonly known as "virtual machines" (VMs).
One technique for reducing the amount of system memory required for a given workload, and thereby for effectively "expanding" the amount of available system memory, is to implement a scheme whereby different applications share the memory space. Transparent page sharing, in the context of a multi-processor system on which virtual machines are running, is described in U.S. Pat. No. 6,075,938, Bugnion, et al., "Virtual Machine Monitors for Scalable Multiprocessors," issued 13 Jun. 2000 ("Bugnion '938"). The basic idea of this system is to save memory by eliminating redundant copies of memory pages, such as those that contain program code or file system buffer cache data. This is especially important for reducing memory overheads associated with running multiple copies of operating systems (e.g., multiple guest operating systems running as virtual machines; see below).
There are two main components to the technique disclosed in Bugnion '938. First, candidate pages that could potentially be shared are identified. Second, the pages are actually shared, when possible, so that redundant copies can be reclaimed.
The approach in Bugnion '938 for identifying pages is to add hooks to the system to observe copies when they are created. For example, a routine within the operating system running within the virtual machine (the virtual operating system, VOS) that is used to explicitly copy memory regions is modified to allow copied pages to be shared. Note that the VOS may also be considered to be a "guest" operating system, since the virtual machine, although it is configured as a complete computer system, is actually a software construct that is running on an underlying, physical "host" system.
Another example is Bugnion '938's interposition on disk accesses, which allows disk transfers from a shared non-persistent disk to be shared across multiple guests (virtual machines). In this case, Bugnion '938 tracks disk blocks that are already in main memory, so subsequent requests for the same blocks can be shared. Similarly, support for special devices is added to guests, such as a special virtual subnet that supports large network packets, allowing guests to communicate with each other while avoiding replicated data when possible.
The Bugnion '938 approach for sharing a page is to employ an existing MMU (memory management unit) hardware device to map the shared page read-only for each guest that is sharing it, and to make private copies of the page on demand if a guest attempts to write to it. This technique is known as "copy-on-write" (COW), and is well-known in the literature. In the context of virtual machines, page-sharing can be made transparent to guest, that is, virtual, operating systems, so that they are unaware of the sharing. This is done by exploiting the extra level of indirection in the virtualized memory system between the virtualized guest "physical" memory (which the VM "believes" is the actual hardware memory, but which is actually a software construct) and the actual underlying hardware "machine" memory. In short, multiple guest physical pages can be mapped copy-on-write to the same machine page.
One disadvantage of the page-sharing approach described in Bugnion '938 is that the guest OS must be modified to include the necessary hooks. This limits the use of the Bugnion '938 solution not only to systems where such modifications are possible but also to those users who are willing and knowledgeable enough to perform or at least accept the modifications. Note that such attempted modifications to commodity operating systems may not be possible for those other than the manufacturer of the operating system itself, and then not without greatly increasing the probability that the modifications will lead to "bugs" or instability elsewhere.
Another disadvantage of the Bugnion '938 system is that it will often fail to identify pages that can be shared by different VMs. For example, assume that each VM is using its own persistent virtual disk, that each VM is running a different operating system as the guest OS, for example Windows NT4 and Windows 2000, respectively, and that each is running completely different installations of the software package Microsoft Office 2000. The executable code (for Office 2000) will then be identical for the two VMs, yet the Bugnion '938 system will not identify this. Two complete copies of the same program may then be resident in the system memory at the same time, needlessly taking up many megabytes of memory in order to store the redundant second copy of the program code.
What is needed is a memory management system (and corresponding method of operation) that can be implemented without having to add hooks to the existing guest operating system, and that is able to identify opportunities for page sharing that are not found and exploited by existing memory management techniques. The memory management system should, however, remain transparent to the applications that are using it. This invention provides such a memory management system and related method of operation.
The invention provides a method and a related system configuration for sharing memory units, such as pages, in a computer system that includes a hardware memory and at least one context. Each context has a virtual memory that is divided into a plurality of virtual memory units that are mappable to corresponding hardware memory units. The memory may be persistent or non-persistent. According to the invention, the system identifies virtual memory units that have identical contents and then maps those virtual memory units identified as having identical contents to a single instance of a corresponding one of the hardware memory units.
In the preferred embodiment of the invention, candidate memory units are selected from among the virtual memory units and their contents are hashed.
Identification of virtual memory units that have identical contents is preferably carried out by calculating a hash value by applying a hash function to the contents of a current one of the candidate memory units. A data structure such as a hash table is then searched to determine the presence of a previous data structure entry corresponding to the calculated hash value. If a previous entry is not present in the data structure, then a new entry is inserted into the data structure corresponding to the current candidate memory unit. If a previous entry is present in the data structure, then the entire contents of the current candidate memory unit are compared with the contents of the single instance indicated by the previous entry.
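The identification procedure just described can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the function name `find_shared_pages`, the representation of memory units as a dictionary of byte strings, and the choice of SHA-1 as the hash function are all hypothetical and are not specified by the text.

```python
import hashlib

def find_shared_pages(pages):
    """Return {page_id: page_id of the single instance it may share}.

    `pages` is a hypothetical dict {page_id: bytes}. SHA-1 is an assumed
    hash function; the text does not name one.
    """
    table = {}      # hash value -> page id of the first page seen with it
    share_map = {}  # page id -> page id holding the single instance
    for page_id, contents in pages.items():
        key = hashlib.sha1(contents).digest()
        previous = table.get(key)
        if previous is None:
            # No previous entry: insert one for the current candidate.
            table[key] = page_id
            share_map[page_id] = page_id
        elif pages[previous] == contents:
            # Previous entry found and confirmed by a full comparison.
            share_map[page_id] = previous
        else:
            # Same hash but different contents: leave the page unshared.
            share_map[page_id] = page_id
    return share_map
```

The full comparison after a hash hit is what keeps the scheme correct even if two different memory units happen to hash to the same value.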
According to one aspect of the preferred embodiment of the invention, all or only selected ones of the virtual memory units that are mapped to the single instance are write-protected, such as by using a copy-on-write (COW) mechanism. A request by any context to write to any write-protected virtual memory unit is then sensed. Upon sensing such a request, a private copy of the write-protected virtual memory unit is generated in the hardware memory for the requesting context and the write-protected virtual memory unit is remapped to the private copy.
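The copy-on-write behavior described above can be sketched as a toy model in Python. The class `Memory` and all of its attribute and method names are hypothetical; hardware memory is modeled as a dictionary of byte arrays rather than real MMU mappings.

```python
class Memory:
    """Toy model of shared, write-protected memory units (hypothetical names)."""

    def __init__(self):
        self.machine = {}       # hardware page number -> bytearray contents
        self.mapping = {}       # (context, virtual page) -> hardware page number
        self.read_only = set()  # write-protected (context, virtual page) units
        self.next_page = 0

    def alloc(self, contents):
        """Allocate a hardware page holding a copy of `contents`."""
        page = self.next_page
        self.next_page += 1
        self.machine[page] = bytearray(contents)
        return page

    def share(self, page, *units):
        """Map every unit to the single instance `page` and write-protect it."""
        for unit in units:
            self.mapping[unit] = page
            self.read_only.add(unit)

    def write(self, unit, offset, value):
        """Write one byte; a write to a protected unit triggers a private copy."""
        if unit in self.read_only:
            private = self.alloc(self.machine[self.mapping[unit]])
            self.mapping[unit] = private      # remap to the private copy
            self.read_only.discard(unit)
        self.machine[self.mapping[unit]][offset] = value
```

In this sketch, a write by one context leaves the shared instance, and every other context mapped to it, unchanged.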
In order to improve the efficiency of the invention even further, any or all of several optimizations may be implemented. One such optimization involves identifying virtual memory units that have a relatively high probability of impending modification and then designating these as temporarily non-sharable virtual memory units. For these temporarily non-sharable virtual memory units, mapping to the single shared instance is then preferably deferred, for example, until a different one of the virtual memory units is subsequently identified as having contents identical to the respective temporarily non-sharable virtual memory unit. As yet another optimization, write-protection is deferred for any candidate virtual memory unit for which no other virtual memory unit has yet been identified as having identical contents.
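The last optimization, deferring write-protection until a match exists, might be sketched as follows. This is one possible reading only: the function `scan`, the `hints` table, and the use of SHA-1 are assumptions made for illustration. Because an unprotected unit may change after it was hashed, the sketch re-checks its full contents before sharing.

```python
import hashlib

def scan(page_id, pages, hints, protected):
    """Process one candidate; write-protect only once a match is found.

    pages: {page_id: bytes}, hints: {hash: page_id}, protected: set of ids.
    All names are hypothetical. Returns the id of the page the candidate
    can share with, or None if no match exists yet.
    """
    contents = pages[page_id]
    key = hashlib.sha1(contents).digest()
    other = hints.get(key)
    if other is None or other == page_id:
        # First unit with this hash: record it, but leave it writable.
        hints[key] = page_id
        return None
    if pages[other] != contents:
        # The unprotected unit changed since it was hashed (or the hashes
        # collided): replace the stale entry with the current candidate.
        hints[key] = page_id
        return None
    # A real match: only now are both units write-protected.
    protected.update((other, page_id))
    return other
```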
The invention provides different ways to select candidate virtual memory units for content-based comparison with other virtual memory units and for possible sharing. For example, selection may be random, or according to any of several heuristic criteria. Candidate virtual memory units are preferably selected and examined for possible sharing during a system idle time.
The preferred embodiment of the invention is virtualized: the computer system includes at least one virtual machine, which forms a context and has at least one address space that is divided into a plurality of virtual memory units. The virtual machine also includes a virtual operating system that maps each virtual memory unit to a corresponding intermediate memory unit. In this embodiment, for each virtual machine, the system also includes a software layer, a virtual machine monitor, as an interface between the virtual machine and the underlying system software and hardware. Among other things, the virtual machine monitor implements an intermediate mapping of each intermediate memory unit to a corresponding hardware memory unit. In this virtualized embodiment, the intermediate memory units, instead of the virtual memory units, are chosen and mapped to the shared instances of hardware memory. Other procedural steps such as content-based comparison, write-protection, etc., are then also carried out based on the intermediate memory units. The intermediate mapping provides an extra level of indirection that is advantageously exploited by the virtualized embodiment of the invention.
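The two mapping levels of the virtualized embodiment can be illustrated with a short Python sketch. The table layouts, the page numbers, and the names `translate` and `share` are hypothetical. The point shown is that sharing changes only the mapping kept by the virtual machine monitor, so the mapping kept by the virtual operating system is left untouched.

```python
def translate(vm, virtual_page, guest_tables, vmm_maps):
    """Resolve a virtual page to a hardware page through both mapping levels."""
    intermediate = guest_tables[vm][virtual_page]  # kept by the virtual OS
    return vmm_maps[vm][intermediate]              # kept by the VM monitor

def share(vmm_maps, units, hardware_page):
    """Remap each (vm, intermediate page) unit to one shared hardware page."""
    for vm, intermediate in units:
        vmm_maps[vm][intermediate] = hardware_page

# Hypothetical state: intermediate pages 7 (vm1) and 3 (vm2) are assumed
# to have been found to hold identical contents.
guest_tables = {"vm1": {0x10: 7}, "vm2": {0x20: 3}}
vmm_maps = {"vm1": {7: 100}, "vm2": {3: 101}}
share(vmm_maps, [("vm1", 7), ("vm2", 3)], 100)
```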
In one alternative embodiment of the invention, at least one context is a virtual disk.
In another "brute force" embodiment of the invention, hashing is not used at all. Rather, in order to discover virtual memory units that are identical to others, the system simply compares the contents of each of the virtual memory units with the contents of each of the other virtual memory units.
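A minimal Python sketch of such exhaustive comparison follows. The function name and the dictionary representation of memory units are assumptions made for illustration.

```python
from itertools import combinations

def brute_force_matches(pages):
    """Return every pair of page ids whose contents are identical.

    `pages` is a hypothetical dict {page_id: bytes}. No hashing is used:
    each unit is compared directly with each of the others.
    """
    return [(a, b) for a, b in combinations(pages, 2) if pages[a] == pages[b]]
```

For n memory units this performs n(n-1)/2 full comparisons, which is the cost that the hash-based embodiment is intended to avoid.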
According to yet another aspect of the invention, the virtual memory units are partitioned into a plurality of classes. The steps of identifying virtual memory units that have identical contents and of mapping those virtual memory units to a single shared instance of a corresponding hardware memory unit are in this case carried out separately and independently for each of the classes. Sharing of single instances of the hardware memory units thereby takes place only among virtual memory units in the same class. One example of possible classes is the page colors of the hardware memory units to which the corresponding virtual memory units are currently mapped. Another example is the case in which the computer system has a multiprocessor architecture with a non-uniform memory access (NUMA) property and a plurality of memory modules having different access latency. In this case, the classes are the memory modules to which the corresponding virtual memory units are currently mapped.
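Class-partitioned sharing can be sketched in Python as follows. The function `share_within_classes` and the `class_of` callback are hypothetical. The sketch relies on Python's dictionary, which hashes a key and then compares it in full, to stand in for the hash-and-compare step.

```python
from collections import defaultdict

def share_within_classes(pages, class_of):
    """Return {page_id: page_id of the shared instance}, per class.

    pages: hypothetical dict {page_id: bytes}.
    class_of: callable mapping a page id to its class label, for example
    a page color or a NUMA memory module.
    """
    tables = defaultdict(dict)  # class label -> {contents: first page id}
    share_map = {}
    for page_id, contents in pages.items():
        table = tables[class_of(page_id)]
        # Identical contents are shared only within the same class.
        share_map[page_id] = table.setdefault(contents, page_id)
    return share_map
```

As a usage example under an assumed two-color scheme in which the color is the page number modulo 2, `share_within_classes(pages, lambda p: p % 2)` never shares an even-numbered page with an odd-numbered one, even when their contents are identical.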