In a hypervisor environment, where physical memory access is controlled by the hypervisor rather than by an operating system running on top of the hypervisor, the performance of memory access algorithms contributes significantly to the overall performance of the system.
In a shadow page table environment, the page tables that the operating system operates on are not the real page tables that the machine uses. Instead, access to the page directory root (e.g. the CR3 register on an IA32 system or an AMD64 system that points to a page table) is kept private to the hypervisor, and the operating system's page directory root is virtualized. The hypervisor virtualizes load and store operations to the page directory root, so that the operating system appears to be running atop real hardware. The hypervisor-private page table is called the shadow page table. Conversely, the operating system page table is called the guest page table. When the operating system modifies its guest page table entries, the shadow page table entries must also be modified to correspond to the operating system's modifications.
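The relationship between the guest page table and the shadow page table can be sketched with a toy model. The sketch below is only an illustration under stated assumptions: the class name ToyHypervisor, its methods, and the guest-frame-to-host-frame mapping are hypothetical, page tables are flattened to single-level dictionaries, and the shadow table is rebuilt eagerly on a root store purely for simplicity.

```python
class ToyHypervisor:
    """Toy model (hypothetical): a guest page table maps virtual page -> guest frame,
    while the hypervisor-private shadow table maps virtual page -> host frame."""

    def __init__(self, guest_to_host):
        self.guest_to_host = guest_to_host  # assumed mapping: guest frame -> host frame
        self.guest_root = None              # the root value the guest believes it loaded
        self.shadow = {}                    # the table the hardware would actually use

    def guest_store_root(self, guest_table):
        # Virtualized store to the page directory root: remember the guest's
        # table and rebuild the shadow table from it (eager rebuild is an assumption).
        self.guest_root = guest_table
        self.shadow = {vpage: self.guest_to_host[frame]
                       for vpage, frame in guest_table.items()}

    def guest_load_root(self):
        # Virtualized load: the guest reads back its own root, never the real one.
        return self.guest_root

    def guest_update_entry(self, vpage, guest_frame):
        # A guest page table modification must be mirrored in the shadow table.
        self.guest_root[vpage] = guest_frame
        self.shadow[vpage] = self.guest_to_host[guest_frame]


hv = ToyHypervisor({0: 7, 1: 9})
guest_table = {0x10: 0}
hv.guest_store_root(guest_table)
hv.guest_update_entry(0x10, 1)
```

After the update, the guest table maps page 0x10 to guest frame 1 and the shadow table maps the same page to host frame 9, so the two tables remain in correspondence.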
In order to speed up virtual-to-physical address translation, translation lookaside buffers (TLBs), which reside on CPUs, are used as caches. Thus, instead of looking up a translation in a page table, a cached translation can be obtained from the much faster TLB. However, such TLBs are very limited in capacity, typically containing 128 to 256 entries, so only the most recent and relevant translations are kept in them.
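The caching behavior of a TLB can be illustrated with a small software model. This is a sketch only: the class name ToyTLB is hypothetical, the page table is a flat dictionary, and least-recently-used replacement is an assumption made for illustration, since actual replacement policies vary by processor.

```python
from collections import OrderedDict


class ToyTLB:
    """Toy model (hypothetical) of a small translation cache in front of a page table."""

    def __init__(self, page_table, capacity=128):
        self.page_table = page_table   # virtual page -> frame (the slow path)
        self.capacity = capacity       # maximum number of cached translations
        self.entries = OrderedDict()   # cached translations, oldest first
        self.hits = 0
        self.misses = 0

    def translate(self, vpage):
        if vpage in self.entries:
            # Fast path: translation is cached; mark it as most recently used.
            self.hits += 1
            self.entries.move_to_end(vpage)
            return self.entries[vpage]
        # Slow path: consult the page table, then cache the result.
        self.misses += 1
        frame = self.page_table[vpage]
        self.entries[vpage] = frame
        if len(self.entries) > self.capacity:
            # Evict the least recently used translation (assumed LRU policy).
            self.entries.popitem(last=False)
        return frame


tlb = ToyTLB({0: 10, 1: 11, 2: 12}, capacity=2)
for vpage in (0, 0, 1, 2, 0):
    tlb.translate(vpage)
```

With a capacity of two entries, the third distinct page evicts the first, so the final lookup of page 0 misses again even though it was translated earlier.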
On several popular processor architectures, for example the Intel IA-32 (x86) and AMD x64 architectures, the entire TLB is discarded when an address space is changed, i.e., when an assignment is made to the page directory root. The reason for this is that the new address space (the switched-to address space) gets to use the TLB because it is active, whereas the old address space (the switched-from address space) does not because it is no longer active. Upon such an address space switch, shadow page tables are also typically discarded. When a shadow page table is discarded, repopulating it with new translation entries is very costly in terms of processor cycles. Thus, it would be advantageous to reduce the high cost associated with populating a shadow page table. Or, put another way, it would be advantageous to reduce the high cost associated with discarding an entire shadow page table when an address space change occurs.
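The repopulation cost described above can be made concrete by counting how often a shadow entry has to be filled. The following is a toy sketch under stated assumptions: the class name DiscardingShadow and its methods are hypothetical, page tables are flat dictionaries, and each fill stands in for one costly transition into the hypervisor.

```python
class DiscardingShadow:
    """Toy model (hypothetical): the shadow table is thrown away on every
    address space switch and refilled lazily, one entry per first access."""

    def __init__(self):
        self.guest_table = {}
        self.shadow = {}
        self.fills = 0  # number of shadow entries that had to be (re)populated

    def switch_address_space(self, guest_table):
        self.guest_table = guest_table
        self.shadow = {}  # discard every shadow translation

    def access(self, vpage):
        if vpage not in self.shadow:
            # Shadow miss: the entry is rebuilt from the guest page table.
            self.fills += 1
            self.shadow[vpage] = self.guest_table[vpage]
        return self.shadow[vpage]


space_a = {0: 100, 1: 101, 2: 102}
space_b = {0: 200, 1: 201, 2: 202}
model = DiscardingShadow()
for table in (space_a, space_b, space_a):
    model.switch_address_space(table)
    for vpage in table:
        model.access(vpage)
```

Only six distinct translations exist across the two address spaces, yet nine fills are performed, because returning to the first address space repeats work that had already been done.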
Next, to support efficient paging algorithms, current processors frequently implement mechanisms to determine whether a page has been accessed (i.e., whether it has been read) or modified (i.e., whether it has been written to). In most implementations, two flags are maintained in a page table entry: a flag that is set when a page is accessed and a separate flag that is set when a page is modified (the modified flag is often called the dirty flag).
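As a concrete illustration, the sketch below uses the bit positions of the x86 page table entry format (present in bit 0, writable in bit 1, accessed in bit 5, dirty in bit 6). The function name hardware_touch is hypothetical and merely models, in software, what the processor does to an entry on a memory access.

```python
PTE_PRESENT = 1 << 0   # entry is valid
PTE_WRITABLE = 1 << 1  # writes are permitted
PTE_ACCESSED = 1 << 5  # set when the page is accessed
PTE_DIRTY = 1 << 6     # set when the page is written to (modified)


def hardware_touch(pte, is_write):
    """Model (hypothetical) of the flag updates applied to an entry on one access."""
    pte |= PTE_ACCESSED
    if is_write:
        pte |= PTE_DIRTY
    return pte


entry = PTE_PRESENT | PTE_WRITABLE
after_read = hardware_touch(entry, is_write=False)
after_write = hardware_touch(after_read, is_write=True)
```

A read sets only the accessed flag, while a write sets both the accessed flag and the dirty flag.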
In a shadow page table implementation, these accessed and modified flags will be set in the shadow page table, which is invisible to the operating system. For the proper functioning of many operating systems, these accessed and modified flags must be correctly maintained. In most processor architectures, it is impossible to transparently maintain consistency between the accessed and modified flags in the shadow page table and the accessed and modified flags in the guest page table.
To correctly maintain the accessed flags, shadow page table algorithms must examine the guest page table's accessed flag. If a guest entry's accessed flag is cleared, the corresponding entry within the shadow page table must be marked as invalid. When the guest accesses this page, the hypervisor receives control and marks the page as valid in the shadow page table and accessed in the operating system's guest page table.
Similarly, to correctly maintain the modified flags, a shadow page table implementation must mark a page as read-only, then process the page fault interrupt when an attempt is made to write to the page. Within the interrupt handler, the shadow page must be marked as writable and the guest page table entry must be marked as modified. Processing these interrupts to maintain the accessed and modified flags of page table entries is a significant source of slowdown for a shadow page table implementation. Thus, it would be advantageous to reduce the high cost of maintaining accessed and modified flags in the operating system's guest page table entries.
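The fault-driven maintenance of the accessed and modified flags described above can be sketched for a single page as follows. This is a toy model under stated assumptions: the names ShadowedPage and GuestPageFault are hypothetical, the flag constants follow the x86 bit positions, and the fault counter stands in for the transitions into the hypervisor that make this scheme expensive.

```python
PRESENT, WRITABLE, ACCESSED, DIRTY = 1 << 0, 1 << 1, 1 << 5, 1 << 6


class GuestPageFault(Exception):
    """Hypothetical: a fault that must be reflected to the guest operating system."""


class ShadowedPage:
    """Toy model (hypothetical) of one guest entry and its shadow entry."""

    def __init__(self, guest_pte):
        self.guest_pte = guest_pte
        self.faults = 0  # faults taken by the hypervisor for this page
        # The shadow entry is valid only if the guest's accessed flag is set,
        # and writable only if the guest's modified flag is also set.
        self.shadow_pte = 0
        if (guest_pte & PRESENT) and (guest_pte & ACCESSED):
            self.shadow_pte |= PRESENT
            if (guest_pte & WRITABLE) and (guest_pte & DIRTY):
                self.shadow_pte |= WRITABLE

    def access(self, is_write):
        needed = PRESENT | (WRITABLE if is_write else 0)
        if (self.shadow_pte & needed) != needed:
            self._handle_fault(is_write)

    def _handle_fault(self, is_write):
        self.faults += 1
        if not (self.guest_pte & PRESENT):
            raise GuestPageFault("page not present in guest page table")
        if is_write and not (self.guest_pte & WRITABLE):
            raise GuestPageFault("write to a page the guest mapped read-only")
        # Mark the page valid in the shadow table and accessed in the guest table.
        self.shadow_pte |= PRESENT
        self.guest_pte |= ACCESSED
        if is_write:
            # Mark the shadow entry writable and the guest entry modified.
            self.shadow_pte |= WRITABLE
            self.guest_pte |= DIRTY


page = ShadowedPage(PRESENT | WRITABLE)
for is_write in (False, False, True, True):
    page.access(is_write)
```

In this sketch the first read and the first write each cost one fault, after which further accesses of the same kind proceed without hypervisor involvement.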
Finally, on a multiprocessor system, when a page table entry is modified, the page table entry must be purged not only from the TLB of the processor that modified the entry, but also from the TLB of any processor that may have a cached copy of the table entry. In some processor architectures, this cross-processor TLB invalidation is performed explicitly by software using an inter-processor interrupt. This cross-processor TLB invalidation is often referred to as a TLB shoot down. TLB shoot down algorithms are very expensive in terms of processor cycles, especially in a virtualized environment. In particular, the current TLB shoot down algorithms require many transitions into the hypervisor to accomplish their task, and require more inter-processor interrupts than may otherwise be required. Thus, it would be advantageous to reduce the high cost of TLB shoot down in a hypervisor (or an equivalent virtualizing program).
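A TLB shoot down can be sketched with a toy model in which each processor holds its own dictionary of cached translations. The sketch is illustrative only: the names ToyCPU and shoot_down are hypothetical, and the broadcast to every other processor is an assumed naive strategy chosen to show how inter-processor interrupts can outnumber the processors that actually cache the entry.

```python
class ToyCPU:
    """Toy processor (hypothetical) with a private TLB: virtual page -> frame."""

    def __init__(self):
        self.tlb = {}


def shoot_down(cpus, initiator, vpage):
    """Purge vpage from every TLB; return the number of inter-processor interrupts sent."""
    # The initiating processor invalidates its own entry directly.
    cpus[initiator].tlb.pop(vpage, None)
    interrupts = 0
    for index, cpu in enumerate(cpus):
        if index == initiator:
            continue
        # Naive broadcast (assumption): interrupt every other processor,
        # whether or not it holds a cached copy of the entry.
        interrupts += 1
        cpu.tlb.pop(vpage, None)
    return interrupts


cpus = [ToyCPU() for _ in range(4)]
cpus[0].tlb[5] = 50
cpus[2].tlb[5] = 50
cpus[3].tlb[6] = 60
sent = shoot_down(cpus, initiator=0, vpage=5)
```

Here three interrupts are sent although only one remote processor held the stale entry, and unrelated translations, such as the one for page 6, are left in place.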