The present invention relates to methods and systems for managing virtual memory remapping. More particularly, the invention is directed to efficient methods and systems for managing the extension of an application's address space by remapping virtual memory. The invention also relates to methods and systems for flushing memory caches in a multi-processor environment.
Virtual memory management techniques are well known in the art. Memory is primarily made up of fast local memory (e.g., Random Access Memory, “RAM”) and slower external memory (e.g., magnetic or optical disks). Local memory may be further divided into very high speed memory, usually embodied in small cache memory, and somewhat slower main memory. The available size of external memory is limited only by the capacities of the disks present on the system, while the size of local memory is limited by the addressing capabilities of the processor.
Modern processor architectures typically provide virtual memory functionality. Virtual memory gives an application the illusion of having a very large, linear address space, significantly reducing the complexity of application memory management. At the same time, the operating system, which is responsible for managing virtual memory mappings, has the discretion to remove memory from the application's address space when it determines that the memory may be better used elsewhere in the system. The combination of ease of use for application programming and flexibility in resource allocation in the operating system has proven to be powerful, leading to the popular adoption of virtual memory systems.
A virtual memory manager in an operating system determines which virtual memory is mapped to physical memory and which virtual memory is unmapped from the faster physical memory and stored in slower external memory.
The operating system initializes the memory management data structures that the processor and operating system use to translate virtual addresses into corresponding physical or external addresses. A virtual memory address is typically converted into a physical memory address by taking the high-order bits of the virtual memory address to derive the virtual page number. To determine the physical page that maps a virtual page, the virtual page number is used as an index into a table (often called the pagetable), which specifies, among other things, the physical page number mapped by that virtual page mapping. The physical page number is then concatenated with the low-order bits of the virtual memory address (the byte offset within the page) to produce the complete physical memory address corresponding to the original virtual address. The processor performs this virtual-to-physical address translation during program execution. The operating system manages the pagetable, modifying it as appropriate, to indicate to the processor which virtual pages map to which physical pages. The processor then references the physical memory addresses thus produced to access program code or data.
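The translation steps above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular processor: it assumes 4 KB pages (so the low 12 bits form the byte offset) and models the pagetable as a flat dictionary from virtual page number to physical page number, whereas real hardware typically walks a multi-level table.

```python
PAGE_SHIFT = 12                  # assumed 4 KB pages: low 12 bits are the byte offset
PAGE_SIZE = 1 << PAGE_SHIFT
OFFSET_MASK = PAGE_SIZE - 1

def translate(vaddr, pagetable):
    """Translate a virtual address using a flat pagetable (VPN -> physical page number)."""
    vpn = vaddr >> PAGE_SHIFT            # high-order bits give the virtual page number
    offset = vaddr & OFFSET_MASK         # low-order bits are the byte offset within the page
    if vpn not in pagetable:
        raise LookupError(f"page fault: virtual page {vpn:#x} is not mapped")
    pfn = pagetable[vpn]                 # physical page number from the pagetable entry
    return (pfn << PAGE_SHIFT) | offset  # concatenate physical page number and offset

pagetable = {0x12345: 0x00ABC}
print(hex(translate(0x12345678, pagetable)))  # 0xabc678
```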
Typical 32-bit processors have 32-bit physical addressing capabilities. This implies that such a 32-bit processor may address up to 2^32 bytes of physical memory, or 4 GB. Similarly, 32-bit processors typically support 32-bit virtual addressing, yielding 4 GB virtual address spaces. Operating systems supporting virtual memory typically reserve between ¼ and ½ of the total virtual address space provided by any given processor architecture for storage of system-wide operating system data, such as device data structures and the file system cache. The remainder of the virtual address space may be used for application virtual memory. For a typical 32-bit processor with 32-bit virtual addresses, an application thus has access to roughly 2 to 3 GB of virtual address space for application use, including buffer space, application code, heap space, stack space, per-process control data, and the like.
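The address-space figures above follow from simple arithmetic, taking 1 GB as 2^30 bytes:

```latex
\begin{aligned}
2^{32}\ \text{bytes} &= 4\,294\,967\,296\ \text{bytes} = 4\ \text{GB} \\
\tfrac{1}{4} \times 4\ \text{GB} &= 1\ \text{GB}, \qquad \tfrac{1}{2} \times 4\ \text{GB} = 2\ \text{GB} \quad \text{(reserved for the operating system)} \\
4\ \text{GB} - 1\ \text{GB} &= 3\ \text{GB}, \qquad 4\ \text{GB} - 2\ \text{GB} = 2\ \text{GB} \quad \text{(left for the application)}
\end{aligned}
```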
Server applications such as database servers or mail servers typically require large virtual address spaces to support high throughput rates with large numbers of connected clients. These address spaces may contain caches of user data buffers, allowing the application to increase throughput by performing operations in main memory rather than performing slower external disk I/O. Once these applications have fully utilized the 2 to 3 GB of application virtual address space, further gains in throughput will ordinarily be impossible, since additional data must be stored on disk rather than in main memory. However, depending on operating system activity, physical memory may be available in abundance; that is, there may be significant amounts of physical memory in the system that are not mapped into any address space, or are lightly used, and thus are available for use elsewhere.
But because the application's address space is fully consumed, there is no place in which to effectively use that memory to benefit the server application. The net effect is that application throughput is bottlenecked due to lack of accessible application memory space. A mechanism that allows an application that has exhausted its virtual address space to allocate and access a large additional tier of main memory, even if such access is somewhat more expensive than access to standard application-mapped virtual memory, is therefore needed.
Computations associated with translating a virtual address into a physical address, internal to the processor's execution engine, can be expensive, due to the multiple memory references involved and the associated logical manipulations. To reduce this overhead, processors usually have translation look-aside buffers (“TLBs”). TLBs are small caches made of fast, associative memory, internal to the processor. TLBs maintain a list of the most recently used virtual memory addresses along with their corresponding physical memory addresses. When the operating system changes the mapping of a virtual page by modifying the pagetable which stores the virtual-to-physical memory address translation data, the operating system must notify the processor(s) to flush the old virtual-to-physical memory mapping, if it exists, from the TLB.
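The role of a TLB, and why a stale entry must be flushed after a remap, can be illustrated with a toy model. The `TLB` class below is hypothetical; it assumes a small fully associative cache with least-recently-used replacement, which is only one of several replacement policies real processors use.

```python
from collections import OrderedDict

class TLB:
    """Toy fully associative TLB mapping virtual page numbers to physical page numbers."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()   # VPN -> PFN, least recently used first

    def lookup(self, vpn, pagetable):
        if vpn in self.entries:                 # hit: no pagetable walk needed
            self.entries.move_to_end(vpn)
            return self.entries[vpn]
        pfn = pagetable[vpn]                    # miss: consult the pagetable (KeyError if unmapped)
        self.entries[vpn] = pfn
        if len(self.entries) > self.capacity:   # evict the least recently used translation
            self.entries.popitem(last=False)
        return pfn

    def flush(self, vpn):
        """Discard any cached translation for vpn."""
        self.entries.pop(vpn, None)

pagetable = {1: 10}
tlb = TLB()
print(tlb.lookup(1, pagetable))   # 10 (miss, now cached)
pagetable[1] = 20                 # the operating system remaps virtual page 1
print(tlb.lookup(1, pagetable))   # 10: stale translation served from the TLB
tlb.flush(1)
print(tlb.lookup(1, pagetable))   # 20 after the flush
```

The second lookup shows the hazard discussed in the following paragraph: until the entry is flushed, the cached translation still points at the old physical page.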
If the physical memory address mapped to a virtual memory address is modified without the TLB being flushed, then an incorrect or invalid memory location may be accessed by an application, potentially resulting in data corruption or application failure. It is, therefore, critical that the operating system's virtual memory manager flush stale TLB entries with certainty, to protect data integrity.
In a multi-processor (“MP”) computer system, the local processor's TLB is not the only one that must be updated when a virtual-to-physical mapping changes. In fact, all processors that are executing code in the subject address space (referencing in any way the soon-to-be-modified pagetable) must have their local TLBs flushed. One standard technique for effecting this update is to send an interprocessor interrupt to each affected processor in the system. While a target processor handles the interrupt, its other, lower-priority activities are blocked for numerous processor cycles as it saves its state, dispatches the interrupt, flushes its TLB, acknowledges that the TLB has been flushed, restores its pre-interrupt state, and then continues with its previous work.
Meanwhile, the processor that initiated the change to the virtual memory mapping performs a busy-wait operation (doing no application work), waiting for the other processors that must flush their TLBs to acknowledge that the TLB flush operation is complete. The entire TLB flush operation, counting all the cycles on all processors involved and the bus cycles involved in communicating between processors, can be very expensive. A TLB flush operation is pure overhead, since the application makes no forward progress for the duration of the operation. An efficient method to update the TLB entries of all processors in a multi-processing environment, whether in the context of extending application memory space or not, is therefore highly desirable.
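The interrupt-and-busy-wait pattern described above can be simulated with threads. This is a hypothetical sketch: `Processor`, `shootdown`, and the use of `threading.Event` objects to stand in for interprocessor interrupts and acknowledgements are illustrative assumptions, not an operating system interface.

```python
import threading

class Processor(threading.Thread):
    """Simulated CPU (hypothetical) that services 'interprocessor interrupts' by flushing its TLB."""

    def __init__(self, cpu_id):
        super().__init__(daemon=True)
        self.cpu_id = cpu_id
        self.tlb = {}                         # cached VPN -> PFN translations
        self.ipi = threading.Event()          # stands in for an interprocessor interrupt
        self.acked = threading.Event()        # set once this CPU acknowledges the flush
        self.flush_vpn = None

    def run(self):
        while True:
            self.ipi.wait()                   # normal work would be interrupted here
            self.ipi.clear()
            self.tlb.pop(self.flush_vpn, None)  # flush the stale translation locally
            self.acked.set()                  # acknowledge to the initiating processor

def shootdown(vpn, targets):
    """Initiator side: interrupt every target, then busy-wait for all acknowledgements."""
    for cpu in targets:
        cpu.acked.clear()
        cpu.flush_vpn = vpn
        cpu.ipi.set()
    while not all(cpu.acked.is_set() for cpu in targets):
        pass                                  # spinning: no application work gets done

cpus = [Processor(i) for i in range(3)]
for cpu in cpus:
    cpu.tlb[7] = 70
    cpu.start()
shootdown(7, cpus)
print([7 in cpu.tlb for cpu in cpus])   # [False, False, False]
```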
Furthermore, prior attempts have made additional physical memory available to an application beyond what the operating system would normally permit. For example, Washington, et al. U.S. Pat. No. 5,860,141 (“Washington”), assigned to the NCR Corporation, creates an interface between an application and the operating system to permit the application to gain access to more address space than the application might otherwise be allotted by the operating system.
This is done by utilizing some of the physical memory which would normally not be within the addressable address space of the application to create a larger pool of memory buffers. Some of these buffers are directly accessible by the application and some must be remapped into application address space prior to use. These physical memory buffers are then managed by recycling the list of available virtual memory addresses in an exact least-recently-used (“LRU”) order, when these virtual memory addresses are not in use by the application, and associating these virtual memory addresses with different physical memory buffers. However, using exact LRU to maintain the list of available virtual memory addresses to associate with new physical memory buffers creates performance bottlenecks in memory-intensive applications.
Washington maintains exact buffer LRU ordering by utilizing a linked list, which results in significant interprocessor coherence memory traffic as buffer headers must be linked to the head of the list whenever their reference count goes to zero, and unlinked from the list whenever they are needed by the application. Washington flushes processor TLB entries across all processors in an MP system at the time a buffer transitions from extended to mapped state, i.e. on demand.
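For comparison, an exact-LRU list of idle buffer headers of the kind described above might look like the following. This is a hypothetical sketch, not Washington's code: `BufferHeader` and `ExactLRUList` are illustrative names, and the single global lock and doubly linked list are assumptions chosen to show why every release and every reuse writes to shared list pointers.

```python
import threading

class BufferHeader:
    """Hypothetical buffer header; prev/next link it into the shared LRU list when idle."""

    def __init__(self, vaddr):
        self.vaddr = vaddr
        self.refcount = 0
        self.prev = self.next = None

class ExactLRUList:
    """Exact-LRU list of idle headers behind one global lock (illustrative only)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.head = BufferHeader(None)            # sentinel; head.next = most recently idled
        self.head.prev = self.head.next = self.head

    def _unlink(self, h):
        h.prev.next, h.next.prev = h.next, h.prev  # writes into both neighbors' headers
        h.prev = h.next = None

    def reference(self, h):
        with self.lock:
            if h.refcount == 0 and h.prev is not None:
                self._unlink(h)                   # idle buffer referenced again: unlink it
            h.refcount += 1

    def release(self, h):
        with self.lock:
            h.refcount -= 1
            if h.refcount == 0:                   # link at the head (most recently used end)
                h.prev, h.next = self.head, self.head.next
                self.head.next.prev = h
                self.head.next = h

    def take_lru(self):
        """Pop the least recently idled header so its address can back a new buffer."""
        with self.lock:
            victim = self.head.prev               # tail of the list
            if victim is self.head:
                return None                       # no idle addresses available
            self._unlink(victim)
            return victim

lru = ExactLRUList()
a, b = BufferHeader(0x1000), BufferHeader(0x2000)
for h in (a, b):
    lru.reference(h)
    lru.release(h)
print(hex(lru.take_lru().vaddr))   # 0x1000: a went idle first
```

On a multi-processor system, each of those pointer writes can pull the neighboring headers' cache lines across the interconnect, which is the kind of coherence traffic the text attributes to exact LRU ordering.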
The Washington TLB flush operation is carried out by invoking a driver routine, effecting a transition to kernel mode, which then flushes the appropriate virtual addresses from the local processor's TLB and further requests interprocessor interrupts to invoke driver methods on other processors. Meanwhile, the processor that issued the TLB flush operation idles until all driver routines indicate that the TLB flush operation is complete. This operation is expensive, involving a kernel-mode trap, multiple interrupts on multiple processors, numerous bus cycles (probably the heaviest expense, since in an MP system running a server application at maximum throughput the interprocessor or memory bus is typically the bottleneck), and significant end-to-end latency. The overhead of this operation significantly reduces application throughput.
It is apparent that a method and system that more efficiently provides a large additional tier of virtual memory to an application is needed to improve the operating performance of memory-intensive server applications. Furthermore, it is apparent that a method of efficiently flushing processor TLBs is needed to effect memory management operations with minimal overhead.
Accordingly, it is one object of the present invention to provide methods and systems for decreasing the overhead associated with providing an application very large memory (i.e., memory beyond the application memory normally assigned by the operating system). It is another object of the present invention to efficiently flush the TLBs associated with MP computing environments. This is achieved by reducing unnecessary interrupts of remote processors, thereby increasing application processing throughput.
An application, using conventional means, allocates a number (“M”) of ordinary, mapped memory buffers that are directly accessible to the application. The application also requests from a user library a number (“N”) of extended memory buffers, which are maintained in physical memory (e.g., RAM), but not directly accessible to the application. Thereafter, the application initializes all its data structures as though it had access to M+N memory buffers.
The state of the memory buffers (e.g., mapped, extended, or transitional) is abstracted and removed from the application's programming logic. The application merely issues a call to the user library before referencing a buffer, and the user library ensures that the buffer is in a state permitting a reference by the application. After referencing the buffer, the application issues a call to the user library indicating that its references to the buffer have ended, whereupon the user library frees the buffer for reuse. Multiple concurrent references may occur, with the user library managing the reference counts and the states of the buffer as appropriate.
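The buffer protocol above can be sketched as a small user-library model. Everything here is hypothetical: `UserLibrary`, `acquire`, `release`, and the policy of reusing the slot of the first idle mapped buffer are illustrative assumptions, and the actual remapping of pagetable entries (and any TLB flush it requires) is left as a comment.

```python
from enum import Enum

class State(Enum):
    MAPPED = "mapped"              # backs a virtual slot the application can reference
    EXTENDED = "extended"          # held in physical memory outside the address space
    TRANSITIONAL = "transitional"  # being remapped into a virtual slot

class UserLibrary:
    """Hypothetical model: M virtual slots shared by M + N physical buffers."""

    def __init__(self, m_slots, n_extended):
        total = m_slots + n_extended
        self.state = {b: State.MAPPED if b < m_slots else State.EXTENDED for b in range(total)}
        self.slot_of = {b: b for b in range(m_slots)}   # mapped buffer -> virtual slot
        self.refcount = dict.fromkeys(range(total), 0)

    def acquire(self, buf):
        """Make `buf` referenceable, count the reference, and return its virtual slot."""
        if self.state[buf] is State.EXTENDED:
            # Reuse the slot of the first mapped buffer nobody is referencing (assumed policy).
            victim = next((b for b in self.slot_of if self.refcount[b] == 0), None)
            if victim is None:
                raise RuntimeError("every virtual slot is currently referenced")
            self.state[buf] = State.TRANSITIONAL
            slot = self.slot_of.pop(victim)
            self.state[victim] = State.EXTENDED
            # A real library would remap `slot` to buf's physical pages here and make sure
            # any stale TLB entries for `slot` are flushed before the application uses it.
            self.slot_of[buf] = slot
            self.state[buf] = State.MAPPED
        self.refcount[buf] += 1
        return self.slot_of[buf]

    def release(self, buf):
        """End one reference; at zero the buffer's slot becomes eligible for reuse."""
        if self.refcount[buf] == 0:
            raise ValueError("release() without a matching acquire()")
        self.refcount[buf] -= 1

lib = UserLibrary(m_slots=2, n_extended=3)   # buffers 0-1 mapped, 2-4 extended
print(lib.acquire(0))   # 0
print(lib.acquire(4))   # 1: buffer 4 takes idle buffer 1's slot
```

The application never inspects `State` itself; it only brackets each use of a buffer with `acquire` and `release`, which is the separation of concerns the paragraph describes.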
Moreover, a flushing flag uniquely identifying each processor is maintained, such that the TLB associated with each processor in an MP environment is flushed, as appropriate, whenever driver context is established on that processor. In this way, multiple interrupts and their latency are avoided, and overhead is reduced.
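One way to read the per-processor flushing flag is as deferred, local flushing: rather than interrupting remote processors, the library marks every processor as needing a flush and lets each one flush its own TLB the next time driver context is established there. The sketch below is a hypothetical model of that idea; `DeferredFlusher`, `retire`, `on_driver_context`, and `reusable` are illustrative names, and it assumes the application holds no references to a retired virtual page, so no new translations for it are loaded in the meantime.

```python
class DeferredFlusher:
    """Hypothetical model: per-processor flushing flags instead of interprocessor interrupts."""

    def __init__(self, n_cpus):
        self.n_cpus = n_cpus
        self.tlbs = [{} for _ in range(n_cpus)]   # per-CPU caches of VPN -> PFN
        self.needs_flush = [False] * n_cpus       # one flushing flag per processor
        self.pending = {}                         # retired VPN -> CPUs that have flushed it

    def retire(self, vpn):
        """Queue vpn for flushing everywhere; no remote processor is interrupted."""
        self.pending[vpn] = set()
        self.needs_flush = [True] * self.n_cpus

    def on_driver_context(self, cpu):
        """Called whenever the driver runs on `cpu`, for whatever reason."""
        if not self.needs_flush[cpu]:
            return                                # nothing new to flush on this CPU
        for vpn, flushed in self.pending.items():
            self.tlbs[cpu].pop(vpn, None)         # local TLB flush only
            flushed.add(cpu)
        self.needs_flush[cpu] = False

    def reusable(self):
        """Return retired pages flushed on every CPU; they may now be remapped."""
        done = {vpn for vpn, flushed in self.pending.items() if len(flushed) == self.n_cpus}
        for vpn in done:
            del self.pending[vpn]
        return done

f = DeferredFlusher(2)
for tlb in f.tlbs:
    tlb[5] = 50
f.retire(5)
f.on_driver_context(0)
print(f.reusable())   # set(): CPU 1 has not flushed yet
f.on_driver_context(1)
print(f.reusable())   # {5}
```

The trade-off in this model is latency for cost: a retired page becomes reusable only after every processor has passed through driver context, but no processor is ever interrupted and the initiator never busy-waits.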
Additional objects, advantages and novel features of the invention will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following or may be learned by practice of the invention. The objects and advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims. To achieve the foregoing and other objects and in accordance with the purpose of the present invention, methods and systems are provided for determining, in advance of a request from an application, those virtual memory addresses available for use, and for efficiently flushing, in advance, the TLBs (virtual-to-physical memory caches) associated with those available addresses on the processors in a multi-processor environment.
One aspect of the present invention provides a method of granting additional memory located outside the addressable region of a process having executable instructions, comprising receiving a memory accessible through a plurality of out-of-process addressable addresses and a plurality of in-process addressable addresses. Further, a process is received which is capable of accessing the memory of a number of the out-of-process addressable addresses through a remapping of a number of the in-process addressable addresses. Additionally, a plurality of available in-process addressable addresses are created in advance of a request from the process.
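Creating available in-process addresses in advance of a request can be modeled as a pool that is refilled ahead of demand, so the process's request path only pops a prepared entry. This is a hypothetical sketch: `AddressPool`, the `target` fill level, and the round-robin sweep used to find idle slots are assumptions made for illustration, and the sweep deliberately does not maintain an exact LRU order.

```python
class AddressPool:
    """Hypothetical model: keep idle virtual slots ready before the process asks for one."""

    def __init__(self, n_slots, target):
        self.refcount = [0] * n_slots   # per-slot reference counts; 0 means idle
        self.ready = []                 # slots prepared in advance of any request
        self.target = target            # assumed number of ready slots to keep on hand
        self.hand = 0                   # position of the round-robin sweep

    def replenish(self):
        """Background step: sweep the slots once, collecting idle ones until `target` is met."""
        for _ in range(len(self.refcount)):
            if len(self.ready) >= self.target:
                break
            slot = self.hand
            self.hand = (self.hand + 1) % len(self.refcount)
            if self.refcount[slot] == 0 and slot not in self.ready:
                self.ready.append(slot)

    def request(self):
        """Fast path for the process: hand out a prepared slot, or None if none is ready."""
        if not self.ready:
            return None
        slot = self.ready.pop()
        self.refcount[slot] += 1
        return slot

    def release(self, slot):
        """The process is done with `slot`; a later replenish() may prepare it again."""
        self.refcount[slot] -= 1

pool = AddressPool(n_slots=4, target=2)
pool.refcount[0] = 1          # slot 0 is busy
pool.replenish()
print(pool.ready)             # [1, 2]
print(pool.request())         # 2
```

Because `replenish` runs ahead of demand, the expensive work (finding idle slots and, in a full system, ensuring their stale translations are flushed) stays off the process's request path.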
Another aspect of the present invention provides a method for flushing a memory cache associated with virtual memory mapping in a multi-processing environment having executable instructions, comprising a memory accessible to a plurality of processors and a process. Moreover, the processors are permitted to run on a network and each processor has a virtual-to-physical memory cache including a plurality of virtual memory addresses mapped to a plurality of physical memory addresses. Further, the process runs on the processors and has a number of virtual memory addresses to access the physical memory addresses of the memory, and the process is interfaced with the virtual-to-physical memory cache for each of the processors. Also, when the driver has a context on one of the processors, the virtual-to-physical memory cache located on the processor in context is flushed, as appropriate.
In yet another aspect of the present invention, a system is provided for providing additional memory located outside the addressable region of a process and for flushing memory cache, comprising a memory accessed through a plurality of out-of-process addressable addresses and a plurality of in-process addressable addresses. Further, a plurality of processors are provided wherein each processor has a virtual-to-physical memory cache. Also, a process is provided having access to the memory through the in-process addressable addresses.
A remap set of executable instructions is operable to remap the in-process virtual addresses of the process to gain access to the memory associated with the out-of-process addresses. Finally, a flushing set of executable instructions flushes a number of the in-process addressable addresses and a number of the out-of-process addressable addresses on a number of the virtual-to-physical memory caches of the processors while the driver has a context on one of the processors.
Also, a system for providing additional memory located outside the addressable region of a process and for flushing memory cache is provided, comprising a memory accessible through a plurality of out-of-process addressable addresses and a plurality of in-process addressable addresses, and a plurality of processors, each having a virtual-to-physical memory cache. A remap set of executable instructions remaps the in-process addressable addresses of the process in order to gain access to the memory associated with the out-of-process addressable addresses. Further, a flushing set of executable instructions flushes the caches, as appropriate, when the driver has a context on one of the processors.
Still other objects of the present invention will become apparent to those skilled in this art from the following description wherein there are shown and described exemplary embodiments of this invention, simply for purposes of illustration. As will be realized, the invention may take on other aspects and arrangements than those described in detail below without departing from the scope of the invention, as defined by the claims.