1. Technical Field
The present invention is generally directed to an improved data processing system. More specifically, the present invention is directed to a system and method for improving the performance of dynamic memory removals in a data processing system by reducing the file cache size.
2. Description of Related Art
In systems that support dynamic logical partitioning (DLPAR), such as IBM's eserver pSeries computer systems, resources may be dynamically moved between partitions within the system. Resources may be moved between partitions for various reasons. For example, it may be desirable to consolidate a plurality of virtual machines; new logical partitions may be initiated, requiring a rebalancing of resources; or web servers associated with various ones of the logical partitions may have different peak usage times, requiring resources to be moved between the logical partitions at different times of day to accommodate the peak usages.
With DLPAR systems, resources are moved between partitions dynamically, i.e., non-disruptively while the partitions continue to run. When DLPAR operations are in progress, the performance of the operating system may suffer slightly as resources are being examined and rebalanced. When a resource is added, it is immediately made available for use, in the same way as if the operating system had booted with the resource. More information about dynamic logical partitioning in the eserver pSeries computing devices may be found in the whitepaper entitled “Dynamic Logical Partitioning in IBM eserver pseries,” International Business Machines Corporation, Oct. 8, 2002, available from International Business Machines Corporation at wwwl.ibm.com/servers/eserver/pseries/hardware/whitepapers/dlpar.html, which is hereby incorporated by reference.
Most DLPAR operations can be performed in a short amount of time and, in general, the performance benefit associated with resource addition and removal will scale proportionally to the change in resources. The main exception is memory removal. When an operating system dynamically removes real memory from a running system, the performance of the system is often negatively impacted, especially if the majority of physical memory is in use.
The Advanced Interactive Executive (AIX) operating system uses virtual memory to address more memory than is physically available in the system. The management of memory pages in RAM or on disk is handled by the Virtual Memory Manager (VMM). Virtual-memory segments are partitioned into units called pages. A paging space is a type of logical volume with allocated disk space that stores information which is resident in virtual memory but is not currently being accessed. This logical volume has an attribute type equal to paging, and is usually referred to simply as paging space or swap space. When the amount of free RAM in the system is low, programs or data that have not been used recently are moved from memory to paging space to release memory for other activities.
On a system where physical memory is heavily utilized, dynamically removing a range of memory will result in a significant amount of paging activity as virtual memory pages are written out to disk to accommodate the system's smaller physical memory size. The virtual memory pages involved in these mass page-outs can be broken into two categories. The first category is file pages that are used to cache file data in physical memory. Because file pages are paged out to the same location on disk from which they came, paging space does not need to be allocated for file pages residing in RAM.
The second category is working storage pages, which are used for a process's data, heap, stack, shared memory, etc. Working storage pages are transitory and exist only during their use by a process. Working storage pages have no permanent disk storage location. However, working storage pages must still occupy disk storage locations when they cannot be kept in real memory. The disk paging space is used for this purpose. Working storage pages in RAM that can be modified and paged out are assigned a corresponding slot in paging space. The allocated paging space is used only if the working storage page needs to be paged out. However, an allocated page in paging space cannot be used by another working storage page. It remains reserved for a particular working storage page for as long as that page exists in virtual memory.
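By way of illustration only, the distinction between the two categories of pages may be sketched as follows. This is a minimal model under stated assumptions, not the VMM's actual data structures: the names `PageKind`, `Page`, `PagingSpace`, `reserve`, and `release` are hypothetical and are introduced solely for this example. The sketch shows that a file page needs no paging-space slot because it is paged out to the disk location from which it came, whereas a working storage page holds one reserved slot for as long as the page exists.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class PageKind(Enum):
    FILE = "file"        # caches file data; backed by its original disk location
    WORKING = "working"  # process data, heap, stack, shared memory


@dataclass
class Page:
    kind: PageKind
    file_offset: Optional[int] = None   # hypothetical: where a file page came from
    paging_slot: Optional[int] = None   # hypothetical: slot reserved in paging space


class PagingSpace:
    """Hypothetical model of paging space as a fixed pool of slots."""

    def __init__(self, num_slots: int) -> None:
        self.free_slots: List[int] = list(range(num_slots))

    def reserve(self, page: Page) -> Optional[int]:
        """Return the slot reserved for a page, or None for a file page."""
        if page.kind is PageKind.FILE:
            return None  # file pages are paged out to where they came from
        if page.paging_slot is None:
            if not self.free_slots:
                raise RuntimeError("paging space exhausted")
            page.paging_slot = self.free_slots.pop(0)
        # The slot stays reserved for this page; no other page may use it.
        return page.paging_slot

    def release(self, page: Page) -> None:
        """Free the slot once the page no longer exists in virtual memory."""
        if page.paging_slot is not None:
            self.free_slots.append(page.paging_slot)
            page.paging_slot = None
```

In this sketch, calling `reserve` twice on the same working storage page returns the same slot, which mirrors the statement above that the slot remains reserved for that particular page.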
When a process references a virtual-memory page that is on disk, because it either has been paged out (written to disk or paging space) or has never been read, the referenced page must be paged in, and this might cause one or more pages to be paged out if the number of available (free) page frames in RAM is low. The VMM attempts to steal page frames that have not been recently referenced and, therefore, are not likely to be referenced in the near future, using a page-replacement algorithm.
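The general idea of stealing page frames that have not been recently referenced may be sketched with the well-known clock (second-chance) technique. The description above does not specify which page-replacement algorithm the VMM uses, so the clock technique is used here only as a stand-in, and the class name `ClockReplacer` and its method `access` are hypothetical.

```python
class ClockReplacer:
    """Second-chance (clock) replacement over a fixed set of page frames."""

    def __init__(self, num_frames):
        self.frames = [None] * num_frames       # page held by each frame
        self.referenced = [False] * num_frames  # per-frame reference bit
        self.hand = 0                           # next frame to examine

    def access(self, page):
        """Reference a page; return the page stolen to make room, or None."""
        if page in self.frames:
            # Page is resident: record that it was recently referenced.
            self.referenced[self.frames.index(page)] = True
            return None
        if None in self.frames:
            # A free frame is available, so nothing has to be stolen.
            free = self.frames.index(None)
            self.frames[free] = page
            self.referenced[free] = True
            return None
        # No free frame: skip recently referenced frames, clearing their
        # reference bits, until a frame that was not referenced is found.
        while self.referenced[self.hand]:
            self.referenced[self.hand] = False
            self.hand = (self.hand + 1) % len(self.frames)
        victim = self.frames[self.hand]
        self.frames[self.hand] = page
        self.referenced[self.hand] = True
        self.hand = (self.hand + 1) % len(self.frames)
        return victim
```

With three frames holding pages a, b, and c, a reference to a new page d steals the frame of a. If b is then referenced again before another new page e arrives, the frame of c is stolen rather than that of b, because b was referenced more recently.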
A successful page-replacement algorithm keeps the memory pages that are actively being referenced in RAM, while the memory pages not being actively referenced are paged out. However, when the RAM is over-committed, it becomes difficult to choose pages for page out. This is because the pages will probably be referenced in the near future by currently running processes. The result is that pages that are likely to be referenced soon might still get paged out and then paged in again when actually referenced. When RAM is over-committed, continuous paging in and paging out, called thrashing, can occur. When a system is thrashing, the system spends most of its time paging in and paging out instead of executing useful instructions, and none of the active processes make any significant progress. The VMM has a memory load control algorithm that detects when the system is thrashing and then attempts to correct the condition.
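The effect of over-commitment may be illustrated with a small simulation. The sketch below assumes a least-recently-used (LRU) replacement policy as a stand-in for the unspecified page-replacement algorithm, and the function name `count_page_faults` is hypothetical. It counts page faults for a process that repeatedly cycles through its working set.

```python
from collections import OrderedDict


def count_page_faults(reference_string, num_frames):
    """Count page faults under LRU replacement (an assumed stand-in policy)."""
    resident = OrderedDict()  # ordered from least to most recently used
    faults = 0
    for page in reference_string:
        if page in resident:
            resident.move_to_end(page)        # mark as most recently used
        else:
            faults += 1                       # page must be paged in
            if len(resident) >= num_frames:
                resident.popitem(last=False)  # page out the LRU page
            resident[page] = True
    return faults


# A working set of five pages, referenced in a loop twenty times.
references = list(range(5)) * 20

fits = count_page_faults(references, num_frames=5)            # 5 faults
over_committed = count_page_faults(references, num_frames=4)  # 100 faults
```

When the working set fits in the available frames, only the five initial page-ins occur. When a single frame is taken away, every one of the one hundred references faults, because each page is paged out shortly before it is referenced again. This is the thrashing behavior described above, shown on a deliberately small scale.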
On systems where all of the physical memory, e.g., RAM, can be shared between file pages and working storage pages, the page replacement algorithm does not distinguish between replacing file pages and replacing working storage pages. Thus, the large number of page-out input/output (I/O) operations initiated during a memory remove operation often involves a significant number of working storage pages. This can have a severe negative impact on system performance for several reasons. One of the principal reasons is that many of the pages being paged out may belong to the working sets of different processes. Because a large number of pages must be paged out to accommodate the memory removal operation, many of those pages may need to be paged back in as the applications continue to run and re-reference them. Thus, significant thrashing is generated as processes bring their working sets of pages back into physical memory.
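The consequence of a replacement policy that does not distinguish between page categories may be sketched as follows. The sketch assumes that victims are chosen purely by how long ago they were referenced; the names `ResidentPage`, `select_page_outs`, and `count_by_kind`, as well as the sample data, are hypothetical and serve only to illustrate the point.

```python
from collections import Counter
from typing import NamedTuple


class ResidentPage(NamedTuple):
    page_id: int
    kind: str             # "file" or "working"
    last_referenced: int  # larger value means more recently referenced


def select_page_outs(resident, frames_to_remove):
    """Pick the least recently referenced pages, ignoring their kind."""
    by_age = sorted(resident, key=lambda p: p.last_referenced)
    return by_age[:frames_to_remove]


def count_by_kind(pages):
    """Tally how many selected pages fall into each category."""
    return Counter(p.kind for p in pages)


# Eight resident pages, alternating file and working storage pages.
resident = [
    ResidentPage(i, "file" if i % 2 == 0 else "working", last_referenced=i)
    for i in range(8)
]

# Removing four frames selects the four oldest pages regardless of kind.
victims = select_page_outs(resident, frames_to_remove=4)
tally = count_by_kind(victims)
```

In this example, half of the pages selected for page-out are working storage pages, even though file pages were also resident and could have been written back to their original disk locations without consuming paging space.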
Having a large number of processes waiting for page-ins to complete can make the system almost unusable until the processes' working sets have been brought back into physical memory. Thus, not only is the system's performance poor during a memory remove operation, but it may remain poor for a significant duration even after the memory remove operation has completed.
Another reason for the significant negative effects of dynamic memory removal is that most high-end systems are configured with a small amount of paging space that is usually spread among only a very small number of disks. The large amount of paging activity that is generated during a dynamic memory removal operation is often bottlenecked by the small number of disks that are used for paging space. This results in long waits for I/O operations to complete.
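The bottleneck may be made concrete with a rough, back-of-the-envelope estimate. All figures below are assumptions chosen only for illustration: a 4 GiB page-out volume, a 4 KiB page size, and a sustained rate of 200 page writes per second per paging disk. The function name `estimate_page_out_seconds` is likewise hypothetical, and the model ignores queuing, caching, and concurrent page-ins.

```python
def estimate_page_out_seconds(bytes_to_page_out, page_size,
                              num_paging_disks,
                              page_writes_per_second_per_disk):
    """Estimate page-out time if writes spread evenly over the paging disks."""
    # Round up so that a partially filled page still counts as one page.
    pages = -(-bytes_to_page_out // page_size)
    return pages / (num_paging_disks * page_writes_per_second_per_disk)


GIB = 1024 ** 3

# Hypothetical scenario: 4 GiB of pages must be written to paging space.
few_disks = estimate_page_out_seconds(4 * GIB, 4096, 2, 200)
many_disks = estimate_page_out_seconds(4 * GIB, 4096, 16, 200)
```

Under these assumed figures, the page-outs take roughly 2,621 seconds with two paging disks and roughly 328 seconds with sixteen. The estimate scales inversely with the number of disks, which is why a paging space confined to a very small number of disks leads to long waits.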
Thus, it would be desirable to have an improved mechanism for performing dynamic memory removals that avoids the drawbacks set forth above.