The present invention generally relates to a method of memory and CPU time allocation for a multi-user computer system and, more specifically, to a method of memory and CPU time allocation responsive to the system load of the multi-user computer system.
In 1977, Digital Equipment Corporation (DEC) introduced a new line of 32-bit minicomputers designated as the VAX™ (Virtual Address Extension) series. The VAX computer was released with a proprietary operating system called VMS™ (Virtual Memory System). VAX and VMS are registered trademarks of Digital Equipment Corporation.
The initial target market for the VAX series was the scientific and engineering communities. However, the market has now expanded to include the full range of data processing from large scale on-line transaction processing to single user workstations. Although the VAX architecture and operating system have functioned well throughout this expansion, some glaring inefficiencies of the operating system exist. Most of these inefficiencies are related to memory management.
The speed of the VAX CPU is determined by two factors: (1) the time it takes to process instructions in the CPU; and (2) the time it takes to access memory to fetch or store the data needed by the instructions. When the memory is accessed to fetch data needed by the instructions, the CPU waits on the access. These waits do not show as idle or "null" time--the CPU appears busy. These waits may be prolonged by other activity in the memory controller or on the CPU bus. In most VAX applications, the CPU speed is underutilized. The CPU spends a considerable portion of the time idle while it waits on the memory controller.
When a user logs on to the system at a terminal, the user becomes a "process" recognized by the system and is assigned an identification code. The user may then run programs and otherwise utilize the computer system. As used herein, process will generally refer to a logged-in user or an active terminal. Under VMS, each user is allocated a working set at log-in. A working set is a collection of pages (each page having 512 bytes) of physical memory where the user's work will be held. WSLIMIT denotes the maximum size of a process working set at any point in time. The actual size of the working set may be below WSLIMIT at a given time but, typically, is not. WSLIMIT is dynamic and may be changed as often as once per second for active processes. Usually, only a small fraction of the memory space which a process may legally access is in the working set at a given time.
When a process attempts to reference a page which is legally accessible, but is not currently in the working set, a page fault is generated and steps are taken to insert the page into the working set. If the actual size of the working set is at WSLIMIT, a page is removed from the working set and the needed page is located. If the page is located and it is determined that the page is not currently in physical memory, it is placed in physical memory. Thus, the required page of physical memory is made directly accessible to the process.
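The fault handling described above can be illustrated with a small simulation. This is only a sketch: the class name `WorkingSet`, the reference trace, and in particular the first-in first-out choice of which page to remove are assumptions made for illustration, since the text does not state which page is selected for removal.

```python
from collections import deque


class WorkingSet:
    """Toy model of a process working set bounded by WSLIMIT."""

    def __init__(self, wslimit):
        self.wslimit = wslimit
        self.pages = deque()  # resident pages, oldest first (FIFO is an assumption)
        self.faults = 0

    def reference(self, page):
        """Touch a page; return True if the reference caused a page fault."""
        if page in self.pages:
            return False  # page already directly accessible, no fault
        self.faults += 1
        if len(self.pages) >= self.wslimit:
            self.pages.popleft()  # working set is at WSLIMIT: remove one page
        self.pages.append(page)  # insert the needed page into the working set
        return True


ws = WorkingSet(wslimit=3)
trace = [1, 2, 3, 1, 4, 1, 2]  # hypothetical sequence of page references
results = [ws.reference(p) for p in trace]
print(ws.faults, list(ws.pages))
```

With a limit of three pages, the fourth reference is a hit, while every other reference in this trace faults because the page is either new or was previously removed to make room.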
Page faults consume system resources. If they are avoided, faster system operation and program execution are achieved. Generally, the larger a working set, the fewer page faults occur, since more pages are directly accessible to the process. However, few processes, during any short time interval, access all or even a large fraction of the total pages legally accessible to the process. Therefore, even working sets that are a small fraction of the total size of a process can result in relatively small rates of faulting.
FIG. 1 illustrates VMS memory page movements. Data may be loaded into working set 10 from image file 15 or run-time libraries 20. Under heavy system load, read-only pages in working set 10 may be moved to Free Page List 25 while writable pages may be moved from working set 10 to Modified Page List 30. Free Page List 25 and Modified Page List 30 are emptied on a first-in first-out (FIFO) basis. As schematically illustrated, free pages are made available to working sets while writable pages are written to page file 40 or mapped section file 45. Under certain circumstances, the entire contents of a working set can be moved to a swap file, thereby releasing physical memory directly to Free Page List 25.
Moving pages back into working set 10 from Free Page List 25 or Modified Page List 30 utilizes few computer resources and is the preferred method of running an interactive system. These page movements are called soft faults. Moving pages from image file 15 or page file 40 back into working set 10 requires physical access to an I/O device such as a disk and therefore requires substantial time and computer resources (resource intensive). These page movements are called hard faults. Swap file retrievals are also hard faults but are easier on resource consumption since fewer I/O operations are generally required to retrieve similar amounts of data.
VMS includes a facility called Automatic Working Set Adjustment (AWSA), the operation of which is graphically illustrated in FIG. 2. Curve 70 represents the variation over time of the optimum or desired working set size for a given process. Line segments 75 represent the working set size limit WSLIMIT, discussed above, over time as generated by AWSA. Cross-hatched regions 76 represent times during which the working set size limit generated by AWSA is greater than the optimum working set size while solid regions 77 represent times during which the actual working set size limit is less than the optimum working set size. As illustrated, AWSA adjusts the working set size every two CPU QUANTUM ticks or after AWSTIME periods A1, A2, etc. The actual working set size is incremented by WSINC and decremented by WSDEC. In FIG. 2, WSINC is equal to 7 pages and WSDEC is equal to 3 pages.
Generally, it is very difficult to determine the precise size of the working set best suited for a given process since it varies depending on what operations a process is performing and varies as these operations move from phase to phase in their functioning. VMS monitors the faulting behavior of each process and adjusts WSLIMIT based on that behavior. However, AWSA does not react to the severity of the current memory load (i.e., it does not become more lenient as the load lightens) except for very large working sets. In addition, since AWSA is a statistical algorithm (i.e., it is based on statistics about what has happened), it is a reactive algorithm which assumes that what will happen is much like what has happened. This is not always the case.
AWSA is guided by the metric values assigned to several system generation (SYSGEN) parameters. When a user logs on and, for example, activates a program, the user's working set is loaded from the image file with that program and the required data. If the program demands that more memory be made available by faulting in excess of the value of the SYSGEN parameter Page Fault Rate High (PFRATH), VMS will provide more pages of memory as determined by the SYSGEN parameter Working Set Increment (WSINC). When the memory requirements have been fulfilled, the program will eventually fault less than Page Fault Rate Low (PFRATL). VMS will then remove memory pages as determined by Working Set Decrement (WSDEC). As FIG. 2 shows, the operation is a function of two time units: QUANTUM, the unit of CPU time allocated to each user, and AWSTIME, an integral number of QUANTUM ticks. AWSTIME determines how often the system is examined for possible memory adjustments.
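The adjustment rule just described may be sketched as a single function evaluated once per AWSTIME period. The function name `awsa_adjust`, the bounds `wsmin` and `wsmax`, and the sample fault rates and thresholds are hypothetical; only the comparison against PFRATH and PFRATL and the step sizes WSINC and WSDEC (7 and 3 pages, as in FIG. 2) come from the description.

```python
def awsa_adjust(wslimit, fault_rate, pfrath, pfratl, wsinc, wsdec, wsmin, wsmax):
    """Return the new working set limit after one AWSTIME period.

    wsmin and wsmax are assumed clamps, not parameters named in the text.
    """
    if fault_rate > pfrath:
        return min(wslimit + wsinc, wsmax)  # faulting too much: grow by WSINC
    if fault_rate < pfratl:
        return max(wslimit - wsdec, wsmin)  # faulting very little: shrink by WSDEC
    return wslimit  # fault rate between the thresholds: leave unchanged


limit = 100
history = []
for rate in [150, 150, 50, 5, 5]:  # hypothetical fault rates per period
    limit = awsa_adjust(limit, rate, pfrath=120, pfratl=10,
                        wsinc=7, wsdec=3, wsmin=50, wsmax=400)
    history.append(limit)
print(history)
```

The sketch makes the reactive character of the rule visible: the limit changes only after a period of high or low faulting has already been observed, and it never consults the overall memory load.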
Valid pages (i.e., those pages belonging to one or more processes) not currently in any working set are stored in one of five places: (1) EXE disk files; (2) system page files; (3) user disk files; (4) the Modified Page List; and (5) the Free Page List. The Modified Page List and the Free Page List are known collectively as the Secondary Cache.
The Modified Page List is allowed to get no larger than the parameter MPW_HILIMIT. When it reaches this size, SWAPPER is activated and, with reference to FIG. 1, writes pages from Modified Page List 30 to page file 40 or mapped section file 45 until the Modified Page List has been reduced to the size MPW_LOLIMIT. After SWAPPER writes these pages from Modified Page List 30, they are logically inserted into Free Page List 25. Free Page List 25 has no maximum size. Free Page List 25 always represents all pages not currently utilized for another purpose. The terminology "free" is a misnomer--most pages in this list have valid, useful contents. They are "free" in the sense that, since they have a valid backing store, they can be used for another purpose without concern about saving their current contents. When physical memory is needed for any purpose, the page at the bottom of Free Page List 25 (which either currently has no valid contents or is the one placed in the list the longest time ago) will be selected for the new use.
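The interaction between the two lists can be sketched with two queues. This is a simplified model under stated assumptions: the function name `add_modified_page`, the small limit values, and the modeling of the "bottom" of the Free Page List as the left end of a `deque` are all illustrative choices, and the actual write to the backing store is represented only by the returned list.

```python
from collections import deque


def add_modified_page(modified, free, page, hilimit, lolimit):
    """Queue a modified page; flush down to the low limit once the high limit is reached.

    Returns the pages "written" to backing store during this call.
    """
    modified.append(page)
    written = []
    if len(modified) >= hilimit:
        while len(modified) > lolimit:
            oldest = modified.popleft()  # FIFO: the oldest modified page goes first
            written.append(oldest)       # stands in for the write to the page file
            free.append(oldest)          # now has valid backing store, so it is "free"
    return written


modified, free = deque(), deque()
flushed = []
for page in range(1, 6):  # hypothetical pages 1..5
    flushed += add_modified_page(modified, free, page, hilimit=5, lolimit=2)

reused = free.popleft()  # a new use takes the page that has been on the list longest
print(flushed, list(modified), list(free), reused)
```

Note that the flushed pages keep their contents on the Free Page List until they are reused, which is why a later fault on one of them can still be satisfied without disk access.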
When the user's program page faults, the needed page is located and steps are taken to place the page into the working set. These actions differ depending on where the needed page is located. If the page is in Secondary Cache, the system pointer table entries are modified to remove the page from the Modified Page List or Free Page List and include it in the working set. If the page is in another working set (shareable pages), the pointer table entries are updated to indicate that the page is in more than one working set. If the needed page has not previously been referenced, is zero, or has undefined initial contents, the page at the bottom of the Free Page List is zeroed and added to the working set. This is called a demand zero fault. The above three types of page faults are soft faults.
If the needed page is in a system page file or a mapped section file (writable pages which have been removed from Secondary Cache as described above), the page is read into the working set. If the needed page is in an EXE disk file, the page is also read into the working set. The above two faults are hard faults.
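The five cases above reduce to a lookup from the location of the needed page to the kind of fault. The sketch below restates that mapping; the names `PageLocation` and `classify_fault` are hypothetical and the enumeration simply mirrors the locations listed in the text.

```python
from enum import Enum


class PageLocation(Enum):
    FREE_PAGE_LIST = "Free Page List"
    MODIFIED_PAGE_LIST = "Modified Page List"
    OTHER_WORKING_SET = "another working set (shareable page)"
    DEMAND_ZERO = "not previously referenced (demand zero)"
    PAGE_FILE = "system page file"
    MAPPED_SECTION_FILE = "mapped section file"
    EXE_FILE = "EXE disk file"


# Locations that can be resolved by updating pointer tables or zeroing a page,
# with no disk I/O.
SOFT_FAULT_LOCATIONS = {
    PageLocation.FREE_PAGE_LIST,
    PageLocation.MODIFIED_PAGE_LIST,
    PageLocation.OTHER_WORKING_SET,
    PageLocation.DEMAND_ZERO,
}


def classify_fault(location):
    """Return "soft" if the fault needs no disk read, otherwise "hard"."""
    return "soft" if location in SOFT_FAULT_LOCATIONS else "hard"


summary = {loc.name: classify_fault(loc) for loc in PageLocation}
print(summary)
```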
When a page I/O operation is performed, more than one page is read or written at a time if possible. This technique is known as clustering. Up to MPW_WRTCLUSTER pages can be written in a single I/O operation if contiguous space is available in the page file. Pages are subordered by virtual address contiguity. Up to PFCDEFAULT pages may be read at one time, but no more than are virtually contiguous and physically contiguous on the disk. When reads are done from the page file, clusters of more than a few pages are seldom achieved.
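The effect of contiguity on read clustering can be made concrete by counting I/O operations. The following is a minimal sketch: the function name `count_read_ios`, the block numbers, and the cluster limit of 4 are made up for illustration, with `cluster_limit` playing the role of PFCDEFAULT.

```python
def count_read_ios(pages, cluster_limit):
    """Count read operations needed for pages given as (virtual_page, disk_block).

    A page joins the current cluster only if it is both virtually and
    physically contiguous with the previous page and the cluster is not full.
    """
    ios = 0
    run = 0
    prev = None
    for vpage, block in pages:
        contiguous = (prev is not None
                      and vpage == prev[0] + 1
                      and block == prev[1] + 1)
        if contiguous and run < cluster_limit:
            run += 1   # extend the current cluster
        else:
            ios += 1   # start a new I/O operation
            run = 1
        prev = (vpage, block)
    return ios


# Eight virtually contiguous pages stored in consecutive disk blocks.
contiguous_pages = [(v, 100 + v) for v in range(8)]
# The same eight pages scattered across a fragmented page file.
scattered_blocks = [100, 250, 251, 400, 90, 91, 92, 700]
fragmented_pages = list(enumerate(scattered_blocks))

ios_contiguous = count_read_ios(contiguous_pages, cluster_limit=4)
ios_fragmented = count_read_ios(fragmented_pages, cluster_limit=4)
print(ios_contiguous, ios_fragmented)
```

In this example the contiguous layout needs two reads for eight pages, whereas the fragmented layout needs five, which is the kind of penalty the text attributes to reads from the page file.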
The use of memory by VMS will grow as demanded--AWSA expands working sets as needed. System pointer tables or "memory maps" which permit a process to move through the memory expand as needed. New processes are allocated memory for working sets when needed. The Modified Page List expands to its upper limit.
However, physical memory is finite. If too much memory is used, SWAPPER is invoked to free memory. Any time the number of pages in the Free Page List drops below the number determined by the system parameter FREELIM, SWAPPER will activate and initiate various actions to free memory. Specifically, it will expand the Free Page List to the size FREEGOAL. First, SWAPPER will trim the Modified Page List. Next, SWAPPER will reduce any working set which has been expanded beyond WSQUOTA back to WSQUOTA. WSQUOTA is the smallest guaranteed working set size a process may have. SWAPPER will then swap or trim inactive processes. Finally, SWAPPER will swap or trim active processes. Swapping and trimming will be explained below. Most VMS systems have many inactive processes at any given time and large amounts of memory can be freed up by effective swapping. However, the swapping or trimming of active processes is never desirable.
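The ordered reclamation sequence can be sketched as a loop over memory sources that stops once the Free Page List reaches FREEGOAL. The function name `swapper_reclaim`, the page counts, and the idea that each step yields a known number of reclaimable pages are simplifying assumptions; the ordering of the steps follows the description.

```python
def swapper_reclaim(free_pages, freelim, freegoal, sources):
    """Reclaim pages in priority order until the free list reaches FREEGOAL.

    sources is an ordered list of (step_name, reclaimable_pages).
    Returns the new free page count and the list of (step_name, pages_taken).
    """
    actions = []
    if free_pages >= freelim:
        return free_pages, actions  # free list is healthy: SWAPPER does nothing
    for name, available in sources:
        if free_pages >= freegoal:
            break
        taken = min(available, freegoal - free_pages)
        if taken > 0:
            free_pages += taken
            actions.append((name, taken))
    return free_pages, actions


# Hypothetical amounts reclaimable at each step, in the order given in the text.
sources = [
    ("trim Modified Page List", 80),
    ("reduce working sets to WSQUOTA", 120),
    ("swap or trim inactive processes", 500),
    ("swap or trim active processes", 1000),
]
final_free, actions = swapper_reclaim(50, freelim=100, freegoal=300, sources=sources)
print(final_free, actions)
```

In this example the goal is met partway through the third step, so active processes are left untouched, which is the outcome the text describes as desirable.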
Swapping is the action whereby the entire process working set is written, as a whole, to disk. It is accomplished by several large I/O operations directly from the working set to the swap file. A swapped process is not returned to memory until it is reactivated, even if memory becomes abundantly available. The overhead involved in swapping is relatively small, so long as the process remains in the swap file for several minutes or more. Pages freed by swapping are placed at the end of the Free Page List.
Trimming reduces the size of the working set to some predetermined size (SWPOUTPGCNT). Trimming inactive processes generates severe performance degradation for the system. The trimmed pages go to the top of the Secondary Cache, which tends to push pages from the active working sets out of the cache. Therefore, hard faulting among active processes is significantly increased. As the trimmed process continues to be inactive, most of its pages cycle out of the Secondary Cache. The extra volume of pages in the system page file tends to increase its fragmentation, potentially decreasing SWAPPER's ability to create large page clusters when writing pages. When the trimmed process is reactivated, since up to 90% or more of its working set will have been removed, it will fault heavily. Many of these faults will be hard faults. Because clustering from the system page file will be poor, many more I/O operations will be required to restore the process working set than if the process had been transferred to a swap file. Although an inswap from the swap file is a hard fault, it is a less resource intensive operation since it generally requires few I/O operations.
If the number of pages in the Free Page List exceeds a predetermined value (BORROWLIM) (suggesting very low demand for memory), AWSA expands working sets to WSEXTENT, the largest working set size allowed per process. In DEC terminology, this is known as "borrowing". Two safety mechanisms exist to prevent borrowing from overusing memory. First, the actual expansion of the working set, as opposed to the simple changing of WSLIMIT, to any amount larger than WSQUOTA is inhibited if there are fewer than a predetermined number (GROWLIM) of pages in the Free Page List. Second, SWAPPER reduces working sets larger than WSQUOTA back to WSQUOTA any time the Free Page List size drops below FREELIM.
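The borrowing rule and its first safety mechanism can be expressed as two small predicates. This is only one reading of the description: the function names `wslimit_ceiling` and `may_add_page` and all numeric values are hypothetical, and the exact boundary conditions are assumptions.

```python
def wslimit_ceiling(free_pages, borrowlim, wsquota, wsextent):
    """Upper bound AWSA may raise WSLIMIT to, given the Free Page List size."""
    # Borrowing: above BORROWLIM free pages, the limit may go up to WSEXTENT.
    return wsextent if free_pages > borrowlim else wsquota


def may_add_page(current_size, wslimit, free_pages, growlim, wsquota):
    """Whether the working set may actually grow by one page right now."""
    if current_size >= wslimit:
        return False  # already at the limit set by AWSA
    if current_size >= wsquota and free_pages < growlim:
        return False  # growth beyond WSQUOTA is inhibited below GROWLIM
    return True


# Hypothetical parameter values, in pages.
ceiling_plenty = wslimit_ceiling(5000, borrowlim=3000, wsquota=512, wsextent=2048)
ceiling_tight = wslimit_ceiling(1000, borrowlim=3000, wsquota=512, wsextent=2048)
grow_ok = may_add_page(600, wslimit=2048, free_pages=800, growlim=500, wsquota=512)
grow_blocked = may_add_page(600, wslimit=2048, free_pages=300, growlim=500, wsquota=512)
print(ceiling_plenty, ceiling_tight, grow_ok, grow_blocked)
```

The separation into two functions reflects the distinction the text draws between raising WSLIMIT, which is only a permission, and the actual expansion of the working set, which is checked against the Free Page List at the time a page is added.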
Normally, VMS handles page movements very well. The drawback of VMS stems from the fact that the system manager can only tune the metric values of these SYSGEN parameters to meet a static system load. Problems arise in very busy interactive systems when the demand for memory overwhelms the VMS AWSA function. The symptoms of this problem include slow interactive response time, slow log-ins, slow image activations and, often, an unacceptable cap on the number of interactive users. When these conditions occur, the system manager has few options. He can retune the system to meet these demand periods or he can inhibit certain users' access to the system. In most government and commercial enterprises, neither of these alternatives is viable.
FIG. 3 is a schematic diagram which illustrates these effects. Region 12 represents the narrow range of system load for which the system manager has statically tuned the system for efficient operation. Load as used herein refers to the collective of interactive and batch users, anticipated CPU and I/O usage, memory demand, and utilization of external devices such as printers and plotters. However, as curve 14 suggests, system load conditions do not always fall within the range of system loads for which the system has been statically tuned. At point A, for example, the system load is heavier than the statically tuned load and the system suffers from slow interactive response times, slow log-ins, etc. At point B, the system load is within the range of tuned loads and the system operates efficiently. However, at point C the system load is less than the statically tuned load and the system capabilities are underutilized. The present invention is a method of load sensitive reactive tuning which seeks to configure region 12 to the varying load represented by curve 14 by periodic readjustment of certain tuning parameters.