This invention relates to a method and system for optimizing a computer program image and, more particularly, to a method and system for rearranging code portions of the program image to reduce the working set.
Many conventional computer systems utilize virtual memory. Virtual memory provides a logical address space that is typically larger than the corresponding physical address space of the computer system. One of the primary benefits of using virtual memory is that it facilitates the execution of a program without the need for all of the program to be resident in main memory during execution. Rather, certain portions of the program may reside in secondary memory for part of the execution of the program. A common technique for implementing virtual memory is paging; a less popular technique is segmentation. Because most conventional computer systems utilize paging instead of segmentation, the following discussion refers to a paging system, but these techniques can be applied to segmentation systems or systems employing paging and segmentation as well.
When paging is used, the logical address space is divided into a number of fixed-size blocks, known as pages. The physical address space is divided into like-sized blocks, known as page frames. A paging mechanism maps the pages from the logical address space, for example, secondary memory, into the page frames of the physical address space, for example, main memory. When the computer system attempts to reference an address on a page that is not present in main memory, a page fault occurs. After a page fault occurs, the operating system copies the page into main memory from secondary memory and then restarts the instruction that caused the fault.
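The page-fault behavior described above can be illustrated with a minimal sketch. The function name `count_page_faults`, the 4096-byte page size, and the assumption that main memory is large enough that no page is ever evicted are all illustrative choices, not details taken from this description.

```python
def count_page_faults(addresses, page_size=4096):
    """Count page faults for a sequence of referenced addresses.

    Assumption: main memory has unlimited page frames, so a page faults
    only the first time it is referenced (no eviction is modeled).
    """
    resident = set()  # pages currently in main memory
    faults = 0
    for addr in addresses:
        page = addr // page_size  # page that contains this address
        if page not in resident:
            faults += 1           # page must be copied in from secondary memory
            resident.add(page)
    return faults


# Hypothetical reference string touching pages 0, 0, 1, 1, 2, 0.
example_addresses = [0, 100, 4096, 5000, 8192, 50]
example_faults = count_page_faults(example_addresses)
```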
One paging model that is commonly used to evaluate the performance of paging is the working set model. At any instant in time, t, there exists a working set, w(k, t), consisting of all the pages used by the k most recent memory references. The operating system monitors the working set of each process and allocates each process enough page frames to contain the process's working set. If the working set is larger than the number of allocated page frames, the system will be prone to thrashing. Thrashing refers to very high paging activity in which pages are regularly being swapped from secondary memory into the page frames allocated to a process. This behavior has a very high time and computational overhead. It is therefore desirable to reduce the size of (i.e., the number of pages in) a program's working set to lessen the likelihood of thrashing and significantly improve system performance.
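As an illustration of the definition of w(k, t), the following sketch computes the working set from a recorded list of referenced addresses. The function name `working_set`, the list-based trace, and the 4096-byte page size are assumptions made for the example only.

```python
def working_set(references, k, t, page_size=4096):
    """Return w(k, t): the pages used by the k most recent references.

    `references` is a list of addresses in the order they were referenced,
    and `t` is the index of the most recent reference (inclusive).
    """
    start = max(0, t - k + 1)  # clamp the window at the start of the trace
    return {addr // page_size for addr in references[start:t + 1]}


# Hypothetical trace: the last three references touch only pages 0 and 1.
example_refs = [0, 4100, 8200, 100, 4200, 50]
recent = working_set(example_refs, k=3, t=5)
whole = working_set(example_refs, k=6, t=5)
```

With this trace, a window of three references yields a two-page working set, while a window of six references yields three pages.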
A programmer typically writes source code without any concern for how the code will be divided into pages when it is executed. Similarly, a compiler program translates the source code into relocatable machine instructions and stores the instructions as object code in the order in which the compiler encounters the instructions in the source code. The object code therefore reflects the lack of concern for the placement order by the programmer. A linker program then merges related object code together to produce executable code. Again, the linker program has no knowledge of or concern for the working set of the resultant executable code. The linker program merely orders the instructions within the executable code in the order in which the instructions are encountered in the object code. The compiler program and linker program do not have the information required to place code within an executable module so as to reduce the working set. The information required can in general only be obtained by actually executing the executable module and observing its usage. Clearly this cannot be done before the executable module has been created. The executable module initially created by the compiler and linker thus is laid out without regard to any usage pattern.
As each portion of code is executed, the page in which it resides must be in physical memory. Other code portions residing on the same page will also be in memory, even if they may not be executed in temporal proximity. The result is a collection of pages in memory with some required code portions and some unrequired code portions. To the extent that unrequired code portions are loaded into memory, valuable memory space may be wasted, and the total number of pages loaded into memory may be much larger than necessary.
To make a determination as to which code portions are "required" and which code portions are "unrequired," a developer needs execution information for each code portion, such as when the code portion is accessed during execution of the computer program. A common method for gathering such execution information includes adding instrumentation code to every basic block of a program image. A basic block is a portion of code such that if one instruction of the basic block is executed then every instruction is also executed. The execution of the computer program is divided into a series of time intervals (e.g., 500 milliseconds). Each time a basic block is executed during execution of the computer program, the instrumentation code causes a flag to be set for that basic block for the current time interval. Thus, after execution of the computer program, each basic block will have a temporal usage vector ("usage vector") associated with it. The usage vector for a basic block has, for each time interval, a bit that indicates whether that basic block was executed during that time interval. The usage vectors therefore reflect the temporal usage pattern of the basic blocks.
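The construction of usage vectors can be sketched as follows. The trace format (pairs of a millisecond timestamp and a basic-block identifier), the function name `usage_vectors`, and the block names are hypothetical; real instrumentation code would set the flags in place rather than log a trace.

```python
from collections import defaultdict


def usage_vectors(trace, interval_ms, num_intervals):
    """Build one usage vector per basic block from an execution trace.

    `trace` is an iterable of (timestamp_ms, block_id) pairs. Each vector
    holds one bit per time interval; a bit is 1 if the block executed at
    least once during that interval.
    """
    vectors = defaultdict(lambda: [0] * num_intervals)
    for timestamp_ms, block_id in trace:
        index = int(timestamp_ms // interval_ms)  # interval containing this event
        if 0 <= index < num_intervals:
            vectors[block_id][index] = 1          # set the flag for this interval
    return dict(vectors)


# Hypothetical trace using the 500 millisecond interval mentioned above.
example_trace = [(0, "A"), (120, "B"), (600, "A"), (1400, "C"), (1450, "A")]
example_vectors = usage_vectors(example_trace, interval_ms=500, num_intervals=3)
```

In this example, block A executes in all three intervals, block B only in the first, and block C only in the third.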
After the temporal usage patterns have been measured, a paging optimizer can rearrange the basic blocks to minimize the working set. In particular, basic blocks with similar temporal usage patterns can be stored on the same page. Thus, when a page is loaded into main memory, it contains basic blocks that are likely to be required.
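The effect of placing basic blocks with similar usage patterns on the same page can be shown with a small sketch. Sorting blocks by usage vector is a deliberately simple stand-in heuristic chosen for illustration; it is not presented as the paging optimizer's actual algorithm. The cost function (pages touched, summed over time intervals), the block sizes, and all names are assumptions.

```python
def pages_of(order, sizes, page_size):
    """Lay blocks out contiguously in `order`; map each block to its pages."""
    pages, offset = {}, 0
    for block in order:
        first = offset // page_size
        last = (offset + sizes[block] - 1) // page_size
        pages[block] = set(range(first, last + 1))
        offset += sizes[block]
    return pages


def working_set_cost(order, sizes, vectors, page_size):
    """Sum over time intervals of the number of distinct pages touched."""
    pages = pages_of(order, sizes, page_size)
    num_intervals = len(next(iter(vectors.values())))
    total = 0
    for i in range(num_intervals):
        touched = set()
        for block in order:
            if vectors[block][i]:
                touched |= pages[block]
        total += len(touched)
    return total


def group_by_usage(blocks, vectors):
    """Stand-in heuristic: blocks with identical usage vectors become adjacent."""
    return sorted(blocks, key=lambda b: tuple(vectors[b]))


# Hypothetical program: four 2048-byte blocks, 4096-byte pages, two intervals.
sizes = {"A": 2048, "B": 2048, "C": 2048, "D": 2048}
vectors = {"A": [1, 0], "B": [0, 1], "C": [1, 0], "D": [0, 1]}
original = ["A", "B", "C", "D"]
grouped = group_by_usage(original, vectors)
cost_before = working_set_cost(original, sizes, vectors, 4096)
cost_after = working_set_cost(grouped, sizes, vectors, 4096)
```

Here the original order touches two pages in every interval, whereas the grouped order touches only one, halving the cost from 4 to 2.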
The minimization of the working set is an NP-complete problem, that is, no polynomial-time algorithm is known for solving the problem. Thus, the time needed to minimize the working set of a program image generally increases exponentially as the number of code portions increases (i.e., O(e^n), where n is the number of code portions). Because complex program images can have thousands, and even hundreds of thousands, of code portions, such an algorithm cannot generate a minimum working set in a timely manner even when the most powerful computers are employed. Because the use of such algorithms is impractical for all but the smallest program images, various algorithms are needed to generate a layout that results in an improved working set (albeit not necessarily the minimal working set) in a timely manner.
The present invention provides a method and system for improving the working set of a program image. The working set (WS) improvement system of the present invention employs a two-phase technique for improving the working set. In the first phase, the WS improvement system inputs the program image and outputs a program image with the locality of its references improved. In the second phase, the WS improvement system inputs the program image with its locality of references improved and outputs a program image with the placement of its basic blocks in relation to page boundaries improved so that the working set is reduced.
The present invention provides a technique for evaluating the locality of references for a layout of a computer program. The technique calculates a metric value indicating a working set size of the layout when the layout is positioned to start at various different memory locations within a page. This technique then combines the calculated metric values as an indication of the locality of references of the layout of the computer program. By combining the calculated metric values, the effect of page boundaries on the working set size is averaged and the combined metric value represents the effects of the locality of references or the working set size.
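A minimal sketch of this evaluation follows. It assumes that the metric is the number of pages touched summed over time intervals, that the start positions are evenly spaced within one page, and that the values are combined by taking their arithmetic mean; the text above says only that the values are combined, so each of these choices, and every name used, is an assumption.

```python
def cost_at_offset(order, sizes, vectors, page_size, start):
    """Pages touched, summed over intervals, with the layout starting at `start`."""
    spans, offset = {}, start
    for block in order:
        first = offset // page_size
        last = (offset + sizes[block] - 1) // page_size
        spans[block] = range(first, last + 1)
        offset += sizes[block]
    num_intervals = len(next(iter(vectors.values())))
    total = 0
    for i in range(num_intervals):
        touched = set()
        for block in order:
            if vectors[block][i]:
                touched.update(spans[block])
        total += len(touched)
    return total


def locality_metric(order, sizes, vectors, page_size, step):
    """Average the cost over start positions 0, step, 2*step, ... within a page."""
    offsets = range(0, page_size, step)
    costs = [cost_at_offset(order, sizes, vectors, page_size, s) for s in offsets]
    return sum(costs) / len(costs)


# Hypothetical layouts of four 2048-byte blocks on 4096-byte pages.
sizes = {"A": 2048, "B": 2048, "C": 2048, "D": 2048}
vectors = {"A": [1, 0], "B": [0, 1], "C": [1, 0], "D": [0, 1]}
metric_grouped = locality_metric(["A", "C", "B", "D"], sizes, vectors, 4096, 1024)
metric_interleaved = locality_metric(["A", "B", "C", "D"], sizes, vectors, 4096, 1024)
```

Because the average is taken over several start positions, a layout is not rewarded merely for happening to align well with page boundaries at one particular position; in this example the grouped layout scores 3.5 and the interleaved layout 4.5.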
The present invention provides a technique for estimating the rate of improvement in the working set for a plurality of incrementally improved layouts of a computer program. The technique estimates the change in working set size from one incrementally improved layout to the next incrementally improved layout and estimates the time needed to incrementally improve the layout. The technique then combines the estimated change in working set size with the estimated time needed to incrementally improve the working set for that layout to estimate the rate of improvement. By separately estimating the change in working set size and the time needed to incrementally improve the working set, different estimation techniques that are appropriate to the data being estimated can be used.
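The combination of the two estimates can be sketched as a simple ratio. In this sketch both quantities are smoothed with an exponential moving average, which is a stand-in estimator chosen for brevity; any estimation technique suited to each quantity could be substituted. The function names, the smoothing factor `alpha`, and the sample numbers are all hypothetical.

```python
def ema(values, alpha):
    """Exponential moving average; the last value carries the most weight."""
    estimate = values[0]
    for value in values[1:]:
        estimate = alpha * value + (1 - alpha) * estimate
    return estimate


def improvement_rate(ws_sizes, step_times, alpha=0.5):
    """Estimate the change in working set size per unit of time.

    `ws_sizes` holds the working set size of each successive layout and
    `step_times` the time spent producing each incremental improvement.
    The two quantities are estimated separately and then divided.
    """
    deltas = [after - before for before, after in zip(ws_sizes, ws_sizes[1:])]
    estimated_delta = ema(deltas, alpha)     # estimated change per step
    estimated_time = ema(step_times, alpha)  # estimated time per step
    return estimated_delta / estimated_time


# Hypothetical run: sizes in pages, times in seconds.
example_rate = improvement_rate([100, 90, 84, 80], [2.0, 2.0, 2.0])
```

A negative rate means the working set is shrinking; a rate that approaches zero suggests that further incremental improvement is yielding little benefit for the time spent.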
The present invention provides a technique for identifying coefficients for a filter for filtering results of a function. The technique collects sample input values to the filter and identifies desired output values from the filter for the collected sample input values. The technique then generates a power spectrum of the collected sample input values and a power spectrum of the identified desired output values. The technique then calculates the difference between the generated power spectra. Finally, the technique identifies coefficients that yield a filter transfer function that closely approximates the calculated differences. The present invention also provides a technique for identifying coefficients for a finite impulse response filter. The technique collects sample input values for a function and identifies desired output values for the filter for the collected sample input values. The technique then approximates the output values from the input values using a linear fitting technique. Finally, the technique sets the coefficients to values obtained from the linear-fitting technique. When the input and output values represent the rate of change in working set size resulting from sample runs of the WS improvement system, then the filter can be used to estimate the rate of change dynamically as the improvement process proceeds.
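The linear-fitting approach for a finite impulse response filter can be sketched with ordinary least squares. The sketch assumes the filter computes each output as a weighted sum of the current and previous inputs, and solves the normal equations with a small Gauss-Jordan elimination; the function names and the sample data are hypothetical, and the power-spectrum technique is not shown.

```python
def solve(a, b):
    """Solve the square linear system a*x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]         # partial pivoting
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_fir(inputs, desired, taps):
    """Least-squares coefficients h with desired[n] ~ sum_j h[j] * inputs[n - j]."""
    rows = [[inputs[n - j] for j in range(taps)]
            for n in range(taps - 1, len(inputs))]
    targets = desired[taps - 1:]
    # Normal equations: (A^T A) h = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(taps)]
           for i in range(taps)]
    aty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(taps)]
    return solve(ata, aty)


def apply_fir(h, inputs):
    """Filter `inputs` with coefficients `h` (outputs start once the taps are full)."""
    return [sum(h[j] * inputs[n - j] for j in range(len(h)))
            for n in range(len(h) - 1, len(inputs))]


# Hypothetical samples: the desired output is a two-point moving average.
sample_in = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
sample_out = [0.0] + [0.5 * sample_in[n] + 0.5 * sample_in[n - 1]
                      for n in range(1, len(sample_in))]
coefficients = fit_fir(sample_in, sample_out, taps=2)
```

Because the sample outputs were generated by an exact two-tap average, the fit recovers coefficients close to 0.5 and 0.5; with measured data the fit would instead minimize the squared error between the filtered inputs and the desired outputs.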