This invention relates to a method and system for optimizing a computer program image and, more particularly, to a method and system for rearranging code portions of the program image to reduce the working set.
Many conventional computer systems utilize virtual memory. Virtual memory provides a logical address space that is typically larger than the corresponding physical address space of the computer system. One of the primary benefits of using virtual memory is that it facilitates the execution of a program without the need for all of the program to be resident in main memory during execution. Rather, certain portions of the program may reside in secondary memory for part of the execution of the program. A common technique for implementing virtual memory is paging; a less popular technique is segmentation. Because most conventional computer systems utilize paging instead of segmentation, the following discussion refers to a paging system, but these techniques can be applied to segmentation systems or systems employing paging and segmentation as well.
When paging is used, the logical address space is divided into a number of fixed-size blocks, known as pages. The physical address space is divided into like-sized blocks, known as page frames. A paging mechanism maps the pages from the logical address space, for example, secondary memory, into the page frames of the physical address space, for example, main memory. When the computer system attempts to reference an address on a page that is not present in main memory, a page fault occurs. After a page fault occurs, the operating system copies the page into main memory from secondary memory and then restarts the instruction that caused the fault.
One paging model that is commonly used to evaluate the performance of paging is the working set model. At any instance in time, t, there exists a working set, w(k, t), consisting of all the pages used by the k most recent memory references. The operating system monitors the working set of each process and allocates each process enough page frames to contain the process"" working set. If the working set is larger than the number of allocated page frames, the system will be prone to thrashing. Thrashing refers to very high paging activity in which pages are regularly being swapped from secondary memory into the pages frames allocated to a process. This behavior has a very high time and computational overhead. It is therefore desirable to reduce the size of (i.e., the number of pages in) a program""s working set to lessen the likelihood of thrashing and significantly improve system performance.
A programmer typically writes source code without any concern for how the code will be divided into pages when it is executed. Similarly, a compiler program translates the source code into relocatable machine instructions and stores the instructions as object code in the order in which the compiler encounters the instructions in the source code. The object code therefore reflects the lack of concern for the placement order by the programmer. A linker program then merges related object code together to produce executable code. Again, the linker program has no knowledge or concern for the working set of the resultant executable code. The linker program merely orders the instructions within the executable code in the order in which the instructions are encountered in the object code. The computer program and linker program do not have the information required to make a placement of code within an executable module to reduce the working set. The information required can in general only be obtained by actually executing the executable module and observing its usage. Clearly this cannot be done before the executable module has been created. The executable module initially created by the compiler and linker thus is laid out without regard to any usage pattern.
As each portion of code is executed, the page in which it resides must be in physical memory. Other code portions residing on the same page will also be in memory, even if they may not be executed in temporal proximity. The result is a collection of pages in memory with some required code portions and some unrequired code portions. To the extent that unrequired code portions are loaded into memory, valuable memory space may be wasted, and the total number of pages loaded into memory may be much larger than necessary.
To make a determination as to which code portions are xe2x80x9crequiredxe2x80x9d and which code portions are xe2x80x9cunrequired,xe2x80x9d a developer needs execution information for each code portion, such as when the code portion is accessed during execution of the computer program. A common method for gathering such execution information includes adding instrumentation code to every basic block of a program image. A basic block is a portion of code such that if one instruction of the basic block is executed then every instruction is also executed. The execution of the computer program is divided into a series of time intervals (e.g., 500 milliseconds). Each time a basic block is executed during execution of the computer program, the instrumentation code causes a flag to be set for that basic block for the current time interval. Thus, after execution of the computer program, each basic block will have a temporal usage vector (xe2x80x9cusage vectorxe2x80x9d) associated with it. The usage vector for a basic block has, for each time interval, a bit that indicates whether that basic block was executed during that time interval. The usage vectors therefore reflect the temporal usage pattern of the basic blocks.
After the temporal usage patterns have been measured, a paging optimizer can rearrange the basic blocks to minimize the working set. In particular, basic blocks with similar temporal usage patterns can be stored on the same page. Thus, when a page is loaded into main memory, it contains basic blocks that are likely to be required.
The minimization of the working set is an NP-complete problem, that is, no polynomial-time algorithm is known for solving the problem. Thus, the time needed to minimize the working set of a program image generally increases exponentially as the number of code portions increase (i.e., O(en), where n is the number of code portions). Because complex program images can have thousands, and even hundreds of thousands, of code portions, such an algorithm cannot generate a minimum working set in a timely manner even when the most powerful computers are employed. Because the use of such algorithms are impractical for all but the smallest program images, various algorithms are needed to generate a layout that results in an improved working set (albeit not necessarily the minimal working set) in a timely manner.
The present invention provides a method and system for incrementally improving the layout of a program image of a computer program to reduce the working set. The system iteratively selects pairs of basic blocks and reorders the basic blocks in the range delimited by the selected pair of basic blocks. The system selects the pairs of basic blocks so that the working set of the computer program is improved by reordering the basic block in the range. Thus, during each iteration, the working set is improved. The system continues with these iterations until a termination condition (e.g., number of iterations) is satisfied.
In one embodiment, during each iteration the system designates one of the basic blocks as an initial anchor basic block. The system then repeats the following until the same range of basic blocks is identified twice in a row. The system first finds a basic block such that when the basic blocks in the range from the anchor basic block to the found basic block are reordered, the working set is more favorable than the reordering of any other range that ends with the anchor basic block. The system then designates the found basic block as the new anchor basic block. When the same range is found twice in a row, the system reorders the basic blocks in the range. This process is repeated for each iteration until the termination condition is satisfied. The resulting reordered program image has its working set improved.