In computer systems, it is desired to improve the performance of large production applications such as database servers. These applications often have a huge number of features and include large numbers of instructions in single executable images of the applications. For example, an executable image for the Oracle 7.3 database application includes about nine MegaBytes of instructions.
In addition, many of these large applications are known to have a large instruction "footprint"--that is, a large number of instructions are used repeatedly during the execution of the application. The large instruction footprint causes these applications to incur a large number of instruction cache capacity misses, particularly in the relatively small first-level instruction cache typically co-located on the same semiconductor chip with the processor. Even worse, a calling procedure may overlap with a called procedure in a cache. In that case, there is direct conflict.
Because of the cache misses, a large percentage of an application's execution time can be consumed by processor stalls while missing instructions are fetched from more distant levels of memory. For example, when the Oracle application is executing the TPC benchmarks on Digital Equipment Corporation's AlphaServer 4100 multi-processor, instruction stalls can account for up to 30% of the execution time.
In the prior art, instruction stalls due to cache misses have been reduced by controlling where the instructions are placed in memory. By careful placement of instruction, caches can take advantage of spatial localities. Most systems have made a static determination for the layout of instructions at compile time or link time.
For instance, linkers have been used to construct a call graph of the application. The call graph indicates how the procedures are called. Procedures that call each other can be placed together in the memory. If procedures are placed close to each other, cache conflicts are minimized, even in a direct-mapped instruction cache. Furthermore, if the size of the instruction cache is known, then procedures can be placed in a non-conflicting manner, even when they are not immediately adjacent to each other, which may not always be possible.
However, statically determining an optimal placement for procedures is difficult. In many cases, the actual execution flow may depend on the run-time environment, such as the workload (data) that is processed. Usually, the workload cannot be determined statically. In addition, cache conflicts need only be avoided for procedures that call each other often. A procedure that is called only occasionally does not require optimal placement.
Many systems have used profile information to guide the layout of instructions at link time. Profile information is statistical data that are collected during execution of the application. The statistics can identify the relative frequency at which procedures are called.
However, profile-based placement has two problems. First, users have to profile the application, and the application has to be re-compiled or re-linked to take advantage of the optimal placement. Second, even with profiling, there is no guarantee that the statistics will reflect different uses of the application.
The second problem is especially interesting in the case of an application such as the Oracle database server. The Oracle program can be used for on-line transaction processing (OLTP), as well for decision support systems (DSS). These different uses might possibly have totally different execution profiles, consequently, an instruction layout for one use may be totally unsuitable for the other. Optimally satisfying both needs would require different versions of the application.
In one prior art simulated approach, instructions are dynamically placed in the memory by a dynamic loader. This approach is described by Chen et al. in "Improving Instruction Locality with Just-In-Time Code Layout", The 1997 USENIX Windows NT Workshop, August 1997. There, a dynamic layout approach for reducing instruction cache conflicts is evaluated. However, the approach is only a simulation without any actual implementations in a real system.
The simulation avoids a number of several hard problems that must be solved for practical systems. A particular problem that will be encountered during practical dynamic instruction placement is the use of indirect procedure calls, for example, the use of function pointers in the C programming language. Because the initialization and utility routines for C programs typically make use of function pointers in a modem operating system, the prior art approach would not work in current systems, even if the actual application does not produce or use procedure pointers.
There is also a problem with intercepting calls to procedures that have not yet been placed in memory. According to Chen, the entry points of the procedures would be replaced by small "thunks" of instructions. These instructions would effectively "jump" to the dynamic loader when the procedures are called. Patching the "thunks" requires an extra pass over the entire program at program startup to install the thunks, or the stored copy of the executable image has to be changed.
There would be a considerable run-time overhead should this simulation ever be implemented in a real system, and the behavior of this method in a complex large-scale database application is difficult to predict.
Therefore, there is a need for an efficient and practical method that can dynamically place instructions of an application program in a memory of a computer system while the program is executing so as to reduce conflicts of the instructions in caches.