Extensive research has been directed to improving available software development and performance monitoring techniques for generating efficiently organized program code. In writing code in any programming language, a programmer must be aware of software performance. One type of programming construct that may affect performance at execution time is the familiar conditional construct, such as an IF-THEN-ELSE statement of PL/I, an IF-THEN-ELSIF-ELSE statement of FORTRAN-78, a SWITCH-CASE statement of C, etc. The high impact of such constructs arises in numerous computers as described below.
In many of the data processing units available today, instructions are executed in a pipelined manner, with consecutive instructions in the instruction stream being in different stages of execution at any given time. As a result, the processor must determine, at each instant, which instruction will be executed next, prefetch that instruction and input it to the pipeline. The pipelined execution of instructions proceeds in an orderly manner until a branch instruction is encountered. If it is an unconditional branch instruction, the processor will be able to compute the target address, initiate prefetching of instructions from the target address and continue to fill the pipeline.
A problem occurs, however, for a conditional branch instruction which branches to a target address only if a specified condition is satisfied. In this case, the processor unit may not know at the time of fetching the branch instruction whether the branch will actually take place in the future, i.e., once the branch instruction is finally executed, and hence may not be able to prefetch the correct instructions each time.
In the more sophisticated and expensive systems, such as the IBM ESA/390 mainframe computer, the processor unit includes separate hardware to predict which way a branch should go for a conditional branch instruction. Because this function requires substantial hardware to implement, however, it is generally not provided in today's workstations, such as the IBM RISC System/6000, or in personal computers. In such systems, when a conditional branch instruction is encountered, the processor unit simply assumes that the branch will not be taken and continues to prefetch the instructions following the conditional branch instruction. If the assumption made by the processor is correct, then pipeline execution proceeds smoothly.
If, however, the processor's assumption is incorrect, then the previously prefetched instructions, starting from the one following the conditional branch instruction, will have to be discarded, with the processor state being restored if necessary, and the instructions from the branch target address will have to be fetched. This process breaks pipeline execution and results in performance degradation lasting several processor cycles. The problem is further exacerbated when the target address happens to lie in a different cache line and a cache miss occurs while fetching the instructions at the target address, leading to a time-consuming main store operation.
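By way of illustration only, the cost of the not-taken assumption described above may be sketched with a simple cycle estimate. The structure `run_profile`, the function `estimated_cycles`, and the model itself (one cycle per instruction while the pipeline stays full, plus a fixed penalty for each conditional branch that is actually taken) are hypothetical simplifications introduced here; they do not describe any particular processor and ignore cache effects.

```c
/* Hypothetical cost model for a processor that always assumes
 * "branch not taken". All names and numbers are illustrative. */
typedef struct {
    unsigned long instructions;   /* total instructions executed */
    unsigned long taken_branches; /* conditional branches that were taken */
    unsigned int  flush_penalty;  /* assumed cycles lost per wrong assumption */
} run_profile;

/* Each instruction is assumed to cost one cycle when the pipeline is
 * full; every taken conditional branch contradicts the not-taken
 * assumption and adds flush_penalty cycles for the discarded prefetch. */
unsigned long estimated_cycles(const run_profile *p)
{
    return p->instructions
         + p->taken_branches * (unsigned long)p->flush_penalty;
}
```

Under this simplified model, a run of 1000 instructions containing 100 taken conditional branches and an assumed penalty of 3 cycles would be estimated at 1300 cycles, whereas the same run with no taken branches would be estimated at 1000 cycles.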
One common practice for reducing cache misses under such a condition is for the programmer to insert calls to subroutines to handle the conditional cases less likely to occur, and to leave program code in the mainline for the conditional cases most likely to occur. This solution requires that the programmer be aware, at the time of writing the code, of the likelihood of satisfying each condition. Unfortunately, this information is rarely available in practice. Instead, traditional performance-test runs point out which conditional cases occur most commonly, and this information is then used by the programmer to reorganize the high-level language source code in an attempt to enhance performance of the assembled code.
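The manual practice described above may be sketched as follows, purely as an example. The function names `handle_negative` and `sum_magnitudes`, and the premise that negative inputs are the rare case, are assumptions made for illustration; how the branch is actually laid out in the assembled code depends on the compiler used.

```c
#include <stddef.h>

/* Rare case (assumed here to be negative inputs) moved out of the
 * mainline into a subroutine, so that the frequently executed path
 * stays compact. */
static long handle_negative(int value)
{
    return -(long)value; /* contribute the magnitude of the value */
}

/* Mainline: the case assumed to be common is handled inline; the case
 * assumed to be uncommon is reached only through a subroutine call. */
long sum_magnitudes(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        if (values[i] < 0) {
            total += handle_negative(values[i]); /* unlikely case */
        } else {
            total += values[i];                  /* likely case */
        }
    }
    return total;
}
```

Such a reorganization is beneficial only if the programmer's assumption about which case is rare holds for the actual workload, which is precisely the information that is seldom known when the code is written.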
In a more general aspect, most compilers/optimizers employed today are incapable of optimizing program code based on an unknown execution environment. The ultimate execution environment for a program consists of hardware (such as cache organization, storage geometry, configuration characteristics including vectors, multi- or uniprocessor (MP/UP), central/expanded storage, co-processors, channel characteristics, relative instruction execution speeds, etc.) and software/external factors (such as program workload, usage of a program, input to a program, location of a program and data, etc.). Typically, available compilers/optimizers either require that at least some of this information on the ultimate execution environment be known or specified ahead of time, i.e., before producing assembled code, or simply ignore the unknown environmental factors.
Often compilers do provide options which enable a programmer to manually select how a program should be optimized. However, such optimization options are static mechanisms which do not allow dynamic adjustment for changes in the execution environment, nor do they allow a programmer to be aware of changes in the execution environment. Traditional optimization concepts also include manually tuning programs to a given processor model, and obtaining machine-specific information by, for example, running benchmarks, maintaining processor tables and keying off a processor model, etc. In addition, optimizers, loaders and object modules have been used in association with compilers to optimize program compilation for improved execution performance of the assembled code for a specified processing system.
Commercial advantage is obtained in the highly competitive software development industry to the extent that high-level language code can be optimized for enhanced execution performance. Therefore, optimization techniques capable of optimizing organization of assembled code for conditional constructs and accounting for a program's unknown execution environment are believed significant.