The quality of code produced by compilers has been an issue ever since the first compiler was produced. One of the principal objectives of IBM's FORTRAN I compiler, the first commercially available compiler, was to produce object code for scientific computation whose quality was comparable to that produced by assembly language programmers.
Today, higher level languages are designed to be used in every field in which computers are applicable. Even the original FORTRAN language has been bolstered to make it applicable to a wide range of programming tasks. However, it is still important that the quality of code produced by the compiler be high, especially if the resultant code is to be used in a production environment. Code produced by a skilled assembly language programmer is still the yardstick against which compiler produced code is measured.
A large number of optimization techniques have been developed and refined since the 1950s to improve the quality of compiler generated code. Indeed, many of these optimizations were known in principle, and used in some fashion, by the team that produced the first FORTRAN compiler.
Optimizations that are frequently employed in optimizing compilers include common subexpression elimination, moving code from regions of high execution frequency to regions of low execution frequency (code motion), dead code elimination, reduction in strength (replacing a slow operation by an equivalent fast operation), and constant propagation. Descriptions of these optimizations can be found in:
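Two of the optimizations named above can be illustrated with a minimal sketch in Python; the function names and values here are purely illustrative, not taken from any of the cited works.

```python
def before(a, b, c):
    # "a * b" is computed twice: a common subexpression.
    x = a * b + c
    y = a * b - c
    return x, y

def after(a, b, c):
    # Common subexpression elimination: compute "a * b" once
    # and reuse the result.
    t = a * b
    x = t + c
    y = t - c
    return x, y

def strength_reduced(n):
    # Reduction in strength: a multiplication by a power of two
    # is replaced by an equivalent, typically faster, shift.
    return n << 3  # equivalent to n * 8
```

Both versions of the first function produce identical results; the transformed forms simply perform less work.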
J. T. Schwartz, On Programming--An Interim Report on the SETL Language. Installment II: The SETL Language and Examples of Its Use, Courant Institute of Mathematical Sciences, NYU, 1973, pp. 293-310.

E. Morel and C. Renvoise, Global Optimization by Suppression of Partial Redundancies, CACM, Vol. 22, No. 2, pp. 96-103, 1979.

A. Aho, J. Ullman, Principles of Compiler Design, Addison-Wesley, 1977.
Global common subexpression elimination and code motion are among the most important optimizations. Measurements have shown that these optimizations have a larger effect on code improvement than any of the other optimizations. Many articles in the literature discuss how to perform these optimizations; the first two of the above citations contain excellent accounts of how to determine where in a program copies of code should be inserted in order to allow the original code to become redundant and subject to elimination. These articles also describe how to determine where redundant code exists. The methods depend on the program's flow graph and on a knowledge of certain properties which can be determined by examining basic blocks one at a time. These properties are:
DEX (downward exposed expressions): the set of computations which, if executed at the end of a basic block, give the same result as when executed "in place", i.e. where they occur in the basic block.

UEX (upward exposed expressions): the set of computations which, if executed at the beginning of a basic block, give the same result as when executed "in place".

THRU (unaffected computations): the set of computations which give the same result whether computed at the beginning or at the end of the basic block.
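These three sets can be computed with a single forward scan and a single backward scan of a basic block. The sketch below assumes a simple three-address representation, (dest, op, src1, src2), with an "expression" modeled as the tuple (op, src1, src2); this representation and the function name are illustrative assumptions, not drawn from the cited references.

```python
def local_sets(block):
    """Compute UEX, DEX, and THRU for one basic block.

    block: list of three-address instructions (dest, op, src1, src2).
    """
    uex = set()
    defined = set()                     # variables assigned so far, top-down
    for dest, op, s1, s2 in block:
        # Upward exposed: no operand was redefined above this point,
        # so evaluating the expression at block entry gives the same value.
        if s1 not in defined and s2 not in defined:
            uex.add((op, s1, s2))
        defined.add(dest)

    dex = set()
    seen = set()                        # variables assigned below, bottom-up
    for dest, op, s1, s2 in reversed(block):
        # The instruction's own result counts as a later definition:
        # in "a = a + 1" the expression is not downward exposed.
        seen.add(dest)
        # Downward exposed: no operand is redefined below this point,
        # so evaluating the expression at block exit gives the same value.
        if s1 not in seen and s2 not in seen:
            dex.add((op, s1, s2))

    # THRU: neither operand is assigned anywhere in the block, so entry
    # and exit evaluation agree.  (For illustration, restricted here to
    # expressions that actually occur in the block.)
    killed = defined
    exprs = {(op, s1, s2) for _, op, s1, s2 in block}
    thru = {e for e in exprs if e[1] not in killed and e[2] not in killed}
    return uex, dex, thru
```

For the block t1 = a*b; a = a+c; t2 = a*b, both a*b and a+c are upward exposed, only the final a*b is downward exposed, and nothing is in THRU because a is assigned within the block.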
The above-mentioned references describe how to perform global common subexpression elimination and code motion on the premise that the above-mentioned sets are known for every basic block. In particular, these references describe how to compute the set of computations already available on entry to a basic block, and the set of computations to be inserted at the end of certain basic blocks to achieve the effect of code motion, based on the sets DEX, UEX, and THRU. These computations are well known to those skilled in the art.
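The availability computation can be sketched as a standard iterative data-flow problem over the flow graph, assuming DEX and THRU are known per block. The equations below are the usual ones for available expressions (AVIN is the intersection of AVOUT over predecessors; AVOUT is DEX plus whatever passes through unaffected); the function and parameter names are hypothetical, and this is a sketch rather than the exact formulation of the cited papers.

```python
def available(blocks, preds, dex, thru, universe):
    """Iterate AVIN/AVOUT to a fixed point over the flow graph.

    AVIN[b]  = intersection of AVOUT[p] over predecessors p of b
    AVOUT[b] = DEX[b] | (AVIN[b] & THRU[b])

    blocks:   list of block names
    preds:    dict mapping block name -> list of predecessor names
    universe: the set of all expressions under consideration
    """
    # Optimistic initialization: assume everything is available,
    # except at entry blocks, where nothing is.
    avin = {b: set() if not preds[b] else set(universe) for b in blocks}
    avout = {b: set(universe) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if not preds[b]:
                new_in = set()              # nothing available on entry
            else:
                new_in = set(universe)
                for p in preds[b]:
                    new_in &= avout[p]
            new_out = dex[b] | (new_in & thru[b])
            if new_in != avin[b] or new_out != avout[b]:
                avin[b], avout[b] = new_in, new_out
                changed = True
    return avin, avout
```

For a two-block graph in which B1 computes a*b and B2 neither computes nor kills it, the iteration concludes that a*b is available on entry to B2, so a recomputation there would be redundant.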
Unless care is taken in computing UEX, DEX, and THRU, the commoning and code motion algorithms known in the prior art may only common and/or move the first of a sequence of related computations. For example, consider the code fragment in Table 1: