In line with the current trend in the software development industry, a single computer program is typically constructed as a multi-element structure of several modules, each handling one of the distinct functionalities the overall program product is intended to offer to the user. The modules are compiled independently and then linked together to form the executable. Although most compilers are able to perform optimizations on the generated code, the scope of the optimization is limited to a module or, more frequently, to a function. Generally, linkers that merge the compiled modules do not perform further optimizations on the code. Thus, no global optimizations are performed.
Merging everything into a single source file is usually not a feasible alternative, even though standard module-specific optimization techniques could then be applied to it, because program maintainability and code re-usability would be lost. The only acceptable solution, provided that an internal representation is not already available, is therefore to analyze the program and build a corresponding control flow graph (CFG), which is the basis of most, if not all, optimization techniques at post-link time.
Building a CFG of a binary program is not a trivial task because high-level control structures usually do not exist on the binary code level. However, a few techniques exist for performing the task with adequate precision. Cristina Cifuentes et al., see reference 1 (at end of specification), have worked on binary code analysis. Their goal was to retrieve high-level language statements from binary code, and the results were tested on Intel Pentium and Sun SPARC machines; an executable is not created from the CFG. Saumya K. Debray et al., see reference 2, have created a post-link time optimizer specifically for Alpha processors.
FIG. 1 discloses a partial CFG constructed from an executable program. It includes several levels of hierarchy, namely root level 102, section level 104, 112, function level 106, and basic block level 108. For example, whenever there is a section node in the COFF (Common Object File Format) file containing the executable (or in a binary executable of some other type) that is loaded, a corresponding section node is inserted in the CFG during the construction thereof. In every text section 104 a function node 106 exists for every non-label symbol in the COFF file, and every function comprises basic blocks 108. Basic blocks 108 comprise instructions 110 (optionally data as well) that may refer 118 to data elements 114 that are included in a separate data section 112. There are edges 116, 118 of various types in the CFG, which connect nodes. For example, call edges 116 represent calls of a function from another function and address edges 118 connect instructions or data nodes to functions, basic blocks or other instructions or data nodes. The presented CFG is an example only and several varying possibilities exist for creating flow graphs with slightly different hierarchy, building blocks or edges still utilizing substantially the same basic principles.
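The hierarchy of FIG. 1 can be illustrated with a minimal sketch in Python. This is one possible in-memory representation only, not a prescribed one: the class names `Node` and `Edge`, the helper functions `build_example` and `count_kind`, the section and function names, and the instruction mnemonics are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    # kind is one of: "root", "section", "function", "block",
    # "instruction", "data" (mirroring the levels of FIG. 1)
    kind: str
    name: str
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child one level down the hierarchy and return it."""
        self.children.append(child)
        return child


@dataclass
class Edge:
    # kind is "call" (function to function) or "address"
    # (instruction or data node to another node)
    kind: str
    source: Node
    target: Node


def build_example() -> Tuple[Node, List[Edge]]:
    """Build a tiny graph: one text section, one data section."""
    root = Node("root", "program")
    text = root.add(Node("section", ".text"))
    data = root.add(Node("section", ".data"))

    main = text.add(Node("function", "main"))
    helper = text.add(Node("function", "helper"))

    block = main.add(Node("block", "main.b0"))
    load = block.add(Node("instruction", "load r1, [table]"))
    block.add(Node("instruction", "call helper"))

    helper_block = helper.add(Node("block", "helper.b0"))
    helper_block.add(Node("instruction", "ret"))

    table = data.add(Node("data", "table"))

    edges = [
        Edge("call", main, helper),    # call edge between functions
        Edge("address", load, table),  # instruction refers to a data element
    ]
    return root, edges


def count_kind(node: Node, kind: str) -> int:
    """Count the nodes of a given kind in the subtree rooted at node."""
    return int(node.kind == kind) + sum(count_kind(c, kind) for c in node.children)
```

In this sketch the containment hierarchy is kept in the `children` lists, while call and address relations are stored separately as edges, so that transformations can rewrite edges without restructuring the tree.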
In general terms, compaction refers to the modification of binary executable programs in such a way that they retain their original behavior with a smaller memory footprint. If a CFG is available, the transformations may be applied to the graph, decreasing its size while maintaining its meaning, and a new executable is finally synthesized from the graph. Compaction is very useful in areas where memory footprint is essential, e.g. embedded systems. Procedural abstraction is a compaction technique where multiple instances of identical instruction sequences are replaced by calls to a function containing the same instruction sequence extracted from the original location. However, currently used procedural abstraction techniques either work only on whole basic blocks in the CFG or, even when they are able to abstract out parts of basic blocks, do not analyze the tails of instruction sequences for further optimization.
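The basic idea of procedural abstraction can be sketched as follows. The sketch is a simplification under stated assumptions and not any of the referenced algorithms: instructions are modeled as plain strings, the instruction count is used as a stand-in for code size, and the function name `abstract_sequence` and the mnemonics are hypothetical. A real implementation would additionally have to account for the return address, register usage and control transfers inside the sequence.

```python
from typing import List, Tuple


def abstract_sequence(code: List[str], seq: List[str],
                      name: str) -> Tuple[List[str], List[str]]:
    """Replace each non-overlapping occurrence of seq in code with a call.

    Returns the rewritten code and the body of the new function,
    which is the extracted sequence followed by a return instruction.
    """
    if not seq:
        raise ValueError("the sequence to abstract must not be empty")
    rewritten: List[str] = []
    i = 0
    while i < len(code):
        if code[i:i + len(seq)] == seq:
            rewritten.append(f"call {name}")  # one call replaces the sequence
            i += len(seq)
        else:
            rewritten.append(code[i])
            i += 1
    return rewritten, seq + ["ret"]


if __name__ == "__main__":
    seq = ["mov r1, r2", "add r1, 4", "st r1, [r3]"]
    code = seq + ["sub r4, 1"] + seq + ["jmp done"] + seq
    rewritten, function = abstract_sequence(code, seq, "f")
    print(len(code), "instructions before,",
          len(rewritten) + len(function), "after")
```

The transformation only pays off when the instructions saved exceed the overhead: each occurrence shrinks to a single call, but the extracted function adds the sequence once plus a return. With two occurrences of a three-instruction sequence, for example, the instruction count would not decrease at all under this model.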
Debray et al., reference 2, describe an algorithm to abstract out blocks to procedures, but the described technique abstracts only whole blocks. The time complexity of the algorithm is O(n^2) where n is the number of basic blocks.
Kim and Lee, see reference 3, describe a graph based algorithm to determine which parts of a basic block to abstract out to achieve maximum compaction, but the solution presented in the paper does not take the possibility of merging the tails of abstracted instruction sequences into account. The time complexity is O(n^3).