In the early days of modern computer programming, programs were typically so small that all the symbolic code for a single application could easily fit on a few screens, mostly because of the low-capacity memory chips used in the low-end computers of that time. Because memory chips were expensive, the memory available for free use by applications could amount to only a few kilobytes. In that context, programming could evidently be performed by a single person using a single source code file, since the maximum program size was very limited; nothing would have been gained by distributing the program functionality across several files or by dividing the programming work among several people during the design phase.
As the memory capacity of computers eventually began to rise steeply, programming techniques also had to be developed further in order to exploit the new possibilities. Programmers noticed that certain parts of a program, e.g. clearly separable instruction sequences such as algorithms with distinct input and output parameters, could be reused later in other applications. In addition, a program needed to be divided into a number of parts that experts in the respective fields could work on simultaneously, in order to speed up the overall development work.
The obvious solution is to divide the software under development into several modules, each of which groups components that share some common characteristic. The modules are then compiled separately and linked together after compilation to form the final executable program. Although most compilers are able to perform some optimizations on the generated code, even contemporary software development tools still offer the user quite limited code optimization functionality; the scope of the optimization is limited to a module or, more frequently, to a function inside the module. Generally, linkers that merge the compiled modules do not perform further optimizations on the code. Therefore, truly global optimizations that consider all the program parts together are not performed.
Since merging everything into a single source file is usually not an alternative in view of the above considerations, a sensible solution, provided that an internal representation is not already available, is to analyze the program and construct a control flow graph (CFG) from it.
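As an illustration of what constructing a control flow graph involves, the following is a minimal sketch and not the method of any cited work. It assumes a hypothetical toy instruction list in which each instruction is an (opcode, operand) pair, jump operands are valid instruction indices, and the opcode names "jmp", "br" and "ret" are invented for the example. It uses the textbook leader-based partitioning into basic blocks.

```python
from dataclasses import dataclass, field

# Hypothetical toy opcodes (assumptions made for this sketch only):
#   "jmp" : unconditional jump to the instruction index in the operand
#   "br"  : conditional branch to the operand index, or fall through
#   "ret" : return, no successor inside this program
# Every other opcode simply falls through to the next instruction.

@dataclass
class Block:
    start: int                      # index of the first instruction
    end: int                        # index one past the last instruction
    succs: list = field(default_factory=list)  # start indices of successors

def build_cfg(code):
    """Partition `code` into basic blocks and connect them with edges."""
    if not code:
        return {}
    # Step 1: find leaders, i.e. instructions that start a basic block.
    leaders = {0}
    for i, (op, arg) in enumerate(code):
        if op in ("jmp", "br"):
            leaders.add(arg)                  # a jump target starts a block
        if op in ("jmp", "br", "ret") and i + 1 < len(code):
            leaders.add(i + 1)                # so does whatever follows a transfer
    starts = sorted(leaders)
    # Step 2: each block runs from one leader up to the next.
    blocks = {s: Block(s, e) for s, e in zip(starts, starts[1:] + [len(code)])}
    # Step 3: derive the edges from the last instruction of each block.
    for b in blocks.values():
        op, arg = code[b.end - 1]
        has_next = b.end < len(code)
        if op == "jmp":
            b.succs = [arg]
        elif op == "br":
            b.succs = [arg] + ([b.end] if has_next else [])
        elif op != "ret" and has_next:
            b.succs = [b.end]
    return blocks

program = [
    ("mov", None),  # 0
    ("br", 4),      # 1: branch to 4 or fall through to 2
    ("add", None),  # 2
    ("jmp", 5),     # 3
    ("sub", None),  # 4
    ("ret", None),  # 5
]

for start, block in sorted(build_cfg(program).items()):
    print(f"block [{block.start}, {block.end}) -> {block.succs}")
```

The sketch works on an already decoded instruction list; recovering such a list from a binary program is a separate problem.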
Cristina Cifuentes et al. (see reference 1) have worked on binary code analysis. Their goal was to recover high-level language statements from binary code. The results were tested on Intel Pentium and Sun SPARC machines. No executable is created from the CFG. Saumya K. Debray et al. (see reference 2) have created a post-link-time optimizer specifically for Alpha processors.
Building a CFG for a binary program is not a trivial problem, since high-level control structures usually do not exist at the binary code level. Moreover, data may be intermixed with instructions, and the two have to be told apart and handled separately. Additionally, situations can occur where the precise flow of control cannot be calculated because of indirection. Also, the integrity of address references needs to be preserved so that executable code can still be produced after the optimizations have been performed. Finally, there are processors with multiple instruction sets; before an instruction can be analyzed, the instruction set it belongs to has to be determined.
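Two of the difficulties above, data intermixed with instructions and indirect control transfers, can be illustrated with a small sketch of recursive-traversal decoding. It is an illustration under loud assumptions, not the approach of the cited works: the two-byte encoding, the opcode values and the name `recursive_traversal` are all hypothetical, and multiple instruction sets are not modelled. Bytes that are never reached are reported only as not proven to be code, because they could still be the target of an unresolved indirect jump.

```python
# Hypothetical toy encoding (assumption): every instruction is two bytes,
# an opcode byte followed by an operand byte holding an absolute address.
NOP, JMP, BR, JMPR, RET = 0x00, 0x01, 0x02, 0x03, 0x04
INSN_SIZE = 2

def recursive_traversal(image, entry=0):
    """Follow control flow from `entry` and classify the bytes of `image`.

    Returns (code_addrs, unreached, unresolved):
      code_addrs : addresses decoded as instructions
      unreached  : byte addresses never reached (data, or code that is only
                   reachable through an unresolved indirect jump)
      unresolved : addresses of indirect jumps with unknown targets
    """
    code_addrs, unresolved = set(), set()
    worklist = [entry]
    while worklist:
        pc = worklist.pop()
        # Decode along one path until it ends or rejoins decoded code.
        while 0 <= pc <= len(image) - INSN_SIZE and pc not in code_addrs:
            code_addrs.add(pc)
            op, arg = image[pc], image[pc + 1]
            if op == JMP:
                pc = arg                 # continue decoding at the target
                continue
            if op == BR:
                worklist.append(arg)     # taken path is explored later
            elif op == JMPR:
                unresolved.add(pc)       # target is not known statically
                break
            elif op == RET:
                break
            pc += INSN_SIZE              # fall through (also for unknown opcodes)
    code_bytes = {a + k for a in code_addrs for k in range(INSN_SIZE)}
    unreached = sorted(set(range(len(image))) - code_bytes)
    return sorted(code_addrs), unreached, sorted(unresolved)

image = bytes([
    BR, 6,       # 0: conditional branch to 6, else fall through to 2
    JMP, 8,      # 2: jump over the embedded data
    0xDE, 0xAD,  # 4: data placed between instructions
    JMPR, 0,     # 6: indirect jump, target unknown
    NOP, 0,      # 8
    RET, 0,      # 10
])

code, unreached, unresolved = recursive_traversal(image)
print("code:", code, "unreached:", unreached, "unresolved:", unresolved)
```

A linear sweep over the same image would decode the two data bytes at address 4 as an instruction, which is why the sketch follows control flow instead.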