The present invention relates to the translation of programs between languages, and in particular, to the conversion of a program written in one machine language into a flow graph and then into a program in a different language, preferably a high level language (HLL).
As computer manufacturers employ a growing number of operating systems and programming languages, it becomes increasingly important to convert programs written for one machine and operating system into forms that can execute on other machines or operating systems. For example, a customer who wishes to purchase different hardware or change operating systems must often face the complete loss of all of the programs written for the previous machine and operating system. The investment in those programs can reach millions of dollars, and its loss can severely disrupt the operation of the customer's business.
Because of the complexity of the programs and the sheer number of such programs that need to be converted, it is impractical to consider converting these programs manually. The task is simply too large for significant human intervention.
Converting programs automatically, however, is a difficult task. If all programs had the same type of structure, then conversion of programs might be relatively easy. However, experience has shown that few programs have the same structure and there is no standard structure for programs. Therefore, a general purpose program translation mechanism must be able to accommodate several different program structures.
The differences in structure, however, cause problems. For example, the language of the program to be translated may allow certain instructions which generate program structures and flows that cannot be easily implemented in the language for which the translation is desired. Assembly language code may be written with complex looping behavior that is not easy to map into HLL code. Loops are defined as code which, when executed, repetitively passes through the same program portion.
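As a minimal sketch of the loop definition above, the following Python fragment models a program as a list of instructions, derives the possible successors of each instruction, and reports the branches that return control to an instruction still being executed, that is, the branches that close a loop. The toy instruction set (`op`, `br`, `jmp`, `halt`) and the names `successors`, `back_edges` and `PROGRAM` are assumptions made for illustration only; they do not correspond to any particular machine language.

```python
def successors(program):
    """Map each instruction index to the indices control may reach next."""
    succ = {}
    for i, (op, target) in enumerate(program):
        if op == "jmp":        # unconditional branch (a GOTO)
            succ[i] = [target]
        elif op == "br":       # conditional branch: taken, or fall through
            succ[i] = [target, i + 1]
        elif op == "halt":     # no successor
            succ[i] = []
        else:                  # ordinary instruction falls through
            succ[i] = [i + 1]
    return succ


def back_edges(succ, entry=0):
    """Return edges (u, v) whose target v is still on the depth-first
    search path when u is visited; each such edge closes a cycle."""
    state = {}   # node -> "open" (on the current path) or "done"
    found = []

    def visit(u):
        state[u] = "open"
        for v in succ.get(u, []):
            if state.get(v) == "open":
                found.append((u, v))
            elif v not in state:
                visit(v)
        state[u] = "done"

    visit(entry)
    return found


# Hypothetical five-instruction program containing one loop.
PROGRAM = [
    ("op", None),    # 0: setup
    ("op", None),    # 1: loop head
    ("br", 4),       # 2: leave the loop when the condition holds
    ("jmp", 1),      # 3: branch back to the loop head
    ("halt", None),  # 4: exit
]

print(back_edges(successors(PROGRAM)))
```

In this toy program the only loop is closed by the unconditional branch at index 3. Real assembly code may contain several such branches into and out of the same region, which is what makes the mapping to HLL loop constructs difficult.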
Another example of a problem in translating a program from machine language to HLL arises from the use of unconditional branch, or GOTO, statements. Well-structured programs in HLLs generally avoid such statements because their use often results in programs that are difficult to understand and maintain. Previous work on translating programs from machine language to HLL has not been very successful in eliminating the GOTO structure in the translated code. In some cases, the GOTO structure is added manually during translation, which is undesirable.
A third example of a problem encountered in code conversion is complicated program structure. Such structure has often evolved in programs which have been modified extensively, because programs which have been in use for some time are commonly repaired by the use of "patches." Patches are generally small program fixes which are designed to correct a specific minor problem while minimizing the changes to the rest of the program code. The problem with patches is that they create difficult program flows, so patched programs are often difficult to convert. Ironically, the programs which have been used for the longest time are generally the ones most in demand for conversion, but they also often have the most patches.
Typically, the process of converting programs from a machine language to a HLL involves a "decompilation" step. The decompilation step converts a program written in a low level language into a program in a higher level language. A construct often used in decompilation is a control-flow graph. If the original program was well structured, the resulting flow graph is usually reducible, meaning that the flow graph can be simplified using known control structuring rules. If the original program is not well structured, however, the resulting flow graph is often irreducible. This is a problem that conventional techniques for converting between machine language and HLLs have not adequately addressed.
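One common textbook way to test whether a flow graph is reducible is to apply two transformations repeatedly: T1 removes an edge from a node to itself, and T2 merges a node that has exactly one predecessor into that predecessor. A graph is reducible if these steps collapse it to a single node. The sketch below illustrates that test only; it is not asserted to be the set of control structuring rules referred to above. The function name `is_reducible` and the edge-list representation are assumptions, and every node is assumed to be reachable from the entry node.

```python
def is_reducible(edges, entry):
    """Collapse a flow graph with T1 (delete self-loops) and T2 (merge a
    node into its unique predecessor); reducible if one node remains."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
    succ.setdefault(entry, set())

    changed = True
    while changed:
        changed = False
        # T1: remove any edge from a node to itself.
        for n in succ:
            if n in succ[n]:
                succ[n].discard(n)
                changed = True
        # T2: merge one node that has exactly one predecessor.
        for n in list(succ):
            if n == entry:
                continue
            preds = [p for p in succ if n in succ[p]]
            if len(preds) == 1:
                p = preds[0]
                succ[p].discard(n)
                succ[p] |= succ.pop(n)   # p inherits n's successors
                changed = True
                break                    # rescan the smaller graph
    return len(succ) == 1


# A structured while-loop: 0 -> 1 (test), 1 -> 2 (body), 2 -> 1, 1 -> 3 (exit).
while_loop = [(0, 1), (1, 2), (2, 1), (1, 3)]

# A loop with two entry points: 0 branches into both 1 and 2, which branch
# to each other. Neither node has a unique predecessor, so T2 never applies.
two_entry_loop = [(0, 1), (0, 2), (1, 2), (2, 1)]

print(is_reducible(while_loop, 0))
print(is_reducible(two_entry_loop, 0))
```

The second graph is the usual minimal example of irreducibility: a loop that can be entered at more than one point, as can happen when unconditional branches or patches jump into the middle of existing code.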
One object of this invention is to provide an automated method and apparatus for translating programs from machine language to a HLL which requires only minimal human intervention.
Another object of this invention is to provide such an automated method and apparatus which can accommodate difficult structures such as loops and GOTO statements in the machine language program.
Still another object of this invention is to provide an automated method and apparatus for removing undesirable code structures from the translated program.