This invention relates to automated translation between high-level computer programming languages.
This invention relates particularly to improved preservation in a target high-level language of preprocessor characteristics (such as macros, source file inclusion structure, and commentary) contained in a source high-level language. A feature of this invention is that preprocessor characteristics need not necessarily be processed by a preprocessor.
High-level computer languages enable computer programmers to communicate with computers. Statements programmers write in a computer language form a computer program which in turn instructs a computer to perform a set of tasks. "Compilation" is the manner in which high-level computer language programs are converted into instructions, generally called machine code, which the computer can understand and execute. A compiler is a computer program which performs this translation.
In general, each brand of computer understands a different set of machine code instructions. Therefore, a different compiler must exist for each computer to translate a high-level computer language. Because not every high-level computer language has a compiler on every brand of computer, not every program can execute on every machine. Programmers can only write programs in the languages for which compilers exist for their target computers.
Nonetheless, it is highly desirable to have a single computer program run on as many brands of computers as possible. Application programs are typically complex and difficult to write; rewriting programs in multiple languages to run on multiple brands of computers is impractical. Likewise, compilers are difficult to write; providing them for every language for every brand of computer is equally impractical. One way of addressing these problems has been the development of well-known, widely used, standardized high-level languages. Compilers for these languages are available for a wide variety of computers.
The development of standardized languages has not been a complete solution. There exist numerous high-level languages, and many large programs written in them, which are exotic, highly specialized, little used, obsolete, or designed for specific computers. Many computers do not have compilers available for these languages.
Because many high-level computer languages, whether or not they are standardized, cannot be compiled on every computer, programs have to be translated to other languages. While translation can be done by hand, it is a laborious, time-consuming, expensive, and error-prone process. To address this problem, automatic translators have been and continue to be developed to translate programs written in one high-level language to another.
Automatic translators may be used in either of two distinct strategies to solve the problem of an unavailable compiler for a particular language on a particular computer. First, programmers may continue to write and maintain programs in the original source language. The translator converts these programs into intermediate code in a target language. An available compiler for the target language then converts this intermediate code into machine code which the target computer can understand. Although the target language is usually a standard, widely available language, the translator does not have to produce readable or maintainable source code.
The second strategy requires a translator to produce readable and maintainable code. Programmers going this route want to abandon the original language in favor of the target. Building this type of translator is a more difficult task and is the focus of this invention.
Prior art attempts to build translators which produce readable code have had differing goals and various levels of success. Syntax of one high-level language has been successfully transformed into syntax of another high-level language. Some translators have produced attractively formatted target code. While source code comments have been migrated to target code, their placement has not always been optimal. Translators have also attempted to transform the style of programs to make them more readable. Others have used knowledge-based systems to extract the meaning of the source program and rewrite it in the target language.
However, prior art translators have universally failed to preserve adequately the programming constructs generally known as preprocessor characteristics. Many high-level languages include a preprocessor language separate from but coexisting with the language itself. Characteristics (which are also herein referred to as invocation expressions) of the preprocessor language may include a conditional compilation mechanism, a macro mechanism, a source inclusion mechanism, a variety of compiler directives, and a comment mechanism. At the risk of oversimplification, the preprocessor allows programmers to use shorthand expressions for longer constructs. Thus, invoking the shorthand expression triggers a text substitution when the source code is run through the preprocessor.
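The text substitution described above can be sketched as follows. This is a minimal illustration in Python, not the mechanism of any particular preprocessor. The macro table `MACROS`, the function `expand`, and the macro names `MAX_USERS` and `BANNER` are hypothetical, and only parameterless macros are handled.

```python
import re

# Hypothetical macro table: shorthand name -> replacement text.
MACROS = {
    "MAX_USERS": "64",
    "BANNER": '"Welcome"',
}

# Matches identifier-like tokens so that only whole names are replaced.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def expand(line, macros=MACROS):
    """Replace each macro invocation in `line` with its body (single pass)."""
    return _IDENTIFIER.sub(lambda m: macros.get(m.group(0), m.group(0)), line)


source_line = "limit = MAX_USERS; title = BANNER; MAX_USERS_TOTAL = 0;"
expanded = expand(source_line)
print(expanded)
```

In this sketch the shorthand names no longer appear in the expanded text, so a translator that sees only the expanded text has nothing from which to reconstruct the original macros.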
The failure of translators to handle preprocessor characteristics adequately is a well-known problem. Experts in the field, when considering how to replace a source language macro definition with a target language macro definition, have stated: "We do not know of a general, fully automatic mechanism for achieving this replacement." Two suggestions have been made:
Attempt to parse macro bodies outside of their context of use, with the hope that some well-formed and complete source language constructs will translate them correctly.
Recognize common sequences of tokens in macro bodies via pattern matching.
The former is not a general solution because a language may place no restrictions on how macros are used, and imposing such restrictions would, in any event, make macros susceptible to semantic errors. The latter suggestion would work only in special cases.
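To illustrate why parsing macro bodies outside their context of use is not a general solution, the sketch below checks one necessary condition for a body to be a complete construct: balanced brackets. The function `is_balanced` and the sample macro bodies are hypothetical, the bodies are written in a C-like notation chosen only for illustration, and a real translator would need a full parser rather than this simple check.

```python
# Maps each closing bracket to the opening bracket it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}


def is_balanced(body):
    """Return True if every bracket in `body` is properly opened and closed."""
    stack = []
    for ch in body:
        if ch in "([{":
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack


# Hypothetical macro bodies: the first is a complete expression,
# the other two are fragments that only make sense when used together.
SAMPLE_BODIES = {
    "SQUARE_OF_X": "((x) * (x))",
    "BEGIN_LOOP": "for (i = 0; i < n; i++) {",
    "END_LOOP": "}",
}

for name, body in SAMPLE_BODIES.items():
    verdict = "may parse alone" if is_balanced(body) else "cannot parse alone"
    print(name, "->", verdict)
```

Here `BEGIN_LOOP` and `END_LOOP` form a complete construct only at a point of use where both appear, which is the kind of unrestricted macro usage that defeats out-of-context parsing.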