The normal processing steps within a compiler are: use lexical analysis and parsing to create a `parse tree` from the `source language`; (optionally) re-organise that parse tree to reduce unnecessary computation (`optimisation`); and make a final, single pass over the parse tree to generate code in the `target` language(s).
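By way of illustration only, the three steps above may be sketched in Python for a toy source language of integer constants, variables, `+`, `*` and parentheses. The names `Node`, `tokenize`, `parse`, `optimise`, `emit` and `compile_expr` are hypothetical, the only optimisation shown is constant folding, and the target is assumed to be a C-like expression string; none of these details is taken from the description itself.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                                   # "num", "var", "add" or "mul"
    value: str = ""                             # literal text or variable name
    children: List["Node"] = field(default_factory=list)


def tokenize(source: str) -> List[str]:
    """Lexical analysis: split the source text into tokens."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[+*()]", source)


def parse(tokens: List[str]) -> Node:
    """Parsing: build a parse tree, with '*' binding tighter than '+'."""
    pos = 0

    def atom() -> Node:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = expr()
            pos += 1                            # skip the closing ")"
            return node
        return Node("num" if tok.isdigit() else "var", tok)

    def term() -> Node:
        nonlocal pos
        node = atom()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            node = Node("mul", children=[node, atom()])
        return node

    def expr() -> Node:
        nonlocal pos
        node = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = Node("add", children=[node, term()])
        return node

    return expr()


def optimise(node: Node) -> Node:
    """Optional optimisation: fold operations whose operands are all constants."""
    kids = [optimise(child) for child in node.children]
    if node.kind in ("add", "mul") and all(k.kind == "num" for k in kids):
        a, b = int(kids[0].value), int(kids[1].value)
        return Node("num", str(a + b if node.kind == "add" else a * b))
    return Node(node.kind, node.value, kids)


def emit(node: Node) -> str:
    """Code generation: a single pass over the tree, one pattern per node kind."""
    if node.kind in ("num", "var"):
        return node.value
    op = "+" if node.kind == "add" else "*"
    left, right = (emit(child) for child in node.children)
    return f"({left} {op} {right})"


def compile_expr(source: str) -> str:
    return emit(optimise(parse(tokenize(source))))


print(compile_expr("x + 2 * 3"))    # prints (x + 6)
```

In this sketch `emit` consults only the node it is visiting, which is the single-pass arrangement described above.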
A problem with this approach is that, once code generation is started, each parse tree node is most conveniently translated into a single standard program structure in the target language(s). If only a single program structure or pattern is available, then that pattern must be very general to function correctly for every possible invocation. Such a general form typically executes slowly. This problem is exacerbated if each node can be translated into only one of several target languages.
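A minimal sketch of this one-pattern-per-node arrangement, in Python, is given below for illustration. The node kinds, the `TEMPLATES` table and the C-like helper `get_field` are all hypothetical. The point shown is that the single template for `field` must be the general form, because the generator has no other pattern to fall back on.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                                   # "const", "field" or "add"
    value: str = ""                             # literal text or field name
    children: List["Node"] = field(default_factory=list)


# Exactly one target-language pattern per node kind (hypothetical C-like text).
# The "field" pattern is deliberately the most general form, since it is the
# only one available for every invocation.
TEMPLATES = {
    "const": "{value}",
    "field": 'get_field(row, "{value}")',
    "add": "({0} + {1})",
}


def emit(node: Node) -> str:
    """Translate a node using its single fixed template; no context is consulted."""
    args = [emit(child) for child in node.children]
    return TEMPLATES[node.kind].format(*args, value=node.value)


tree = Node("add", children=[Node("field", "price"), Node("const", "1")])
print(emit(tree))    # prints (get_field(row, "price") + 1)
```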
If this approach is inadequate, then non-local information (that is, information from nodes elsewhere in the parse tree) must be available to make the generally complex decisions as to which pattern is best to use. This may involve information from parse tree nodes remote from the node currently being processed, and complex, time-consuming and error-prone decision-making on a node-by-node basis.
This decision-making process becomes particularly difficult when code is emitted in two or more target languages, where code corresponding to some nodes can be emitted in only one language (for example, a data access and query language, such as SQL); for some other nodes, code can be emitted only in another language (for example, a general-purpose programming language, such as C); and for yet other nodes, either language can be used.
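The dependence on non-local information can be illustrated with the following Python sketch. It assumes a hypothetical capability table (`CAPABILITIES`) recording which target languages can express each node kind, and computes the languages in which a whole subtree could be emitted. The node kinds and the intersection rule are assumptions chosen for illustration, not a method taken from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class Node:
    kind: str
    children: List["Node"] = field(default_factory=list)


# Hypothetical capability table: target languages able to express each node kind.
CAPABILITIES: Dict[str, FrozenSet[str]] = {
    "table_read": frozenset({"SQL"}),           # data access: SQL only
    "call_external": frozenset({"C"}),          # general-purpose call: C only
    "add": frozenset({"SQL", "C"}),             # arithmetic: either language
    "const": frozenset({"SQL", "C"}),
}


def subtree_languages(node: Node) -> FrozenSet[str]:
    """Languages in which the entire subtree rooted at `node` could be emitted."""
    langs = CAPABILITIES[node.kind]
    for child in node.children:
        langs = langs & subtree_languages(child)
    return langs


in_sql = Node("add", [Node("table_read"), Node("const")])
in_c = Node("add", [Node("call_external"), Node("const")])
mixed = Node("add", [Node("table_read"), Node("call_external")])

print(sorted(subtree_languages(in_sql)))    # prints ['SQL']
print(sorted(subtree_languages(in_c)))      # prints ['C']
print(sorted(subtree_languages(mixed)))     # prints []
```

The same `add` node kind yields a different answer in each tree, so the choice cannot be made by inspecting the `add` node alone. An empty result, as in the third tree, indicates that no single language suffices and the subtree would have to be split between languages.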
Further difficulties arise when one or more of the target languages have several different ways of expressing very similar processing, where one approach is fast, but can only be used in limited circumstances, while other approaches are more general, but typically execute more slowly. For example, in SQL, the choice of whether to use a cursor or inline data access code is particularly difficult to make.
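One simple way to picture the choice between a fast but restricted form and a slower general form is an ordered list of patterns, each guarded by a precondition, with the general form last. The Python sketch below assumes hypothetical names throughout (`Pattern`, `PATTERNS`, `select_pattern`, the `restricted_case` fact and the placeholder templates `FAST_FORM` and `GENERAL_FORM`). It does not state which SQL construct is the faster one, and it presumes that the facts about a node have already been gathered, which is the difficult part described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Pattern:
    name: str
    applies: Callable[[Dict[str, bool]], bool]  # precondition on facts about a node
    template: str                               # placeholder target-language text


# Ordered fastest first; the last entry is the general fallback that always applies.
PATTERNS: List[Pattern] = [
    Pattern("fast", lambda facts: facts.get("restricted_case", False), "FAST_FORM({arg})"),
    Pattern("general", lambda facts: True, "GENERAL_FORM({arg})"),
]


def select_pattern(facts: Dict[str, bool]) -> Pattern:
    """Return the first (fastest) pattern whose precondition holds."""
    for pattern in PATTERNS:
        if pattern.applies(facts):
            return pattern
    raise ValueError("no applicable pattern")   # unreachable while a fallback exists


def emit(arg: str, facts: Dict[str, bool]) -> str:
    return select_pattern(facts).template.format(arg=arg)


print(emit("orders", {"restricted_case": True}))    # prints FAST_FORM(orders)
print(emit("orders", {}))                           # prints GENERAL_FORM(orders)
```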
What follows are definitions of terms that will be used in the description of the present invention and specific examples thereof.
Code generation (`emitting code`): the process of traversing a parse tree (q.v.) and generating a program in one or more lower-level target languages (q.v.).
Implementation Language: the programming language(s) in which a compiler itself is written. The implementation language need not be the same as either the source language (q.v.) or target language(s) (q.v.).
Parse tree: a graph, usually a `tree`, or a `directed acyclic graph`, of parse tree nodes (q.v.) which represent the semantics and behaviour of a computer program in the source language (q.v.), or a portion thereof.
Parse tree node (`node`): part of a parse tree (q.v.), representing a single operation or function in the source language (q.v.).
Parsing: the process of converting a textual representation of a programming language (source language) into a parse tree (q.v.).
Source Language (`source code`): the input language to a compiler. Usually, the source language for a compiler is a high-level language, such as a business modelling language.
Target Language (`target code`): the output language from a compiler. Usually, the target language of a compiler is a low-level language, such as assembly code, C or SQL. Often, code in the target language(s) is further processed; for example, by another compiler or assembler.