This application is based on Patent Application No. 2000-15021 filed Jan. 24, 2000 in Japan, the content of which is incorporated hereinto by reference.
1. Field of the Invention
The present invention relates generally to a method of translating a source code, a recording medium for containing a source code translator program, and a source code translator device, and more particularly to such method, recording medium, and device as mentioned above that may be used in conjunction with a compiler, a preprocessor, an extensible programming language implementation, a source code parsing tool, a source code translating tool, and the like.
2. Description of the Prior Art
(1) Definition of Terms
Before proceeding, the definitions for the terms that are used throughout the specification are provided, as follows:
Abstract Syntax Tree
The internal representation of a source code written in a structured programming language. Representing a source code in the form of an abstract syntax tree makes it easier to parse and translate the source code than treating it as a character string.
Expression
In C, Java and like programming languages, the syntax defines constants, variables, operators, assignment expressions and the like as "expression". The left side and right side operands of a binary operator are also defined as "expression".
Statement
In C, Java and like programming languages, the syntax defines if-statement, while-statement, compound statement and the like as "statement".
Expression Statement
This is the term that is used in the specifications of C, Java and like programming languages. Any expression that is followed by a semicolon ";" is called an "expression statement". In the syntax, the expression statement is defined as one of the statements.
Separator
The separator or delimiter is a symbol that may be used with any expression to allow the expression to be treated as if it were a statement. In C, C++, Java and like programming languages, the semicolon ";" is used as a separator, for example.
Backquote Macro
In some programming languages that enable a program itself to be operated on within a given program, this is the construct that may be used to separate the program being operated on from that given program. In the Lisp programming language, for example, (+ x 1) is a program code that says "Add 1 to x". When this program code itself is to be described within a Lisp program, it may be described as `(+ x 1). A value may be embedded directly inside a program code written in backquote. For example, in Lisp, `(+ x ,(+ y 1)) means a program code described as `(+ x 2), if y contains a value of 1.
Construct
A component that makes up a source code written in a programming language. Variables, constants, operators, assignment expressions, if-statements, while-statements, and the like are each called a "construct", for example.
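The value-embedding behavior described under "Backquote Macro" above may be illustrated by the following minimal sketch. It is not Lisp itself: program code is modeled as nested Python lists, and the names Unquote and quasiquote are hypothetical helpers introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Unquote:
    """Marks a position where a computed value is embedded (the "," in Lisp)."""
    compute: Callable[[dict], object]


def quasiquote(template, env):
    """Return a copy of the template with every Unquote replaced by its value."""
    if isinstance(template, Unquote):
        return template.compute(env)
    if isinstance(template, list):
        return [quasiquote(item, env) for item in template]
    return template  # symbols and constants are kept as data


# Models `(+ x ,(+ y 1)): only the unquoted part is evaluated.
template = ["+", "x", Unquote(lambda env: env["y"] + 1)]
print(quasiquote(template, {"y": 1}))  # ['+', 'x', 2]
```

Everything outside an Unquote is left untouched as data, which is the sense in which the backquote separates the program being described from the program that describes it.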
(2) Prior Art
Next, a source code translator program according to the prior art is discussed. Firstly, the following discussion covers the typical conventional source code translator program that is designed to process a source code written in C or like programming language.
By definition, the source code translator program is a program that accepts a source code written in a particular language as input, and performs certain conversion processing on the input source code to produce the corresponding output source code in any language that may be the same as or different from the original language. The source code translator program is also called a "preprocessor". Generally, the source code translator program parses the construct of the input source code, and converts the source code into the internal representation that makes the subsequent translation process easier. In many cases, this internal representation has a tree structure, which is known as the "Abstract Syntax Tree".
In the prior art, when a source code containing any expression statements is represented as the abstract syntax tree, the abstract syntax tree may be described in a straightforward way by using nodes that represent the expression statements. Here, the expression statement refers to any expression that is followed by a semicolon ";", which is usually found in C, for example.
Now, the expression statement is discussed in a little more detail.
In the C language, and other languages, such as C++ and Java, that are similar in the syntax rules to C, the statement and the expression are distinguished from each other. For example, "x", "0", "x+1", and "x=0" are each an expression, whereas if-statement and while-statement are statements. In C, any expression followed by ";" is usually treated as a statement. This may be called the expression statement. For example, "x=0;" is an expression statement, which is considered as a statement.
To provide a better understanding of the prior art, consider a specific case for a simple language where the syntax is defined by BNF (Backus-Naur Form), as shown below:
statement ::= if-statement | expression-statement
if-statement ::= "if" "(" expression ")" statement
                 "else" statement
expression-statement ::= expression ";"
expression ::= equality-comparison | assignment |
               variable | constant
equality-comparison ::= expression "==" expression
assignment ::= expression "=" expression
FIG. 1 depicts the abstract syntax tree that may be represented according to the prior art. FIG. 1 shows how the following program code written in the above simple language,
if (x==0) y=0; else y=1;
may be represented as the abstract syntax tree by using the prior art.
FIGS. 2 through 6 show a recursive algorithm according to the prior art that converts an abstract syntax tree into an appropriate character string for output. Specifically, FIG. 2 is a flowchart showing the steps in a typical prior art procedure for translating the abstract syntax tree into a character string. FIG. 3 is a flowchart showing the steps in a typical prior art procedure for producing an if-statement as output. FIG. 4 is a flowchart showing the steps in a typical prior art procedure for producing an "expression statement" as output. FIG. 5 is a flowchart showing the steps in a typical prior art procedure for producing an "equality comparison" as output. FIG. 6 is a flowchart showing the steps in a typical prior art procedure for producing an "assignment" as output. Now, those procedures are described below as they are related to the abstract syntax tree shown in FIG. 1.
The process starts with invoking the "Procedure for Outputting an Abstract Syntax Tree" shown in FIG. 2. The abstract syntax tree being processed corresponds to the abstract syntax tree T11 in FIG. 1. The procedure in FIG. 2 takes a branch, depending on the type of the root node in the abstract syntax tree being processed. Since the root node in the abstract syntax tree T11 is an if-statement, a branch occurs to invoke the "Procedure for Producing 'if' Statement" (Step S21).
In the "Procedure for Producing 'if' Statement" in FIG. 3, a string "if (" is first produced (Step S31). To output "x==0" contained in the first subtree, the "Procedure for Outputting an Abstract Syntax Tree" in FIG. 2 is recursively invoked (Step S32). Then, a string ")" is produced (Step S33). Then, to output "y=0;" contained in the second subtree, the "Procedure for Outputting an Abstract Syntax Tree" in FIG. 2 is recursively invoked (Step S34). Then, a string "else" is produced (Step S35). Finally, to output "y=1;" contained in the third subtree, the "Procedure for Outputting an Abstract Syntax Tree" in FIG. 2 is recursively invoked (Step S36). Then, the procedure in FIG. 3 ends, returning to the step that invoked this procedure.
Next, Steps S32, S34 and S36, which recursively invoke the procedure in FIG. 2, are described. In the following, the procedure for producing "y=1;" in Step S36 is used as an example, and is described in further detail. (Note that the procedure for producing "x==0" in Step S32 and the procedure for producing "y=0;" in Step S34 are not described, as those procedures are recursively performed in the same manner as that in Step S36.) Step S36 recursively invokes the "Procedure for Outputting an Abstract Syntax Tree" in FIG. 2. Here, the abstract syntax tree being processed is named "T12". Since the type of the root node in the abstract syntax tree being processed is an expression statement, a branch is taken, invoking the "Procedure for Producing an Expression Statement" (Step S22).
In the "Procedure for Producing an Expression Statement" in FIG. 4, to output "y=1" contained in the subtree, the "Procedure for Outputting an Abstract Syntax Tree" in FIG. 2 is recursively invoked (Step S41). Then, a string ";" is produced (Step S42). Then, the procedure in FIG. 4 ends, returning to the step that invoked this procedure.
Now, the procedure for producing "y=1" in Step S41, where the "Procedure for Outputting an Abstract Syntax Tree" is recursively invoked, is discussed in further detail. Step S41 recursively invokes the "Procedure for Outputting an Abstract Syntax Tree" in FIG. 2. Here, the abstract syntax tree being processed is named "T13". Since the type of the root node in the abstract syntax tree being processed is an assignment, a branch is taken, invoking the "Procedure for Producing an Assignment" (Step S24).
In the "Procedure for Producing an Assignment" in FIG. 6, to output "y" contained in the first subtree, the "Procedure for Outputting an Abstract Syntax Tree" in FIG. 2 is further recursively invoked (Step S61). Then, a string "=" is produced (Step S62). Then, to output "1" contained in the second subtree, the "Procedure for Outputting an Abstract Syntax Tree" in FIG. 2 is recursively invoked (Step S63). Then, the procedure in FIG. 6 ends, returning to the step that invoked this procedure.
Next, the procedure for producing "y" in Step S61, where the "Procedure for Outputting an Abstract Syntax Tree" is recursively invoked, is discussed in further detail. Step S61 recursively invokes the "Procedure for Outputting an Abstract Syntax Tree" in FIG. 2. Here, the abstract syntax tree being processed is named "T14". Since the type of the root node in the abstract syntax tree is a variable, a branch occurs, where a variable name "y" is produced (Step S25). Then, the procedure in FIG. 2 ends, returning to the step in FIG. 6 that invoked this procedure.
Then, the procedure for producing "1" in Step S63, where the "Procedure for Outputting an Abstract Syntax Tree" is recursively invoked, is discussed in further detail. Step S63 recursively invokes the "Procedure for Outputting an Abstract Syntax Tree" in FIG. 2. Here, the abstract syntax tree being processed is named "T15". Since the type of the root node in the abstract syntax tree being processed is a constant, a branch occurs where a constant "1" is produced (Step S25). Then, the procedure in FIG. 2 ends, returning to the step in FIG. 6 that invoked this procedure.
Through the recursive invocations performed as described above, the abstract syntax tree T11 of FIG. 1 is processed to provide a string "if (x==0) y=0; else y=1;".
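The recursive prior-art procedure walked through above may be sketched as follows. This is a minimal illustration, not the flowcharts of FIGS. 2 through 6 themselves: the node class names (Var, Const, Eq, Assign, ExprStmt, If), the function names, and the placement of spaces in the output are assumptions made for the example.

```python
from dataclasses import dataclass


# Hypothetical node classes for the simple grammar; names are illustrative only.
@dataclass
class Var:
    name: str


@dataclass
class Const:
    value: int


@dataclass
class Eq:  # equality-comparison
    left: object
    right: object


@dataclass
class Assign:  # assignment
    left: object
    right: object


@dataclass
class ExprStmt:  # expression-statement: the node that stands for the ";"
    expr: object


@dataclass
class If:  # if-statement
    cond: object
    then: object
    orelse: object


def emit(node):
    """Branch on the type of the root node, as described for FIG. 2."""
    if isinstance(node, If):
        return emit_if(node)
    if isinstance(node, ExprStmt):
        return emit_expr_stmt(node)
    if isinstance(node, Eq):
        return emit(node.left) + "==" + emit(node.right)
    if isinstance(node, Assign):
        return emit(node.left) + "=" + emit(node.right)
    if isinstance(node, Var):
        return node.name
    if isinstance(node, Const):
        return str(node.value)
    raise TypeError(f"unknown node: {node!r}")


def emit_if(node):
    # "if (", condition, ")", then-part, "else", else-part (cf. Steps S31 to S36).
    # The spaces around ")" and "else" are an assumption of this sketch.
    return "if (" + emit(node.cond) + ") " + emit(node.then) + " else " + emit(node.orelse)


def emit_expr_stmt(node):
    # The expression first, then the separator (cf. Steps S41 and S42).
    return emit(node.expr) + ";"


# A tree shaped like T11 as described: the ";" exists only as ExprStmt nodes.
t11 = If(
    Eq(Var("x"), Const(0)),
    ExprStmt(Assign(Var("y"), Const(0))),
    ExprStmt(Assign(Var("y"), Const(1))),
)
print(emit(t11))  # if (x==0) y=0; else y=1;
```

Note that the separator is produced only because whoever built the tree placed an ExprStmt node around each assignment.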
By using the internal representation of the abstract syntax tree containing expression statements represented by nodes as has been described so far in connection with the prior art, there is a problem in that the processing by the source code translator program becomes complicated, and there is another problem in that the work of describing a backquote macro becomes complicated. Those two problems are discussed below more clearly.
The first problem of the prior art is first discussed. The source code translator program accepts a program code as input, parses the construct of the program code, and turns it into an internal representation in the form of an abstract syntax tree. Then, it performs the conversion processing for the abstract syntax tree, which is finally converted back to a character string that is provided as output. When programming the conversion processing for the abstract syntax tree, programmers must always be aware of the distinction between a statement and an expression. This raises another problem.
For example, suppose that programmers failed to make this distinction between the statement and expression, and have inadvertently built an abstract syntax tree in which an expression appeared in the place where a statement should have appeared. An example of such abstract syntax tree is illustrated in FIG. 7. (Note that this also shows the abstract syntax tree using the representation according to the present invention, which will be described later.) As shown, the second and third subtrees of the if-statement should have contained a statement, but a subtree representing an assignment, which is one type of the expression, actually appeared in the abstract syntax tree of FIG. 7. If this abstract syntax tree is output as a character string using the procedures in FIGS. 2 through 6 of the prior art, it would result in a wrong program in which a separator ";" is lacking, as shown below.
if (x==0) y=0 else y=1
Similarly, if an abstract syntax tree built by programmers should inadvertently contain a statement that appeared in the subtree of the expression statement, the resulting program would contain a double separator ";", as shown below.
if (x==0) y=0;; else y=1;;
To avoid producing such erroneous programs, the programmer who is responsible for writing a source code translator program must be aware of the distinction between the statement and expression when handling the abstract syntax tree. Thus, the work of writing such programs would become complicated.
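Both erroneous outputs discussed above can be reproduced with a small prior-art style emitter. The following is a sketch under stated assumptions: the node classes and the emit function are hypothetical, and spacing in the output is chosen to match the strings shown in the text.

```python
from dataclasses import dataclass


# Hypothetical node classes; ExprStmt is the prior-art node that stands for ";".
@dataclass
class Var:
    name: str


@dataclass
class Const:
    value: int


@dataclass
class Eq:
    left: object
    right: object


@dataclass
class Assign:
    left: object
    right: object


@dataclass
class ExprStmt:
    expr: object


@dataclass
class If:
    cond: object
    then: object
    orelse: object


def emit(node):
    """Prior-art style output: ";" appears only where an ExprStmt node exists."""
    if isinstance(node, If):
        return f"if ({emit(node.cond)}) {emit(node.then)} else {emit(node.orelse)}"
    if isinstance(node, ExprStmt):
        return emit(node.expr) + ";"
    if isinstance(node, Eq):
        return emit(node.left) + "==" + emit(node.right)
    if isinstance(node, Assign):
        return emit(node.left) + "=" + emit(node.right)
    if isinstance(node, Var):
        return node.name
    if isinstance(node, Const):
        return str(node.value)
    raise TypeError(f"unknown node: {node!r}")


cond = Eq(Var("x"), Const(0))
y0 = Assign(Var("y"), Const(0))
y1 = Assign(Var("y"), Const(1))

# Mistake 1: bare expressions placed where statements are expected.
missing = emit(If(cond, y0, y1))
# Mistake 2: an expression statement placed inside another expression statement.
doubled = emit(If(cond, ExprStmt(ExprStmt(y0)), ExprStmt(ExprStmt(y1))))

print(missing)  # if (x==0) y=0 else y=1
print(doubled)  # if (x==0) y=0;; else y=1;;
```

In both cases the emitter itself behaves as designed; the error lies entirely in how the tree was built, which is why the burden falls on the programmer.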
Next, the second problem of the prior art is discussed. For example, the Lisp language provides the function called the "backquote macro" (the backquote macro is also known as quasiquote). Here, the backquote macro refers to the function that allows the program itself to be treated as data, and is required when operating on the program using the language for that program. The following is an example of how a program code may be described using the backquote macro in Lisp:
`(if condition ,exp1 ,exp2)
There are several existing systems in which this backquote macro function is additionally implemented on the C language, but all of those systems share a common problem in that the description using the backquote function is not as easy as for the Lisp language. This is because, in Lisp, the individual items contained in the syntax are all expressions, which makes it easier to describe, whereas in C, the expression and statement must be separated distinctly, which makes it more difficult to describe. The following is an example of how a program code may be described using the backquote macro in such a system:
`[Statement] {if (condition) ,[Expression]exp1;else,[Expression]exp2;}
To ensure that the construct of the backquote macro is parsed accurately, a non-terminal symbol at the start (in this case, statement) and a non-terminal symbol at each point where a value is embedded (in this case, expression) must be specified. It may be understood from the above that the programmer's task of describing a program code by using the backquote macro would become complicated, which would reduce the practical utility of the backquote macro.
In light of the problems of the prior art that are associated with the source code conversion processing as well as the source code description using the backquote macro, the present invention provides a method of translating a source code, wherein an abstract syntax tree may be described without using nodes representing expression statements, so that such abstract syntax tree may be converted into a character string as appropriate.
So the object of the present invention is to meet the following requirements, that is,
(a) To eliminate the need of being aware of the distinction between the expression and statement when building an abstract syntax tree, and to thereby make it easier to implement a source code translator program; and
(b) To simplify the construct of the backquote macro. For example, this includes ensuring that the equivalent of the program code described using the backquote macro in the Lisp language, as given below,
`(if condition ,exp1 ,exp2)
can be written in C, as follows:
`{if (condition) ,exp1;else,exp2;}
Thus, programmers can write any program code more easily, without worrying about the need of distinguishing between the statement and expression, as it usually occurs in the prior art.
According to one aspect of the present invention, there is a method of translating an input source code described in a particular programming language that meets the following requirements, into a corresponding output source code in any language that may be the same as or different from the original language, the requirements being that:
a statement and an expression be distinguished according to the syntax rules;
any expression statement containing an expression followed by a separator be defined as one of the statements;
for each construct, whether it is an expression or a statement be predefined; and
for each construct, whether each of the items comprising the construct is an expression or a statement be able to be determined, wherein so that an input source code represented as an abstract syntax tree without using nodes corresponding to separators representing expression statements may be produced as a corresponding output source code, the method comprises the steps of:
outputting the abstract syntax tree as a statement;
outputting the abstract syntax tree as an expression; and
producing each corresponding construct, wherein
said step of outputting the abstract syntax tree as a statement depends on the type of the root node in the abstract syntax tree being output, and includes the steps of:
if the root node is a node representing a statement, producing each corresponding construct according to the type of that statement; and
if the root node is a node representing an expression, outputting the abstract syntax tree as an expression and then producing a separator representing an expression statement, wherein
said step of outputting the abstract syntax tree as an expression depends on the type of the root node in the abstract syntax tree, and includes the step of producing each corresponding construct according to the type of that expression.
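The method set out above may be sketched as a pair of mutually recursive output routines. This is a minimal illustration under stated assumptions, not the claimed implementation: the node classes and the names emit_statement and emit_expression are hypothetical, output spacing is chosen for readability, and rejecting a statement in an expression position with an error is a choice of this sketch that the text above does not address.

```python
from dataclasses import dataclass


# Hypothetical node classes; note there is no node for expression statements.
@dataclass
class Var:
    name: str


@dataclass
class Const:
    value: int


@dataclass
class Eq:
    left: object
    right: object


@dataclass
class Assign:
    left: object
    right: object


@dataclass
class If:
    cond: object
    then: object
    orelse: object


def emit_statement(node):
    """Output the tree as a statement, branching on the root node type."""
    if isinstance(node, If):
        # Root is a statement: produce the construct for that statement type.
        return ("if (" + emit_expression(node.cond) + ") "
                + emit_statement(node.then) + " else " + emit_statement(node.orelse))
    # Root is an expression: output it as an expression, then the separator.
    return emit_expression(node) + ";"


def emit_expression(node):
    """Output the tree as an expression, branching on the root node type."""
    if isinstance(node, Eq):
        return emit_expression(node.left) + "==" + emit_expression(node.right)
    if isinstance(node, Assign):
        return emit_expression(node.left) + "=" + emit_expression(node.right)
    if isinstance(node, Var):
        return node.name
    if isinstance(node, Const):
        return str(node.value)
    # Assumption of this sketch: a statement in an expression position is an error.
    raise TypeError(f"not an expression: {node!r}")


# The assignments sit directly under the if-statement, with no separator nodes.
tree = If(
    Eq(Var("x"), Const(0)),
    Assign(Var("y"), Const(0)),
    Assign(Var("y"), Const(1)),
)
print(emit_statement(tree))  # if (x==0) y=0; else y=1;
```

Because the separator is decided by the position in which a subtree is output rather than by a node in the tree, the tree builder no longer has a way to omit or duplicate it.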
According to another aspect of the present invention, there is a recording medium for containing a source code translator program for translating an input source code described in a particular programming language that meets the following requirements, into a corresponding output source code in any language that may be the same as or different from the original language, the requirements being that:
a statement and an expression be distinguished according to the syntax rules;
any expression statement containing an expression followed by a separator be defined as one of the statements;
for each construct, whether it is an expression or a statement be predefined; and
for each construct, whether each of the items comprising the construct is an expression or a statement be able to be determined, wherein so that an input source code represented as an abstract syntax tree without using nodes corresponding to separators representing expression statements may be produced as a corresponding output source code, the source code translator program comprises the steps of:
outputting the abstract syntax tree as a statement;
outputting the abstract syntax tree as an expression; and
producing each corresponding construct, wherein
said step of outputting the abstract syntax tree as a statement depends on the type of the root node in the abstract syntax tree being output, and includes the steps of:
if the root node is a node representing a statement, producing each corresponding construct according to the type of that statement; and
if the root node is a node representing an expression, outputting the abstract syntax tree as an expression and then producing a separator representing an expression statement, wherein
said step of outputting the abstract syntax tree as an expression depends on the type of the root node in the abstract syntax tree, and includes the step of producing each corresponding construct according to the type of that expression.
According to a further aspect of the present invention, there is a source code translator device for translating an input source code described in a particular programming language that meets the following requirements, into a corresponding output source code in any language that may be the same as or different from the original language, the requirements being that:
a statement and an expression be distinguished according to the syntax rules;
any expression statement containing an expression followed by a separator be defined as one of the statements;
for each construct, whether it is an expression or a statement be predefined; and
for each construct, whether each of the items comprising the construct is an expression or a statement be able to be determined, wherein so that an input source code represented as an abstract syntax tree without using nodes corresponding to separators representing expression statements may be produced as a corresponding output source code, the source code translator device comprises:
means for outputting the abstract syntax tree as a statement;
means for outputting the abstract syntax tree as an expression; and
means for producing each corresponding construct, wherein
said means for outputting the abstract syntax tree as a statement depends on the type of the root node in the abstract syntax tree being output, and,
if the root node is a node representing a statement, produces each corresponding construct according to the type of that statement; and
if the root node is a node representing an expression, outputs the abstract syntax tree as an expression and then produces a separator representing an expression statement, wherein
said means for outputting the abstract syntax tree as an expression depends on the type of the root node in the abstract syntax tree, and produces each corresponding construct according to the type of that expression.
The source code translating method of the present invention is provided for solving the problems of the prior art that are encountered when the conversion processing occurs for a given source code, and when source code is described using the backquote macro. According to the method of the invention, any source code may be described in the form of an abstract syntax tree without using any node representing the expression statement, and such abstract syntax tree may then be converted into an appropriate output character string.
Therefore, the present invention provides the following advantages in that:
1. programmers can build an abstract syntax tree without having to be aware of the distinction between the expression and statement. This makes it easier to implement the source code translator program.
2. the syntax of the backquote macro can be simplified. For example, the equivalent of the following description using the backquote macro in the Lisp language,
`(if condition ,exp1 ,exp2)
may be written in C, as follows:
`{if (condition) ,exp1;else,exp2;}
As programmers can now write any source code without having to worry about the distinction between the expression and statement, as opposed to the prior art, they can describe the source code more easily.
The above and other objects, effects, features and advantages of the present invention will become more apparent from the following description of embodiments thereof taken in conjunction with the accompanying drawings.