1. Field of the Invention
The present invention relates to a method and system for compiling a source code program in a host computer language including embedded program statements in a computer language different from the host computer language.
2. Description of the Related Art
A computer source program is typically first written in a high level computer language. The resulting text, also called source code, comprises descriptive statements of the actions the code will cause the computer to perform. High level computer languages include C++, FORTRAN, COBOL, JAVA™, etc. JAVA is a trademark of Sun Microsystems, Inc. A source program written in such a high level language must be converted into object or machine code, i.e., strings of zeros and ones, which the computer can execute to carry out the steps specified by the program. A compiler is a computer program that receives source code as input and generates as output object code which may be loaded into the computer memory and executed.
The compiler processes the source code in phases. In the first phase, the lexical scanning phase, the compiler groups the characters of a source program into tokens, which are logically cohesive sequences of characters. During this lexical scanning phase, noise words such as comments and blanks are removed. Next, during a parsing phase, the syntax and semantics of the tokens are checked for errors. A parse tree phase follows where the source statements are converted into a parse tree which describes the syntactic structure of a source statement. A parse tree may be expressed as a syntax tree in which the operators appear as interior nodes and the operands of an operator are the children of the node for that operator.
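The scanning and parsing phases described above can be illustrated with a minimal sketch. The token classes, the tuple representation of tree nodes, and the function names `scan` and `parse` are assumptions made for illustration only and are not taken from any particular compiler.

```python
import re

# Comments and blanks (SKIP) are tried first so that "/*" is not read as a division operator.
TOKEN_RE = re.compile(
    r"(?P<SKIP>\s+|/\*.*?\*/)|(?P<NUM>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>[-+*/()])|(?P<BAD>.)",
    re.S,
)

def scan(text):
    """Lexical scanning: group characters into tokens, dropping blanks and comments."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "BAD":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens

def parse(tokens):
    """Parsing: build a syntax tree with operators as interior nodes and operands as children."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        kind, value = advance()
        if kind in ("NUM", "ID"):
            return value
        if value == "(":
            node = expr()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    def term():
        node = factor()
        while peek()[1] in ("*", "/"):
            op = advance()[1]
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek()[1] in ("+", "-"):
            op = advance()[1]
            node = (op, node, term())
        return node

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError("unexpected trailing input")
    return tree
```

For example, `parse(scan("a + b /* note */ * 2"))` yields a tree whose root is the `+` operator, with `a` as one child and the `*` node for `b` and `2` as the other.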
The parse tree may then be optimized in manners known in the art to develop the shortest linked lists providing a structure of the code. Another phase of a compiler is the generation of a symbol table. A symbol table is a data structure containing a record for each identifier, e.g., the names of variables, arrays, and functions, together with the fields and attributes of that identifier. Next follows an error detection phase and thereafter the code generation phase, where target code is generated from the optimized parse tree. The target code may be relocatable machine code or assembly code. The compilation process is described in "Compilers: Principles, Techniques and Tools," by Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, (Addison-Wesley Publishing Co., March 1988), which publication is incorporated herein by reference in its entirety.
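A symbol table of the kind described above may be sketched as follows. The class names `SymbolRecord` and `SymbolTable` and the choice of attributes are hypothetical and serve only to show one record per identifier.

```python
from dataclasses import dataclass, field

@dataclass
class SymbolRecord:
    name: str
    kind: str  # e.g., "variable", "array", or "function"
    attributes: dict = field(default_factory=dict)

class SymbolTable:
    """Holds one record per identifier, keyed by the identifier's name."""

    def __init__(self):
        self._records = {}

    def declare(self, name, kind, **attributes):
        # Reject a second declaration of the same identifier.
        if name in self._records:
            raise KeyError(f"duplicate identifier: {name}")
        record = SymbolRecord(name, kind, attributes)
        self._records[name] = record
        return record

    def lookup(self, name):
        # Returns None when the identifier has not been declared.
        return self._records.get(name)
```

A later phase could call `declare("count", "variable", type="int")` when a declaration is parsed and `lookup("count")` when the identifier is used.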
Oftentimes, a host language, e.g., C++, Java, Fortran, etc., includes embedded statements in another computer language. For instance, database systems and programs are often searched and accessed using Structured Query Language (SQL) statements. A source program could include such SQL statements to access a database from within the source program. However, SQL statements are in a substantially different format from programming statements in the host program language. To allow a programmer to include SQL statements in a source program in a different language, the SQL statements are often separately processed and compiled by a precompiler, also known as a language dependent compiler. A precompiler scans the source code for SQL statements and generates a separate modified source file for the SQL statements. This modified source file is a new version of the original source file including run-time API calls converted from the SQL statements. The modified source files and any additional source files that do not contain SQL statements are compiled using the appropriate host language compiler. The language compiler converts each modified source file into an object module. Precompiler programs to process SQL statements in source programs are described in U.S. Pat. No. 5,230,049, entitled "Program Source Code Translator," which patent is assigned to International Business Machines Corporation ("IBM"), the assignee of the present patent application, and which is incorporated herein by reference in its entirety.
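The prior art precompilation step described above may be sketched as follows. The sketch assumes that embedded statements are introduced by the keywords `EXEC SQL` and terminated by a semicolon, and the run-time function name `sql_runtime_execute` is hypothetical; neither detail is specified above.

```python
import re

# Assumption: an embedded statement has the form "EXEC SQL <statement>;".
EXEC_SQL = re.compile(r"EXEC\s+SQL\s+(.*?);", re.S | re.I)

def precompile(source):
    """Return modified source in which each SQL statement is commented out
    and followed by a call to a (hypothetical) run-time API function."""

    def to_api_call(match):
        sql = " ".join(match.group(1).split())  # collapse line breaks and blanks
        escaped = sql.replace("\\", "\\\\").replace('"', '\\"')
        return f'/* EXEC SQL {sql}; */ sql_runtime_execute("{escaped}");'

    return EXEC_SQL.sub(to_api_call, source)
```

The returned text corresponds to the modified source file, which the host language compiler would then read in a second pass.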
There is a need in the art for an improved method and system for processing and compiling SQL statements embedded in a host language program.
To overcome the limitations in the prior art described above, preferred embodiments disclose a system and method for compiling a program. A source program including program statements in a first computer language and embedded statements in a second computer language is processed. A determination is made as to whether a program statement is in one of the first and second languages. The statement is lexically scanned and parsed into a parse tree if the statement is in the first language. If the statement is in the second language, then the statement is lexically scanned. Then a plurality of function calls capable of executing the statement are accessed and translated into at least one parse tree. The parsed statements are converted into target code.
In further embodiments, the same parse tree structure and parse tree rules are used to parse statements in the first and second languages.
In yet further embodiments, parse trees are optimized after generating parse trees for each statement in the source program. Code is then generated from the optimized parse trees.
In still further embodiments, the second language is SQL and the function calls are application programming interface (API) function calls.
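The single-pass flow summarized above may be sketched as follows. This is an illustration under stated assumptions and not the disclosed implementation: embedded statements are assumed to begin with `EXEC SQL`, host statements are limited to simple assignments, and the API function names `sql_prepare` and `sql_execute` are hypothetical.

```python
import re

def scan(text):
    """Lexical scan used for both languages: words, numbers, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", text)

def is_embedded_sql(tokens):
    # Assumption: an embedded statement starts with the keywords EXEC SQL.
    return [t.upper() for t in tokens[:2]] == ["EXEC", "SQL"]

def parse_host(tokens):
    """Parse `name = operand [op operand]` into a tuple-based parse tree."""
    if len(tokens) not in (3, 5) or tokens[1] != "=":
        raise SyntaxError("expected a simple assignment")
    rest = tokens[2:]
    value = rest[0] if len(rest) == 1 else (rest[1], rest[0], rest[2])
    return ("=", tokens[0], value)

def translate_sql(tokens):
    """Translate a scanned SQL statement into call nodes of the same tuple form."""
    text = " ".join(tokens[2:])
    # Hypothetical API function calls capable of executing the statement.
    return [("call", "sql_prepare", text), ("call", "sql_execute", text)]

def compile_program(statements):
    """One pass over the source: every statement ends up in the same tree list."""
    trees = []
    for statement in statements:
        tokens = scan(statement)
        if is_embedded_sql(tokens):
            trees.extend(translate_sql(tokens))
        else:
            trees.append(parse_host(tokens))
    return trees

def generate(trees):
    """Emit illustrative target code from the parse trees."""
    code = []
    for node in trees:
        if node[0] == "call":
            code.append(f"CALL {node[1]} {node[2]!r}")
            continue
        _, target, value = node
        if isinstance(value, tuple):
            op, left, right = value
            code.append(f"{target} := {left} {op} {right}")
        else:
            code.append(f"{target} := {value}")
    return code
```

Because the call nodes share the tuple form of the host-language nodes, any optimization applied to the tree list before `generate` would see both kinds of statement, and no intermediate source file is written.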
Two-pass parsing systems that employ a separate precompiler to generate an intermediate file for the embedded language statements are problematic because of the time required to generate the intermediate file of translated SQL statements, read the source code twice, and develop a precompiler to translate and handle the SQL statements. First, such two-pass systems require that the input source file be scanned and processed twice, once for precompilation and once for translation. Second, the prior art precompiling methods require two lexical scanners, one for the host source code and another for the SQL statements, i.e., the language dependent precompiler. Third, prior art compiling methods use two copies of each source file, one with the original SQL statements and one containing the modified source output generated by the precompiler. Fourth, precompilation systems increase the likelihood of error if the user changes statements in the modified source output.
Preferred embodiments provide an improved language processor because the SQL statements are converted to API function calls, which are then parsed into the parse tree in the same manner as statements in the first language before being converted into target code. In this way, the embedded language statements are subjected to the same optimization techniques applied to the host language. Instead of commenting out the SQL statements, inserting API function calls, and then generating a separate intermediate file, the preferred embodiments apply the same parsing and parse tree structure directly to the API function calls. The language compiler then generates the parse tree and processes the function calls as if the source program contained no SQL statements.
Thus, preferred embodiments eliminate the need for a separate precompiling program to process the SQL statements and eliminate the need to generate an intermediate file with the translated SQL statements.