1. Field of the Invention
The invention relates generally to the verification of computer code and, more particularly, to the verification of a code sequence.
2. Related Art
Verification allows computer code from an unknown and potentially untrusted source to be executed on a trusted computer platform while avoiding certain classes of potentially serious execution errors. Verification is typically not required for commercially distributed software that is certified by the manufacturer or distributor for "safe" execution on a particular computer platform. Similarly, computer code distributed in source form and compiled locally (i.e., on the machine on which it is to be executed) typically does not require verification, because error detection can be performed as part of the compilation process.
However, the increased availability of portable compiled code, driven by the growing popularity of the Internet and the World Wide Web, has made code verification an important concern for on-line users. The JAVA programming language, for example, is intended to provide single-source computer code executable on a variety of hardware platforms, provided that a JAVA Virtual Machine (JVM) is present on the platform to interpret JAVA bytecodes into the instruction set of the specific hardware platform. While these characteristics make JAVA an ideal platform for code distributed over the Internet, they also make verification of code downloaded over the Internet a primary concern for ensuring users' security on individual platforms.
While verification can be performed at run time, the resulting performance penalty is undesirable for most purposes. Sun's solution to JAVA bytecode verification relies on a technique known as symbolic execution. Symbolic execution, however, has a number of limitations. First, every possible execution path in the code must be verified, in case that portion of the code is exercised at run time. Second, symbolic execution maintains little or no state information about the structure of the code being verified. As a result, when multiple execution paths split and rejoin, all possible states of each path must be accounted for. Similarly, loops must be symbolically executed multiple times if they are invoked in different code sequences. Symbolic execution is therefore hard to implement and inherently dependent on a specific bytecode instruction set.
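The path-by-path bookkeeping described above can be illustrated with a toy stack machine. The sketch below is not Sun's verifier and does not use JVM bytecodes; the opcodes `push`, `pop`, `add`, `ifz`, `goto` and `ret`, and the names `STACK_EFFECT` and `symbolic_verify`, are hypothetical. It records the stack depth on first arrival at each instruction and rejects the code if any path underflows the stack or if two paths rejoin with different depths.

```python
from collections import deque

# Hypothetical toy instruction set (not JVM bytecode):
#   ("push",)        push one value
#   ("pop",)         pop one value
#   ("add",)         pop two values, push one
#   ("ifz", target)  pop one value, then branch to target or fall through
#   ("goto", target) unconditional jump
#   ("ret",)         stop; the stack must be empty
# The verifier needs this per-opcode table, which is what ties it
# to one specific instruction set.
STACK_EFFECT = {"push": (0, 1), "pop": (1, 0), "add": (2, 1),
                "ifz": (1, 0), "goto": (0, 0), "ret": (0, 0)}


def successors(code, pc):
    """Return the program counters that control can reach from pc."""
    op = code[pc][0]
    if op == "ret":
        return []
    if op == "goto":
        return [code[pc][1]]
    if op == "ifz":
        return [pc + 1, code[pc][1]]
    return [pc + 1]


def symbolic_verify(code):
    """Accept code only if every path sees a consistent, non-negative stack depth."""
    depth_at = {0: 0}            # stack depth recorded on first arrival at each pc
    work = deque([0])
    while work:
        pc = work.popleft()
        if not 0 <= pc < len(code):
            return False          # control runs off the end of the code
        op = code[pc][0]
        pops, pushes = STACK_EFFECT[op]
        depth = depth_at[pc]
        if depth < pops:
            return False          # stack underflow on this path
        depth = depth - pops + pushes
        if op == "ret" and depth != 0:
            return False
        for nxt in successors(code, pc):
            if nxt in depth_at:
                if depth_at[nxt] != depth:
                    return False  # paths rejoin with inconsistent states
            else:
                depth_at[nxt] = depth
                work.append(nxt)
    return True
```

For example, `[("push",), ("ifz", 3), ("push",), ("ret",)]` is rejected because the fall-through path reaches `ret` with one more stack entry than the branch path.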
In addition, the only information derived from the symbolic execution process is whether a specific bytecode sequence is accepted or rejected.
The system and method of the present invention allow for improved code sequence verification through the use of an abstract syntax tree. This is accomplished by first constructing an abstract syntax tree from the code sequence and then determining whether the abstract syntax tree satisfies a predefined set of conditions indicative of the code sequence being executable on the computer without generating a predefined class of execution errors.
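The second step, checking the tree against a predefined set of conditions, can be pictured as applying a list of predicates to the tree. The sketch below assumes a minimal expression tree (the node types `Const` and `BinOp`, the operator table `SIGNATURES`, and the functions `well_typed` and `verify` are all hypothetical) and uses one condition, operand type agreement, as an example of a condition that rules out a class of execution errors.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical minimal expression tree; node names are illustrative only.
@dataclass
class Const:
    value: Union[int, bool]


@dataclass
class BinOp:
    op: str          # "+", "<" or "and"
    left: "Expr"
    right: "Expr"


Expr = Union[Const, BinOp]

# Assumed operand type and result type for each operator.
SIGNATURES = {"+": ("int", "int"), "<": ("int", "bool"), "and": ("bool", "bool")}


def type_of(node):
    """Return the node's type, or None if the subtree is ill-typed."""
    if isinstance(node, Const):
        # bool is a subclass of int in Python, so test for bool first.
        return "bool" if isinstance(node.value, bool) else "int"
    operand, result = SIGNATURES[node.op]
    if type_of(node.left) == operand and type_of(node.right) == operand:
        return result
    return None


def well_typed(tree):
    """Condition: every operator receives operands of the expected type."""
    return type_of(tree) is not None


def verify(tree, conditions):
    """Accept the tree only if it satisfies every predefined condition."""
    return all(cond(tree) for cond in conditions)
```

Further conditions can be added to the list passed to `verify` without changing how the tree is built.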
The abstract syntax tree is constructed by disassembling the code sequence into a plurality of instructions, combining the instructions into a plurality of blocks, examining the blocks to determine the entry points of a plurality of loops, and tagging the locations in the series of instructions to which control is transferred at the end of each loop. The instructions, blocks, loops and tagged locations are then examined to generate a plurality of control structures (the coarse structure). Finally, the instructions, blocks, loops, tagged locations and control structures are examined to generate a plurality of form expressions (the fine structure).
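The first construction steps (grouping instructions into blocks, finding loop entry points, and tagging the locations reached when a loop ends) can be sketched on a toy instruction list. This is a simplified illustration under stated assumptions: instructions are hypothetical `(opcode, target)` tuples, every backward jump is treated as closing a loop, and the helper names `leaders`, `basic_blocks` and `loops` are invented here. A production decompiler would typically identify loops with dominator analysis instead.

```python
# Hypothetical toy instruction format: (opcode,) or (opcode, jump_target).
JUMPS = {"goto", "ifz"}


def leaders(code):
    """Return the instruction indices that start a basic block."""
    result = {0}
    for pc, ins in enumerate(code):
        if ins[0] in JUMPS:
            result.add(ins[1])          # a jump target starts a block
            if pc + 1 < len(code):
                result.add(pc + 1)      # so does the instruction after a jump
    return sorted(result)


def basic_blocks(code):
    """Split the instruction list into half-open (start, end) ranges."""
    starts = leaders(code)
    ends = starts[1:] + [len(code)]
    return list(zip(starts, ends))


def loops(code):
    """Treat each backward jump as closing a loop.

    For each loop, report its entry point (header), its last instruction
    (end), and the tagged locations control reaches when the loop ends
    (exits). Simplification: no dominator analysis is performed.
    """
    found = []
    for pc, ins in enumerate(code):
        if ins[0] in JUMPS and ins[1] <= pc:
            header, end = ins[1], pc
            exits = {end + 1}           # falling out past the back edge
            for inner in range(header, end + 1):
                inner_ins = code[inner]
                if inner_ins[0] in JUMPS and inner_ins[1] > end:
                    exits.add(inner_ins[1])   # a branch that leaves the loop
            found.append({"header": header, "end": end, "exits": sorted(exits)})
    return found


code = [
    ("push",),     # 0
    ("ifz", 4),    # 1  loop entry: leave the loop when the value is zero
    ("push",),     # 2
    ("goto", 1),   # 3  back edge to the loop entry
    ("ret",),      # 4  tagged location reached when the loop ends
]
```

Here `basic_blocks(code)` yields `[(0, 1), (1, 2), (2, 4), (4, 5)]`, and `loops(code)` reports a single loop with header 1, end 3 and exit location 4. These results are the raw material from which control structures such as a `while` loop would then be recognized.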
Because the abstract syntax tree closely approximates the structure of the source code from which the code sequence was generated, the system and method of the present invention overcome the limitations of the prior art. Once the abstract syntax tree has been constructed, verification can be conducted in a more straightforward manner. Because the abstract syntax tree provides more extensive state information, only the tree, rather than every individual path through the code, needs to be verified. Furthermore, verification on the abstract syntax tree is largely independent of the specific bytecode instruction set.
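To illustrate why a structured tree avoids enumerating paths, the sketch below checks one example condition, that no variable is read before it is definitely assigned, with a single walk over a hypothetical statement tree. The node types `Assign`, `If` and `While` and the functions `check` and `definitely_assigned` are assumptions made for this example. A split and rejoin is handled at the `If` node by intersecting the results of the two branches, and a loop body is checked once, because later iterations can only see more assigned variables, never fewer.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical statement tree for this sketch.
@dataclass
class Assign:
    name: str
    reads: List[str] = field(default_factory=list)  # variables read on the right-hand side


@dataclass
class If:
    reads: List[str]   # variables read by the condition
    then: list
    orelse: list


@dataclass
class While:
    reads: List[str]   # variables read by the loop condition
    body: list


def check(stmts, assigned):
    """Return the set of variables definitely assigned after stmts,
    or None if some variable may be read before it is assigned."""
    assigned = set(assigned)
    for s in stmts:
        if not set(s.reads) <= assigned:
            return None
        if isinstance(s, Assign):
            assigned.add(s.name)
        elif isinstance(s, If):
            a = check(s.then, assigned)
            b = check(s.orelse, assigned)
            if a is None or b is None:
                return None
            assigned = a & b   # the branches rejoin: keep only what both assign
        elif isinstance(s, While):
            # One pass suffices: later iterations start with a superset of `assigned`.
            if check(s.body, assigned) is None:
                return None
            # The body may run zero times, so its assignments do not survive the loop.
    return assigned


def definitely_assigned(stmts):
    return check(stmts, set()) is not None
```

For instance, assigning `y` in only one branch of an `If` and then reading `y` after the `If` is rejected, without exploring the two paths separately.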
In addition, since the abstract syntax tree provides the structure of the code sequence, code segments can be inserted into the code sequence to perform additional functions at run time without requiring access to the source code or recompilation.
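As an illustration of inserting code segments by way of the tree, the sketch below adds a call to a hook at the top of every loop body. The node types `Call`, `While` and `If`, the function `instrument_loops` and the hook name `count_iteration` are hypothetical, and translating the modified tree back into bytecode is outside the scope of this example.

```python
from dataclasses import dataclass, field, replace
from typing import List


# Hypothetical tree nodes for this sketch.
@dataclass
class Call:
    name: str


@dataclass
class While:
    cond: str
    body: List[object] = field(default_factory=list)


@dataclass
class If:
    cond: str
    then: List[object] = field(default_factory=list)
    orelse: List[object] = field(default_factory=list)


def instrument_loops(stmts, hook):
    """Return a copy of stmts with Call(hook) inserted at the top of every loop body."""
    out = []
    for s in stmts:
        if isinstance(s, While):
            out.append(replace(s, body=[Call(hook)] + instrument_loops(s.body, hook)))
        elif isinstance(s, If):
            out.append(replace(s, then=instrument_loops(s.then, hook),
                               orelse=instrument_loops(s.orelse, hook)))
        else:
            out.append(s)
    return out


prog = [Call("init"), While("i < n", [If("odd", [While("j", [Call("work")])])])]
instrumented = instrument_loops(prog, "count_iteration")
```

Because loops are explicit nodes in the tree, the insertion point is found structurally; on a flat instruction list the loop boundaries would first have to be rediscovered.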