In general, there are two types of code: compiled code and interpreted code. Compiled code has two general categories. The first category compiles source code into object code and then links one or more object codes together to create an executable, which is executed at run-time. The second category compiles the source code into an intermediate language, which undergoes just-in-time compilation at run-time to create native code that is executed. Hereinafter, the executable and the native code are both referred to as executable code.
In order to assure that the executable code does not harm a computer system, the executable code and its associated source code undergo extensive testing and review. While this minimizes the amount of executable code that is harmful, some harmful executable code still exists. Some of the harmful executable code is unforeseen, while other harmful executable code is specifically written to cause harm to computer systems (e.g., viruses, worms, and Trojan Horse attacks). Fortunately, several mechanisms have been developed to further minimize the risk of harmful code. One mechanism is a feature provided by a processor for recognizing pages in memory as executable or non-executable. Because the operating system directly configures the executable code in memory, the operating system may mark certain memory pages as executable and others as non-executable. Then, when an instruction in the executable code attempts to execute code in the memory page marked as non-executable, the processor throws an exception. This prevents stray pointers and malicious code from harming the information in the memory that is marked as non-executable.
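The page-marking mechanism described above can be illustrated with a small software simulation. This is only a sketch: a real processor enforces the marking in hardware, and every name here (`Memory`, `map_page`, `execute`, `ExecutionFault`) is hypothetical and chosen for illustration.

```python
# Toy simulation of executable / non-executable pages. All names are
# hypothetical; a real processor enforces this in hardware, not in software.

class ExecutionFault(Exception):
    """Stands in for the exception raised by the processor."""


class Memory:
    def __init__(self):
        self.pages = {}  # page number -> (executable flag, contents)

    def map_page(self, page, contents, executable):
        # The operating system decides how each page is marked.
        self.pages[page] = (executable, contents)

    def execute(self, page):
        executable, contents = self.pages[page]
        if not executable:
            raise ExecutionFault("page %d is marked non-executable" % page)
        return contents()  # "run" the code stored on the page


memory = Memory()
memory.map_page(0, lambda: "program ran", executable=True)
memory.map_page(1, lambda: "data treated as code", executable=False)

print(memory.execute(0))
try:
    memory.execute(1)
except ExecutionFault as fault:
    print("fault:", fault)
```

In this sketch the attempt to run page 1 is refused solely because of how the page was marked, regardless of what the page contains.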
Unfortunately, this feature of the processor is not available when processing interpreted code. Interpreted code is processed at run-time via an interpreter. The interpreter is responsible for processing the interpreted code into commands that the processor can execute. Conceptually, the interpreter operates in a serial manner, inputting a string and interpreting the string into a command. The command is associated with a set of executable instructions that perform the command when executed by the processor. Because the operating system does not manage the memory for interpreted code, the operating system is unable to mark pages in memory as non-executable or executable. From the processor's perspective, it is executing the interpreter software module (i.e., the interpreter) that has been loaded into memory and is being managed by the operating system. The interpreter software module is responsible for processing the interpreted code. In other words, the interpreted code is viewed as “data” to the processor. Thus, security problems arise when interpreted code (e.g., a script) contains “data” that is interpreted into harmful commands (e.g., format c:).
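The serial, string-to-command behavior described above can be sketched in a few lines. This is a toy model under stated assumptions, not any particular interpreter; the command names (`greet`, `add`) and the `COMMANDS` table are hypothetical.

```python
# Toy interpreter: every name here is hypothetical and for illustration only.

def greet(args):
    return "hello " + " ".join(args)


def add(args):
    return str(sum(int(a) for a in args))


# Each command name is associated with the instructions that perform it.
COMMANDS = {"greet": greet, "add": add}


def interpret(script_text):
    """Process a script serially: read a string, map it to a command, run it."""
    results = []
    for line in script_text.splitlines():
        words = line.split()
        if not words:
            continue  # skip blank lines
        name, args = words[0], words[1:]
        handler = COMMANDS.get(name)
        if handler is None:
            raise ValueError("unknown command: " + name)
        results.append(handler(args))
    return results


# To the processor, this script is just a string (data) held by the interpreter.
script = "greet world\nadd 2 3\n"
print(interpret(script))
```

The only code the processor runs here is the interpreter and its command handlers; the script itself never occupies a page that could be marked executable or non-executable.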
While there are various ways in which a harmful command may be “inserted” into an otherwise useful and harmless script, one way is via an input file. For example, a script may input a text file containing several lines. Each line may list a user's name. The script may then specify a command using each user's name. In this example, the harmful command may be “inserted” by editing one of the lines in the text file and appending a malicious string (e.g., format c:) after one of the users' names. Because the interpreter “interprets” its input into commands, the interpreter will interpret the malicious string into the “correct”, but harmful, command. Then, when the “correct” command is executed, undesirable and/or harmful actions occur.
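The input-file example above can be made concrete with a small sketch. It assumes a hypothetical interpreter that treats `;` as a command separator and a hypothetical `add-user` command; neither comes from the text above. Commands are only recorded in a list, never executed.

```python
# Toy model of the input-file example; all names are hypothetical.
# Commands are only recorded, never executed.

def interpret(command_text):
    """Split the text on ';' and treat each non-empty piece as one command."""
    executed = []
    for piece in command_text.split(";"):
        piece = piece.strip()
        if piece:
            executed.append(piece)
    return executed


def run_script(input_lines):
    """For each user name in the input, build and interpret a command."""
    executed = []
    for name in input_lines:
        name = name.strip()
        if name:
            executed.extend(interpret("add-user " + name))
    return executed


clean_input = ["alice", "bob"]
tampered_input = ["alice", "bob; format c:"]  # malicious string appended

print(run_script(clean_input))
print(run_script(tampered_input))
```

With the tampered input, the recorded commands include `format c:` even though the script itself was never modified; only its data changed.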
One way to minimize the security problems associated with scripts is to have the scripts and any data that is input into the script undergo a formal review and testing procedure similar to source code for compiled code. However, this solution is not ideal, and may not even be attainable.
Thus, until now, an adequate solution for minimizing security problems with scripts in an interpretive environment has eluded those skilled in the art.