The present invention relates to computer programs written in the JAVA language.
It has been estimated that over 80% of web applications are vulnerable to cyber attack, and much of the danger arises from a class of application vulnerabilities called command-injection vulnerabilities.
Taint tracking is a prior art technique that aims to mitigate, or alert to the potential occurrence of, command-injection vulnerabilities, exploits, or attacks. Taint tracking works by labeling untrusted input arriving into an application program from an untrusted source (such as an HTTP request header), and then following that untrusted data as it moves through the application program, watching for any case in which the untrusted data is passed as a parameter to a sensitive or vulnerable method (such as a method that submits a Structured Query Language (SQL) query to be executed by a database).
Prior art taint tracking consists of three main steps:
(i) The first step is to identify untrusted input at the point that it enters the program and mark that it is untrusted (i.e., tainted). This is called “source identification” or “source tainting.”
(ii) The second step is to propagate taint information as subsequent computation occurs, marking as tainted all data that is derived from an untrusted source. For example, if part of the tainted data is used to create a new variable, that variable also becomes tainted and is subsequently tracked as well.
(iii) Finally, all data going into a sensitive data sink (e.g., a database, response output, or file) is checked, using the taint information to identify potential attacks.
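The three steps above can be illustrated with a minimal Java sketch. It is not taken from any of the cited prior art implementations; the class `TaintSketch` and the methods `source`, `concat`, `isTainted`, and `sink` are hypothetical names chosen for illustration, and taint is assumed to be recorded per object in an identity-based set.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Illustrative three-step taint tracker; every name here is hypothetical. */
final class TaintSketch {
    // Identity-based set: taint is attached to one specific object,
    // not to every string that happens to have the same characters.
    private static final Set<Object> TAINTED =
            Collections.newSetFromMap(new IdentityHashMap<>());

    /** Step (i): mark a value as untrusted at the point it enters the program. */
    static String source(String untrusted) {
        String copy = new String(untrusted); // fresh object, so shared literals are never tainted
        TAINTED.add(copy);
        return copy;
    }

    /** Step (ii): a value derived from tainted data becomes tainted as well. */
    static String concat(String a, String b) {
        String result = a + b;
        if (isTainted(a) || isTainted(b)) {
            TAINTED.add(result);
        }
        return result;
    }

    static boolean isTainted(Object value) {
        return TAINTED.contains(value);
    }

    /** Step (iii): check data arriving at a sensitive sink. */
    static void sink(String sql) {
        if (isTainted(sql)) {
            throw new SecurityException("tainted data reached a sensitive sink");
        }
        // A real sink would pass the query on to the database here.
    }

    public static void main(String[] args) {
        String userInput = source("' OR '1'='1");
        String query = concat("SELECT * FROM users WHERE name = '", userInput);
        try {
            sink(query);
        } catch (SecurityException e) {
            System.out.println("blocked: " + e.getMessage());
        }
    }
}
```

In this sketch the propagation step only works because the application calls `concat` explicitly. A practical system has to intercept the ordinary string operations instead, which is what the instrumentation techniques discussed in this section are for.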
Prior art runtime taint tracking techniques can be implemented at several levels, affecting the instrumentation precision, overhead, and level of implementation difficulty. Most prior art taint-tracking techniques for the Java Platform are implemented as either bytecode-level (application) instrumentation, or library-level (runtime environment) instrumentation.
Bytecode-Level Instrumentation
Bytecode-level instrumentation of application code is one technique used to implement command-injection protection for Java applications. This prior art technique operates by instrumenting the bytecode of an application, either statically by modifying the contents of a CLASSFILE or JAR file on a hard disk, or dynamically by modifying the bytecode of the application program classfile as it is loaded into the JVM or executed by the JVM. Regardless of which arrangement is used, this instrumentation technique aims to modify the application code, and optionally the Java API class libraries, by enhancing them with additional functionality, behaviours, or operations that can be used to implement taint-tracking in the JVM.
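As a hedged illustration of the dynamic variant, the standard `java.lang.instrument` API lets an agent see each classfile as it is loaded. The sketch below is not drawn from the cited publications; `TaintAgentSketch` and its `seen` list are hypothetical, and the transformer deliberately returns `null` (leave the class unchanged) because actual bytecode rewriting would require a bytecode manipulation library that is not shown here.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical load-time hook showing where bytecode rewriting would occur. */
final class TaintAgentSketch implements ClassFileTransformer {
    // Names of the application classes offered to this transformer.
    final List<String> seen = new CopyOnWriteArrayList<>();

    /** Entry point the JVM calls when started with -javaagent:<agent jar>. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new TaintAgentSketch());
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        // Skip core library classes; touching them is the risky case.
        if (className == null || className.startsWith("java/")) {
            return null;
        }
        seen.add(className);
        // A real implementation would return rewritten bytecode here.
        // Returning null tells the JVM to keep the class as loaded.
        return null;
    }

    public static void main(String[] args) {
        // Call the transformer directly to show which classes it would process.
        TaintAgentSketch t = new TaintAgentSketch();
        t.transform(null, "com/example/App", null, null, new byte[0]);
        t.transform(null, "java/lang/String", null, null, new byte[0]);
        System.out.println("application classes seen: " + t.seen);
    }
}
```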
The disadvantages of the prior art bytecode-level instrumentation techniques are that instrumenting core Java API classes (like java.lang.Object or java.lang.String) may not be possible, or that doing so may introduce instability and bugs into the JVM or the Java API class libraries that are operating the application program. As a result, bytecode-level instrumentation techniques for command-injection protection of Java applications have not achieved widespread commercial adoption or use.
Examples of bytecode-level instrumentation techniques are described in the publications “WASP: Protecting web applications using positive tainting and syntax-aware evaluation” by Halfond, Orso, and Manolios (IEEE Transactions on Software Engineering (TSE), 34(1):65-81, 2008), and “Dynamic Taint propagation: Finding vulnerabilities without attacking” by Chess and West (Information Security Technical Report, 13(1):33-39, 2008).
Library-Level Instrumentation
Library-level instrumentation of Java API class libraries is another common command-injection protection technique for Java applications. Library-level instrumentation works by modifying, such as by rewriting and redeploying, the (standard) Java API class libraries that come bundled with the JVM being used to run the application program.
The principal advantage of the library-level technique is that it can potentially achieve a lower performance overhead than bytecode-level techniques. However, the disadvantage is that modifying the standard libraries often reduces the stability of the JVM, the Java API class libraries, or the application execution, as changing the Java API class libraries violates unstated invariants that the runtime, the application, and/or the JVM expect. As a result, library-level instrumentation techniques for command-injection protection of Java applications have not achieved widespread commercial adoption or use.
One example of the danger and complexity of the library-level technique is modifying, instrumenting, or otherwise changing the class definition or implementation of java.lang.Object or java.lang.String. Even slight modifications to some of the methods of these, or of other Java API class library classes, can lead to surprising failures at runtime. There is generally no way to determine this ahead of time, other than testing for compatibility with the different runtimes and their versions, such as the many versions of the JVM from Oracle, IBM, and other vendors.
An example of a library-level instrumentation technique is described in the publication “Efficient character-level taint tracking in Java” by Chin and Wagner (Proceedings of the ACM workshop on Secure web services, SWS '09, pages 3-12, 2009). This example implementation works by modifying classes of the Java API class libraries used by the JVM (such as, for example, java.lang.String) in order to implement library-level taint tracking.
It would be desirable to improve upon bytecode-level instrumentation arrangements by omitting the second step of taint information propagation. In addition, it would be desirable to improve upon library-level instrumentation arrangements by obviating the requirement to modify the Java API class libraries, or any other part of the JVM software that is to run an application. Together, these improvements over the prior art provide significant benefits to Java application administrators and operators who desire a command-injection protection solution for their Java applications but are either unwilling or unable to permit such a solution to modify the (standard) Java API class libraries that are provided with the JVM, any other part of the Java Virtual Machine, or both.
In practice, then, the reliance of the prior art approaches on modifying the standard libraries renders these taint-tracking approaches infeasible for the majority of real-world Java applications in production.
The present invention discloses a method for protecting Java applications from command-injection vulnerabilities, exploits, or attacks, which can be applied and operated on unmodified JVM software with unmodified Java API class libraries, without requiring the instrumentation or modification of this software or these libraries, by employing a garbage-collected tainted-value-cache which operates in a coordinated manner with a JVM's garbage collector.
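One way to picture a cache of tainted values that cooperates with the garbage collector is a set of weak references. The sketch below is an assumption made for illustration and is not the disclosed implementation: the class `TaintedValueCacheSketch`, its nested `Key` class, and the methods `taint`, `isTainted`, and `size` are hypothetical. It uses only the standard `java.lang.ref` API, so neither the JVM nor the Java API class libraries are modified.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical weak, identity-based cache of tainted values. */
final class TaintedValueCacheSketch {

    /** Weak reference that hashes and compares by the identity of its referent. */
    private static final class Key extends WeakReference<Object> {
        private final int hash;

        Key(Object referent, ReferenceQueue<Object> queue) {
            super(referent, queue);
            this.hash = System.identityHashCode(referent);
        }

        @Override public int hashCode() { return hash; }

        @Override public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Key)) return false;
            Object mine = get();
            return mine != null && mine == ((Key) other).get();
        }
    }

    // The garbage collector enqueues a Key here once its referent is unreachable.
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final Set<Key> entries = ConcurrentHashMap.newKeySet();

    /** Records that this exact object is tainted, without keeping it alive. */
    void taint(Object value) {
        expunge();
        entries.add(new Key(value, queue));
    }

    /** Returns true only for the same object instance that was tainted. */
    boolean isTainted(Object value) {
        expunge();
        return entries.contains(new Key(value, null));
    }

    int size() {
        expunge();
        return entries.size();
    }

    /** Drops entries whose values the garbage collector has already reclaimed. */
    private void expunge() {
        Reference<?> dead;
        while ((dead = queue.poll()) != null) {
            entries.remove(dead);
        }
    }

    public static void main(String[] args) {
        TaintedValueCacheSketch cache = new TaintedValueCacheSketch();
        String header = new String("' OR '1'='1"); // stands in for an HTTP header value
        cache.taint(header);
        System.out.println("tainted: " + cache.isTainted(header));
    }
}
```

Because the cache holds its values only weakly, an entry disappears some time after the application drops its last reference to the value, so the cache does not grow without bound. When exactly that happens depends on the garbage collector and is not deterministic.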
The use of the term “Unmodified JVM Software” is to be understood to mean JVM software which is received from a JVM vendor, and is not changed after receipt by instrumentation, transformation, recompilation, or similar means.
The use of the term “Unmodified Java API class library(ies)” is to be understood to mean the Java Application Program Interface (API) class libraries which are provided with the JVM Software, and which are not changed after receipt by instrumentation, transformation agents, or similar means.
The use of the terms “Java Virtual Machine (JVM)” and “Java Platform” is to be understood to mean an implementation of the Java Virtual Machine Specification (of any revision) that is certified as Java Compatible.
The use of the terms “Application Code” and “Application Program” is to be understood to mean Java bytecode classfiles, provided by an application operator, to be loaded into a JVM to be executed by the JVM.
The use of the term “System Code” is to be understood to mean executable code, such as the bytecode classfiles of the Java API class libraries, or executable machine code of the JVM software, provided with a JVM.
The use of the term “Java EE API class libraries” is to be understood to mean any implementation of the Java Platform Enterprise Edition API Specification, or any revision thereof.
The genesis of the present invention is a desire to overcome the limitations of previous taint tracking approaches, which required instrumentation or modification to the JVM or Java API class libraries. This is preferably achieved by employing a garbage-collected tainted-value-cache that is coordinated with the operation of the JVM's garbage collector.