The present invention relates generally to protection of software programs, and more particularly to protection of just-in-time compiled code.
A just-in-time (JIT) compiler transforms a program written in a high-level language (HLL), generating native code at program run-time. After parsing and optimizing the HLL source code, the compiler emits native code into a code cache. The compiler itself is written in another programming language, which may be called the host language. JIT compilers also contain a language runtime, which is a library of functions that are written in the host language and provide or manage access to system resources. Some examples of such resources are files, networks, operating system threads, and complex data structures (maps, trees). When compiling an HLL program, the JIT compiler emits native calls into the runtime whenever the program uses these resources. FIG. 5 shows a high-level structure of a JIT compiler 511 and its interactions with the generated code, which is written into an executable portion of memory 513. After emitting all or part of the native code (usually enough code to start execution of the program), the compiler branches to an entry point 515 of the HLL program. The generated code continues execution, calling into other generated functions or the language runtime. The HLL program runs until termination, making repeated calls into the runtime whenever needed.
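The division of labor described above can be illustrated with a minimal Python sketch (the operation names and the `runtime` table are hypothetical, chosen only for illustration): statements that use system resources compile to calls into the language runtime, while all other statements compile to directly emitted native operations in the code cache.

```python
# Hypothetical sketch of a JIT front end: resource-using HLL statements
# become calls into the language runtime; everything else is emitted as
# "native" operations into the code cache.

runtime = {
    # The runtime, written in the host language, manages system resources.
    "open_file": lambda name: f"<handle:{name}>",
}

def jit_compile(hll_program):
    """Translate each (op, arg) HLL statement into an emitted operation.

    Resource accesses are compiled to CALL_RUNTIME entries rather than
    inline code, mirroring how a real JIT emits native calls into its
    runtime library."""
    code_cache = []
    for op, arg in hll_program:
        if op in runtime:
            code_cache.append(("CALL_RUNTIME", op, arg))
        else:
            code_cache.append(("NATIVE", op, arg))
    return code_cache
```

For example, compiling a program that adds a value and then opens a file yields one native operation followed by one runtime call.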
From a security point of view, JIT compilers have one characteristic that is important in our context: predictability. As JIT compilers usually optimize code for performance, there are only a few optimal translations of HLL code to native code, and a JIT compiler emits one of these.
Our computing infrastructure depends to a large degree on the high performance delivered by just-in-time (JIT) compilers. Efficiently executing JavaScript is generally a prerequisite for complex Web 2.0 applications. Similarly, Java's success rests on the performance delivered by efficient dynamic code generation. From their early beginnings, JIT compilers have had to focus on producing code quickly. Usually, they achieve this by optimizing the common case and forgoing time-intensive optimizations altogether. This leads to highly predictable code generation, which attackers exploit for malicious purposes, as evidenced by the rising threats of JIT spraying and similar attacks on sandboxes in JITs. The former is particularly interesting: JIT spraying relies on JIT compilers emitting constants present in input source code directly into binary code. Due to the variable-length instruction encoding of architectures such as x86, attackers can encode arbitrary malicious code in these constants and subsequently divert control flow to the code arranged this way.
This attack vector is innate and specific to JIT compilers. From a security perspective, the state of the art in the field addresses JIT spraying by encrypting and decrypting constants. This addresses the code-injection part of JIT spraying, but attackers can fall back on code-reuse techniques. Specifically, return-oriented programming against JIT-compiled code is also problematic. Instead of finding gadgets in statically generated code (as they would in a generic return-oriented programming attack), an attacker uses the JIT compiler to create new binary code containing the necessary gadgets by supplying specially crafted source code. The ubiquity of JIT compilers amplifies this security risk to such a degree that JITs become a liability.
When repeatedly presented with the same HLL code, a compiler will emit the same native code; attackers can use this characteristic to their advantage. This is not a problem specific to JIT compilers but to compilers in general; however, the predictability of JIT compilers has not been fully explored.
JIT spraying is one recent attack that relies on predictability of JIT compilers. This attack is a form of code injection targeted at dynamically generated code. In its original form, it relies on one unintended behavior of many JIT compilers: HLL program constants reach native code unmodified, therefore becoming part of the executable code. The attacker injects short sequences (32-bit constants in the original paper), and later jumps to the injected sequence through a separate attack vector. For the attack to work, the attacker must also predict the remaining bytes inserted between the controlled sequences, and use those bytes as part of the payload; this is often possible in practice, due to the predictability of the compiler. This allows the attacker to execute arbitrary native code, even when running on a compiler that runs the generated code in a sandbox (with restricted access to memory, for example).
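The mechanics of the injected sequences can be seen in the x86 encoding of a constant load: `mov eax, imm32` is the opcode byte 0xB8 followed by the 32-bit immediate in little-endian order, so the constant's bytes land verbatim in executable memory, and jumping one byte past the opcode executes those bytes as instructions. A minimal Python sketch (the emitter function is hypothetical; the byte values follow the standard x86 encoding):

```python
import struct

def emit_mov_eax_imm32(imm: int) -> bytes:
    """Encode x86 `mov eax, imm32`: opcode 0xB8, then the 32-bit
    immediate in little-endian order. A JIT that emits HLL constants
    this way copies the attacker's bytes verbatim into the code cache."""
    return b"\xb8" + struct.pack("<I", imm)

code = emit_mov_eax_imm32(0x3C909090)
# code == b"\xb8\x90\x90\x90\x3c": entering execution at offset 1 (past
# the 0xB8 opcode) runs three NOPs (0x90) followed by 0x3C, which the
# attacker predicts and reuses as part of the next payload fragment.
```

Chaining many such constants, each predicting the compiler-inserted bytes between them, yields an arbitrary attacker-controlled instruction stream.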
For many years, most arbitrary code execution attacks used the same method of gaining control of the program: code injection attacks. To prevent these, most operating systems now forbid the same page to be both writable and executable at the same time. Sidestepping this measure, a new class of attacks against applications surfaced and gained popularity: code reuse attacks. Instead of adding new executable code to an application, code reuse attacks locate reusable code sequences inside the application, then thread these sequences into a program written by the attacker. Shacham described one of the first versions of this attack, called Return-Oriented Programming (ROP); he named the code sequences gadgets. A gadget is simply a valid sequence of binary code that the attacker can execute successfully (the gadget decodes correctly and does not contain invalid instructions); a gadget can start anywhere inside the generated code (including in the middle of a proper instruction) and spans one or more of the original instructions emitted by the compiler. ROP uses only gadgets that end in a RET instruction (encoded by the C3 byte on x86); the attacker places addresses of gadgets on the stack on consecutive stack slots, so that each gadget proceeds to the next one using a return. Later work extends this idea to other indirect branch instructions.
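A byte-level gadget search of the kind Shacham described can be sketched in a few lines of Python. A real tool would additionally verify that each candidate window decodes to valid instructions; this sketch only locates windows ending in the 0xC3 RET byte, illustrating that gadgets can begin at any byte offset, including mid-instruction:

```python
def find_ret_gadget_offsets(code: bytes, max_len: int = 5):
    """Return (start, end) byte ranges of candidate gadgets.

    A candidate is any window of up to `max_len` bytes that ends in a
    RET (0xC3). Because x86 decoding can begin at any byte, a window may
    start in the middle of an instruction the compiler emitted."""
    gadgets = []
    for i, b in enumerate(code):
        if b == 0xC3:  # RET opcode on x86
            for start in range(max(0, i - max_len), i + 1):
                gadgets.append((start, i + 1))
    return gadgets
```

For example, scanning the two bytes `58 C3` (`pop eax; ret`) yields both the two-byte gadget and the bare `ret` at offset 1.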
This attack is even more potent in the presence of a JIT compiler, as an attacker who controls HLL code can cause the compiler to emit an arbitrary amount of native code containing gadgets (by supplying as much HLL code as needed to generate all the gadgets for the attack). For example, this can be a problem for web browsers that include a JavaScript compiler, as many web pages include JavaScript code from unreliable (or hostile) sources. Another problem is that current anti-ROP defenses target ahead-of-time compilers or binary rewriters, but do not offer protection to dynamically generated code.
Recent work addresses code-reuse attacks by attacking their foundation: the software monoculture. Because diversified binary code differs in layout for each end user, attackers cannot construct attack code that works reliably across installations. Consequently, diversity increases the costs for attackers, ultimately rendering attacks too costly. Unfortunately, existing approaches to artificial software diversity do not protect code dynamically emitted by a just-in-time compiler.
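One common diversification transformation, applicable in principle to a JIT's code emitter, is randomized NOP insertion: semantically neutral instructions inserted at random points shift the offsets of all subsequent code, so gadget addresses differ from one user's code cache to another's. A minimal Python sketch (the instruction representation and probability are illustrative assumptions):

```python
import random

def diversify(instructions, nop="nop", p=0.3, seed=None):
    """Insert a NOP before each instruction with probability p.

    The program's behavior is unchanged, but every emitted copy has a
    different byte layout, so gadget offsets harvested from one user's
    code cache do not transfer to another's."""
    rng = random.Random(seed)
    out = []
    for ins in instructions:
        if rng.random() < p:
            out.append(nop)
        out.append(ins)
    return out
```

Removing the inserted NOPs from any diversified output recovers the original instruction sequence, which is the sense in which the transformation is semantics-preserving.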