Return-oriented programming (ROP) is a computer security exploit technique in which an attacker uses software control of a stack to execute an attacker-chosen sequence of machine instructions. These instructions typically end with a programmer-intended or unintended return (RET) instruction within existing programming code. The intended or unintended RET instruction transfers execution to the attacker-chosen return address on the stack and allows the attacker to retain execution control through the program code, and direct execution to the next set of attacker-chosen instructions to achieve the attacker's intent. The attacker-chosen instruction sequences are referred to as gadgets. A gadget can also end with an indirect jump or indirect call instruction.
By chaining together a set of these gadgets such that the indirect branch (i.e. return, jump or call) from one gadget lands into the next gadget and so on, the malware writer is able to execute a complex algorithm without injecting any code into the program, which can be referred to as code-reuse or ROP (return oriented programming). This is commonly referred to as gadget chaining.
The ROP technique involves injecting a payload into the memory of a program by using vulnerabilities like stack buffer overflows. The payload usually contains a set of pre-computed chained pointers to gadgets and parameters. The exploit then needs to redirect control by overwriting the data used in an indirect branch to point to the first gadget instead of the intended destination. This also is usually accomplished by memory corruption. After the initial control flow re-direct, the ROP-chain continues.
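The chaining behavior described above can be sketched with a toy Python model. This is an illustration only, under stated assumptions: the gadget addresses, the registers a and b, the HALT marker and the helper run_chain are all hypothetical and are not taken from any real program. Each gadget does a small amount of work, and the modeled "return" simply pops the next attacker-chosen address from the payload.

```python
# Toy model of gadget chaining. Addresses, registers and helper names are
# hypothetical; this models only the control-flow idea, not a real exploit.

def pop_a(state, stack):
    state["a"] = stack.pop()          # gadget: load a parameter into "a"

def pop_b(state, stack):
    state["b"] = stack.pop()          # gadget: load a parameter into "b"

def add_a_b(state, stack):
    state["a"] = state["a"] + state["b"]   # gadget: a = a + b

# Existing "program code": address -> gadget body (each ends with a return).
GADGETS = {0x1000: pop_a, 0x2000: pop_b, 0x3000: add_a_b}
HALT = 0  # hypothetical marker that ends the simulation


def run_chain(payload):
    """Consume a payload of chained gadget pointers and parameters."""
    stack = list(reversed(payload))   # end of list models the top of the stack
    state = {"a": 0, "b": 0}
    trace = []
    while True:
        target = stack.pop()          # models RET: next address comes from the stack
        if target == HALT:
            break
        trace.append(target)
        GADGETS[target](state, stack)
    return state, trace


# Payload layout: pointer, parameter, pointer, parameter, pointer, end marker.
payload = [0x1000, 40, 0x2000, 2, 0x3000, HALT]
state, trace = run_chain(payload)
```

In this sketch no new code is added to GADGETS; the payload alone decides which existing bodies run and in what order.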
A method for defending against such ROP attacks was described in a 2015 publication by the National Security Agency titled Hardware Control Flow Integrity (CFI) for an IT Ecosystem. NSA's 2015 paper describes a control flow integrity process which uses a branch flag in connection with a new “landing point” (LP) instruction to protect forward edges of a control flow graph and a shadow stack to protect return edges of a control flow graph. This control flow integrity process provides that after an indirect branch instruction, if a landing point instruction does not immediately occur, a fault notice is provided.
A direct branch instruction is a branch instruction that provides the destination of the instruction as data stored in read-only memory. This read-only memory is part of the direct branch instruction, therefore it cannot be maliciously altered. An indirect branch instruction uses a computed destination such as jump indirect, call indirect or return. These computed destinations can be maliciously altered. The instruction set architecture (ISA) of the control flow integrity process described in NSA's paper defines a landing point (LP) opcode and the associated LP operation for use in connection with indirect branch instructions. The LP operation is used in combination with a branch flag and a shadow stack to provide notice of a fault in the control flow.
The branch flag is logically equivalent to a bit in a status register. The branch flag has two modes: a branch flag “on” mode (i.e., BF=ON) and a branch flag “off” mode (i.e., BF=OFF). Execution of an indirect branch instruction sets the mode of the branch flag to “ON”. In the branch flag ON mode, the only instruction which may be executed by the processor, without causing a fault notice, is the landing point instruction (LP). If an attempt to execute any other instruction occurs while BF=ON, then the processor issues a fault notice. Execution of the LP instruction sets the branch flag OFF. In the branch flag OFF mode, all types of instructions may be executed by the processor, including landing point instructions.
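The branch flag rules above amount to a two-state machine, which may be sketched as follows. This is a minimal model under assumptions: the class name BranchFlagCpu, the exception ControlFlowFault and the opcode strings are hypothetical labels chosen for illustration, and every indirect branch is represented by a single generic "indirect_branch" opcode.

```python
class ControlFlowFault(Exception):
    """Models the fault notice issued by the processor (hypothetical name)."""


class BranchFlagCpu:
    """Tracks only the branch flag; all other processor state is omitted."""

    def __init__(self):
        self.bf_on = False  # BF=OFF initially

    def step(self, opcode):
        # While BF=ON, only the landing point instruction may execute.
        if self.bf_on and opcode != "lp":
            raise ControlFlowFault(f"'{opcode}' executed while BF=ON")
        if opcode == "lp":
            self.bf_on = False            # LP sets BF=OFF
        elif opcode == "indirect_branch":
            self.bf_on = True             # indirect branch sets BF=ON
        # Any other opcode leaves the flag unchanged.


def runs_without_fault(opcodes):
    cpu = BranchFlagCpu()
    try:
        for op in opcodes:
            cpu.step(op)
    except ControlFlowFault:
        return False
    return True


intended = runs_without_fault(["lp", "push", "indirect_branch", "lp", "push"])
misdirected = runs_without_fault(["lp", "indirect_branch", "push"])
```

Here the first sequence lands on an LP after the indirect branch and completes, while the second attempts a push while BF=ON and is stopped.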
A shadow stack is also used to ensure control flow integrity. The shadow stack is a stack separate from the data stack and includes a shadow stack pointer which points to the top position of the shadow stack. The shadow stack is a protected stack that protects the return address, i.e., the address of the instruction to be executed upon return of control to the call site. The shadow stack is protected because only a limited number of functions provide access to the shadow stack. For example, access to the shadow stack may be limited to the functions “CALL” and “RET”. Specifically, the function CALL provides the ability to write the return address on the shadow stack and the function RET provides the ability to read the data on the shadow stack at the position of the shadow stack pointer.
Upon execution of a CALL instruction, the return address is stored on both the data stack and the shadow stack. The RET operation is a shadow stack aware operation that pops the data stack and the shadow stack. When a RET instruction occurs, the top of the data stack and the top of the shadow stack are compared. If the address on the shadow stack matches the address on the data stack, the processor continues execution of instructions at the return address. If the addresses do not match, a fault notification is provided.
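The CALL and RET behavior just described may be sketched as follows. This is a simplified model under assumptions: the class name ShadowedStacks, the exception ShadowStackFault and the example addresses are hypothetical, and the "protection" of the shadow stack is represented only by the convention that the simulated attacker writes to the data stack alone.

```python
class ShadowStackFault(Exception):
    """Models the fault notification on a return address mismatch."""


class ShadowedStacks:
    def __init__(self):
        self.data = []     # ordinary data stack, writable by program code
        self.shadow = []   # shadow stack, touched only by call() and ret()

    def call(self, return_address):
        # CALL stores the return address on both stacks.
        self.data.append(return_address)
        self.shadow.append(return_address)

    def ret(self):
        # RET pops both stacks and compares the two addresses.
        from_data = self.data.pop()
        from_shadow = self.shadow.pop()
        if from_data != from_shadow:
            raise ShadowStackFault(
                f"data stack {from_data:#x} != shadow stack {from_shadow:#x}"
            )
        return from_data


# Intended flow: the addresses match and control returns to the call site.
stacks = ShadowedStacks()
stacks.call(0x401000)
returned_to = stacks.ret()

# Corrupted flow: a memory-corruption bug overwrites the data stack copy.
stacks.call(0x401000)
stacks.data[-1] = 0xDEAD      # hypothetical attacker-chosen gadget address
try:
    stacks.ret()
    fault_raised = False
except ShadowStackFault:
    fault_raised = True
```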
The net effect of re-instrumenting existing code with landing points and a shadow stack is to constrain redirectable gadgets to only those blocks of code that start with a landing point and end with an indirect branch. To create the ROP chain necessary to gain control in this newly instrumented environment, the attacker would be limited to gadgets that are the result of intended compiler output instead of the much larger population of unintended gadgets that can be formed from bytes within legitimate compiler output. Furthermore, using a return branch, which is the predominant chaining technique today, would no longer be feasible due to the shadow stack protection of the return address.
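How an unintended gadget arises from bytes within legitimate compiler output can be shown with a short x86-64 example. The byte sequence below is chosen for illustration and is an assumption, not output from any particular compiler: it encodes "mov eax, 0x00C3585F" followed by "ret". If decoding starts one byte into the mov instruction, the same bytes read as "pop rdi; pop rax; ret". The helper names are hypothetical, and the scan looks only for the single-byte near-return opcode 0xC3.

```python
# Intended instructions (x86-64):
#   offset 0: B8 5F 58 C3 00   mov eax, 0x00C3585F
#   offset 5: C3               ret
# Decoding from offset 1 instead yields: 5F pop rdi; 58 pop rax; C3 ret.
CODE = bytes([0xB8, 0x5F, 0x58, 0xC3, 0x00, 0xC3])
INTENDED_STARTS = {0, 5}   # offsets where the compiler meant instructions to begin
RET_OPCODE = 0xC3


def ret_offsets(code):
    """Offsets of every byte that would decode as a near return."""
    return [i for i, byte in enumerate(code) if byte == RET_OPCODE]


def unintended_ret_offsets(code, intended_starts):
    """Return bytes that are not the start of an intended instruction."""
    return [i for i in ret_offsets(code) if i not in intended_starts]


all_rets = ret_offsets(CODE)
hidden_rets = unintended_ret_offsets(CODE, INTENDED_STARTS)
```

The return at offset 3 lies inside the immediate operand of the mov instruction, so it exists only as an unintended gadget ending. Under the scheme described above, a gadget reached by an indirect branch would additionally have to begin with a landing point instruction.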
FIGS. 1-3 illustrate operation of the LP instruction, in connection with a processor 5 including a data stack 10, a shadow stack 12, a branch flag 13 and an ISA 14 defining an LP opcode and an LP operation associated with the LP opcode. Specifically, FIG. 1 illustrates use of the LP instruction and a branch flag in connection with control flow as intended by the programmer.
The data stack 10 stores data relating to instructions executed by the processor 5. A number of positions 30-46 are provided on the data stack 10 for storing data relating to the instructions to be executed by the processor 5. Positions 30-46 of the data stack 10 are illustrated in FIG. 1. Each position represents, for example, four bytes of memory. The processor 5 further includes a data stack pointer 48 associated with the data stack 10. The data stack pointer 48 is typically a register that identifies the current top of the stack and is illustrated in FIG. 1 as an arrow.
The shadow stack 12 is separate from the data stack 10 and is used exclusively for control transfer operations. A number of positions are provided on the shadow stack 12 for storing data relating to the control transfer instructions. A limited number of control transfer instructions are capable of writing to or reading from the shadow stack and are sometimes identified as “shadow stack aware instructions.” Positions 50-66 of the shadow stack 12 are illustrated in FIG. 1. The processor 5 further includes a shadow stack pointer 68 associated with the shadow stack 12. The shadow stack pointer 68 is a register that identifies the current top of the shadow stack and is illustrated in FIG. 1 as an arrow. As instructions are executed, the shadow stack pointer 68 moves relative to the positions 50-66 of the stack. As instructions are executed which result in information being written onto or “pushed” onto the shadow stack, the shadow stack pointer 68 is moved down. As instructions are executed which result in data being read from or “popped” off of the shadow stack, the shadow stack pointer 68 is moved up. The shadow stack 12 is a protected stack. Protection is provided to the shadow stack 12 by limiting the types of ISA instructions which write to or read from the shadow stack 12. For example, instructions providing access to the shadow stack 12 may be limited to CALL and RET. Specifically, the CALL instruction provides write access to the shadow stack 12 and the RET instruction provides read access to the shadow stack 12.
The processor 5 further includes an ISA 14 which defines the instructions the processor 5 will execute. The ISA 14 defines the landing point instruction LP.
A series of instructions are provided to the processor 5 including an instruction to call (CALL) function “foo”. Upon execution of the CALL instruction, the address of the return, i.e. the address of the instruction to execute when foo returns to the call site (“foo ret address”), is pushed to position 30 of the data stack 10 and the data stack pointer 48 is aligned with position 32 of the data stack 10. “Foo ret address” is also pushed to position 50 of the shadow stack 12 and the shadow stack pointer 68 is aligned with position 52 of the shadow stack 12. A set of intended instructions 18 provided by function foo is provided to the processor 5. As illustrated, the instruction LP is executed next, setting the BF mode to OFF. In the BF=OFF mode, a function prologue is executed which sets up a call frame of size F for intended function foo. For example, the “push rax” instruction results in the contents of register rax being pushed to the data stack 10 and alignment of the data stack pointer 48 with position 34; the “push rbx” instruction results in the contents of register rbx being pushed to the data stack 10 and alignment of the data stack pointer 48 with position 36; the “push rcx” instruction results in the contents of register rcx being pushed to the data stack 10 and alignment of the data stack pointer 48 with position 38; the data stack pointer 48 is then moved down 32 bytes (represented by l, m, n, o) and aligned with position 46. Execution of additional instructions, not shown, continues.
An indirect branch instruction is executed within the function foo directing control to pass to the intended address *. Execution of the indirect branch instruction sets the mode of the branch flag to ON (BF=ON) and control is passed to intended address *. At intended address *, an LP instruction is provided. Execution of the LP instruction sets the mode of the branch flag to OFF (BF=OFF). Because the branch flag is in the “OFF” mode, the processor continues to execute all types of instructions.
Execution of additional instructions, not shown, continues. Eventually, a function epilogue for function foo() is executed which tears down the call frame of size F. Specifically, the data stack pointer 48 is moved up 32 bytes and the data stack pointer 48 is aligned with position 38; the contents of register rcx are popped from the stack and the data stack pointer 48 is aligned with position 36; the contents of register rbx are popped from the stack and the data stack pointer 48 is aligned with position 34; and the contents of register rax are popped from the stack, leaving the data stack pointer 48 aligned with position 32 of the data stack 10.
Execution of the RET instruction pops the data from the top of the data stack 10 and the data stack pointer 48 is aligned with position 30 of the data stack 10. Execution of the RET instruction also pops the data from the top of the shadow stack 12 and the shadow stack pointer 68 is aligned with position 50 of the shadow stack 12. The data popped from the data stack 10 is compared to the data popped from the shadow stack 12. In the case illustrated in FIG. 1, the data popped from the data stack 10 is “foo ret address” and the address popped from the shadow stack 12 is “foo ret address.” The data matches, therefore control returns to the foo ret address.
FIG. 2 illustrates the same processor 5 of FIG. 1 along with the same intended set of instructions 18. FIG. 2 illustrates use of the branch flag and landing point instruction to successfully identify and interrupt control flow not intended by the programmer. Instructions noted in FIG. 2 but stricken indicate that the instructions are not executed by the processor. Specifically, function “foo” is called, and the address of the instruction to execute when foo returns, i.e. “foo ret address”, is pushed to the data stack 10 and to the shadow stack 12. Execution of the LP instruction results in the branch flag 13 being turned off. A function prologue is executed which sets up a call frame of size F for function foo. Upon set up of the call frame, the data stack pointer 48 is aligned with position 46 of the data stack 10. Execution of additional instructions, not shown, continues.
An indirect branch instruction is provided within function foo, directing control to pass to intended address *. Execution of the indirect branch instruction sets the mode of the branch flag 13 to ON (i.e. BF=ON) and execution of instruction at address * is intended to occur. An attacker, however, manipulates the address * and forces execution to unintended address *! instead. At unintended address *!, a misdirected program flow 70 is provided and function “bar” is called. An LP instruction is not provided at unintended address *!, therefore the branch flag 13 remains in the “ON” mode. In the BF=ON mode, when the processor attempts execution of the instruction “push r11”, a fault notice is provided.
FIG. 2 illustrates how landing points impose a coarse constraint on the forward edges of a control flow graph, i.e., an indirect branch can no longer branch to any instruction, it must branch to a landing point instruction. Landing points move the attacker from a world where there's no specification (i.e., branch to any address) to one with some specification (i.e., branching is limited to landing points). By limiting the instructions which will execute when a computed branch instruction is encountered, execution of unintended instruction flows is greatly reduced. A fully instrumented program will have landing points at the destination of every indirect branch.
Although the control flow integrity method described by NSA provides some control flow integrity, in certain scenarios the control flow may be compromised despite the use of the branch flag 13, the ISA defining the landing point opcode and operation 14, and the shadow stack 12. FIG. 3 illustrates a scenario in which despite the use of the branch flag 13, the ISA defining the landing point opcode and operation 14, and the shadow stack 12, control flow integrity (i.e. control flow intended by the programmer) was not maintained.
As illustrated in FIG. 3, function “foo” is called, and the address of the instruction to execute when foo returns, i.e. “foo ret address”, is pushed to position 30 of the data stack 10 leaving the data stack pointer 48 at position 32. “Foo ret address” is also pushed to position 50 of the shadow stack 12 leaving the shadow stack pointer 68 aligned with position 52. Execution of the LP instruction results in the branch flag 13 being turned off. A function prologue is executed which sets up a call frame of size F for function foo. Upon set up of the call frame, the data stack pointer 48 is aligned with position 46 of the data stack 10. Execution of additional instructions, not shown, continues.
An indirect branch instruction is executed within function foo. The indirect branch instruction sets the mode of the branch flag to ON and execution of instruction at an address * is intended to occur, however the attacker manipulates the intended address * and forces execution of instructions to unintended address *!. A misdirected program instruction set 80 is provided at unintended address *!. Further, at unintended address *! an LP instruction is provided.
At unintended address *!, execution of the LP instruction turns OFF the branch flag 13. Because the branch flag 13 is in the “OFF” mode, the processor executes all types of instructions. A function epilogue for bar is executed which tears down the call frame of size F. Specifically, the data stack pointer 48 is moved up 48 bytes leaving the data stack pointer 48 aligned with position 32 of the data stack 10. Next, although the instruction indicates that the contents of register r11 are popped from the data stack 10, the data stack pointer 48 is aligned with position 32 of the data stack 10 and the contents of the incoming rax are popped from the data stack 10 leaving the data stack pointer 48 aligned with position 30. As a result of maliciously marrying the control flow between the prologue snippet of foo( ) with the epilogue snippet of bar( ), the intended functionality of the program was changed since foo( ) never intended to move the contents of register rax to r11. The value in rax when the processor returns to the call site of foo( ) will also not be what was intended by the program.
Execution of the RET instruction provides a comparison of the addresses at the data stack pointer 48 and the shadow stack pointer 68. Here, the data stack pointer 48 is aligned with position 30 and the address at position 30 is “foo ret address.” The shadow stack pointer 68 is aligned with position 50 and the address on the shadow stack 12 is “foo ret address.” The address at the top of the data stack 10 matches the address at the top of the shadow stack 12, therefore a fault notice is not issued and control is passed to the return address at the function call site. Execution of the code continues despite the fact that the control flow was misdirected to the unintended program flow 80 and detection of the malicious activity is avoided.
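The contrast between the scenarios of FIG. 2 and FIG. 3 may be reproduced with a small simulation. This is a sketch under assumptions and is deliberately simpler than the figures: foo saves only rax and reserves four stack slots, bar saves only r11 and reserves four slots, and the instruction addresses, opcode names and the helper run are hypothetical. The argument target stands in for the branch destination that the attacker manipulates.

```python
class Fault(Exception):
    """Models a fault notice from either the branch flag or the shadow stack."""


PROGRAM = {
    # foo: LP, prologue, body that uses rax as scratch, indirect branch
    0: ("lp",), 1: ("push", "rax"), 2: ("sub_sp", 4),
    3: ("set", "rax", 99), 4: ("jmp_indirect",),
    # intended target inside foo: LP, epilogue, RET
    5: ("lp",), 6: ("add_sp", 4), 7: ("pop", "rax"), 8: ("ret",),
    # bar: LP, prologue, an internal landing point, epilogue, RET
    10: ("lp",), 11: ("push", "r11"), 12: ("sub_sp", 4),
    13: ("lp",), 14: ("add_sp", 4), 15: ("pop", "r11"), 16: ("ret",),
}


def run(target, rax_in):
    """CALL foo, follow its indirect branch to `target`, return the registers."""
    regs = {"rax": rax_in, "r11": 0}
    data, shadow = ["foo ret address"], ["foo ret address"]  # effect of CALL
    bf_on, pc = False, 0
    while True:
        op, *args = PROGRAM[pc]
        if bf_on and op != "lp":
            raise Fault(f"'{op}' at address {pc} while BF=ON")
        pc += 1
        if op == "lp":
            bf_on = False
        elif op == "push":
            data.append(regs[args[0]])
        elif op == "pop":
            regs[args[0]] = data.pop()
        elif op == "sub_sp":
            data.extend([None] * args[0])      # reserve local slots
        elif op == "add_sp":
            del data[len(data) - args[0]:]     # release local slots
        elif op == "set":
            regs[args[0]] = args[1]
        elif op == "jmp_indirect":
            bf_on, pc = True, target           # computed, attacker-reachable target
        elif op == "ret":
            if data.pop() != shadow.pop():
                raise Fault("return address mismatch")
            return regs


intended = run(target=5, rax_in=7)        # foo's own landing point
try:
    run(target=11, rax_in=7)              # no LP at the target, as in FIG. 2
    fig2_faulted = False
except Fault:
    fig2_faulted = True
misdirected = run(target=13, rax_in=7)    # LP inside bar, as in FIG. 3
```

In this model the redirect to address 13 raises no fault because it lands on an LP and the return addresses still match, yet the caller's rax value ends up in r11 and rax is left holding foo's scratch value.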
Although use of landing points provides control flow integrity improvement at a minimal cost, as illustrated in FIG. 3, the LP instructions do not require branching to the specific programmer-intended destination. When a malicious actor takes advantage of these engineer-able but unintended flows, a spectrum of unintended negative outcomes can occur. Thus, a need exists for even greater control flow integrity to prevent the impact of these unintended flows.