Whenever software is sold, there is the possibility that a malicious user will tamper with the software to extract secret keys or algorithms. The intellectual property contained in and protected by software is regularly attacked, often with malicious intent. Software tamper resistance can be addressed either through the use of specialized hardware or by incorporating features in the software itself which make it more difficult to reverse engineer. Tradeoffs must be made with each approach. Hardware-based techniques such as dongles, smartcards, and other types of secure processors have the potential to provide a higher level of protection, but they are generally more expensive to produce and cumbersome for the end user. Because specialized hardware is not widely used for general computing purposes, hardware-based approaches often prove infeasible for that market. Also, such an approach may be infeasible when software is distributed via the Internet.
Software-based tamperproofing approaches, while easier to implement, are not generally focused on an absolute solution, i.e., a protection mechanism which is undefeatable. Because the attacker has full control over the execution environment, it is believed that given enough time, effort, and/or resources, a sufficiently determined attacker can completely break any piece of software. Instead, software-based protection techniques focus on increasing the time, effort, and/or resources required by an attacker to break the software. Such an approach may eliminate some classes of attackers but does not prevent a sufficiently determined and skilled attacker. Thus, tamper resistance technologies take the approach of limiting the number of attackers.
One common attack involves the modification of software with the goal of circumventing software protection technologies. This attack sometimes occurs where a software developer has distributed a trial version of their program. The trial software can generally only be used for a specified number of days or executions. Normally, the time passed or usage recording is automatically performed within the software. To circumvent the check, the attacker has to modify the software. Once the check is bypassed, the attacker has obtained unlimited use of the software either at a significantly discounted price or, more commonly, for free. To further compound the attack, the modified software could be redistributed by the attacker for free or for a profit. The music and movie industries also rely on software-based digital rights management technologies to protect their copyrighted material. Such content protection can often be bypassed through malicious software tampering and reverse engineering. To address these issues, considerable research has focused on the development of tamper resistant technologies.
Most intrusion detection mechanisms are used after the damage is done and, thus, are reactive. The term “proactive security” refers to the detection of what goes wrong during a process, such as an execution of software, before the final damage is done. Most prior art systems do not provide a proactive security mechanism to combat reverse-engineering of software; they do not identify the evidence hackers (e.g., malicious users) leave behind during a reverse-engineering attempt. The term “user” is used herein to refer to an operator of an untrusted computer (e.g., set-top box, PC, PDA, video game console, etc.), who could try to tamper with the software code (such as software code related to web browser programs, sound card drivers, game console programs, Java® applets, or macros embedded in multimedia content such as images, music, or video files). There is thus a need to proactively detect an on-going reverse-engineering process (and thereby prevent real damage from occurring) before hackers succeed in the tampering and before they gain access to important information such as secret keys.
Another relevant concept is “forward security” which is a formal property that has been identified and appears in security literature. Forward security includes methods of preventing corruption of past code or logs after tampering has occurred. Future actions may be untrusted, but preexisting trusted items remain uncompromised.
Early on, many techniques were proposed with little or no attention paid to the evaluation of the technique with respect to an attacker or established threat model. A shift in focus began with a group led by Collberg who evaluated software watermarking algorithms based on a threat model and a defined set of properties [4, 15, 16, 18]. More recently, researchers have begun to examine the issue of software protection from the attacker's point of view by proposing attacks against published tamper resistance techniques [14, 21, 22] and the development of a disassembler which is more resistant to code obfuscation [11].
A variety of different software-based defenses have been proposed such as software watermarking [5, 17, 19, 23], code obfuscation [6, 12, 20, 24], and tamper resistance. One of the first tamper resistance algorithms published was by Aucsmith [1]. This algorithm is based on Integrity Verification Kernels (IVK) which are units of code responsible for performing critical program functions. Each IVK is split into smaller blocks of code which are individually encrypted. To detect tampering, at each step, the sum of the hashes of all previously executed blocks is checked to verify that the blocks were executed correctly and in proper order. Additional tamper resistance techniques have been proposed by Chang and Atallah [2], Horne, et al. [8], and Chen, et al. [3].
One way of watching for abnormalities that might indicate hacking is by maintaining an “audit log”. In this scenario, one needs to identify the information that needs to be put into the log for detection and the verification process that should follow. The term “log” thus refers here to an audit trail using a set of entries describing what happens with regard to execution of software code. Making the information in the log satisfy certain properties can at times not only make the scheme more efficient (in terms of reducing log size and creating a more efficient verification), but can also guarantee the verification process and detect the target anomaly.
Two software-based tamper resistance techniques, proposed by Jin and Lotspiech [9] and Jin, et al. [10], based on a common key evolution mechanism are discussed below. The techniques offer many advantages over previous software-based algorithms, but they still contain weaknesses which can be exploited by an attacker.
Event Log-Based Tamper Resistance
As an initial tamper detection technique, Jin and Lotspiech developed a method which provides software protection in an online environment through the use of an event log [9]. The event log is similar in concept to the traditional audit log. In short, the Event-Log Based invention proactively detects software tampering by using key evolution to produce a dynamically evolving audit log. The key values are evolved from an initial value based upon a one-way function depending on both the previous log entry and the previous key. As the program executes, specific execution events are recorded in the log. The event log is then transmitted to a clearing house where it is examined. The overall goal of the technique is to detect the ongoing, minor program alterations before the attacker has succeeded in disabling the tamper detection mechanism.
Because the log potentially contains evidence of tampering, it also becomes a target of attack. Therefore, in this scenario, the audit trail itself needs to be protected such that there is no way for a hacker to undetectably delete any old entries in the trail. There is a possibility that the hacker will eventually completely understand the logging mechanism, and from that point on the new entries in the trail cannot be trusted.
Specifically, to enable the construction of an event log, the original program is modified, incorporating integrity check code. In one embodiment, a key k_i is advanced each time an integrity check is performed using a one-way function f1, so that k_{i+1} = f1(k_i). The new key k_{i+1} and the integrity check value v_{i+1} are then combined using another function f2, so that entry e_{i+1} = f2(k_{i+1}, v_{i+1}). The resulting value e_{i+1} is then recorded in the log along with v_{i+1}. See prior art FIG. 1. Alternatively, the key evolution process could incorporate the integrity check value. In this case, the one-way function f uses both the current key k_i and the current integrity check value v_i to generate a new key k_{i+1} = f(k_i, v_i). See prior art FIG. 2.
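The key evolution described above can be sketched as follows. This is only an illustration under stated assumptions: SHA-256 and HMAC-SHA256 stand in for the unspecified functions f1, f2, and f, and the names record_check, k0, and the sample check values are hypothetical.

```python
import hashlib
import hmac

def f1(key: bytes) -> bytes:
    # One-way step k_{i+1} = f1(k_i). SHA-256 is an assumed stand-in.
    return hashlib.sha256(b"evolve" + key).digest()

def f2(key: bytes, value: bytes) -> bytes:
    # Log entry e_{i+1} = f2(k_{i+1}, v_{i+1}). HMAC-SHA256 is an assumed choice.
    return hmac.new(key, value, hashlib.sha256).digest()

def record_check(log: list, key: bytes, value: bytes) -> bytes:
    # Advance the key, append (v, e) to the log, and return the new key.
    # The caller is expected to overwrite the old key with the returned one.
    new_key = f1(key)
    log.append((value, f2(new_key, value)))
    return new_key

def f(key: bytes, value: bytes) -> bytes:
    # FIG. 2 variant: k_{i+1} = f(k_i, v_i), so the check value feeds the key.
    return hashlib.sha256(key + value).digest()

log = []
k0 = b"\x00" * 32  # hypothetical initial key
k1 = record_check(log, k0, b"checksum-ok")
k2 = record_check(log, k1, b"timing-ok")
```

Because each step is one-way, a party holding only k2 cannot recompute k1 or k0, which is what prevents old entries from being rewritten undetectably.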
As the program executes, the inserted integrity check code is triggered and the result from the integrity check is recorded in the log. The log entries may include a checksum computation based upon a section of software code, for example. A change in the checksum value indicates the program has been modified. Alternatively, to detect the presence of a debugger, the elapsed time to execute a sequence of instructions can be measured. If the measured time exceeds a preset threshold, it is highly likely the program is being run using a debugger.
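The two kinds of integrity check mentioned above can be illustrated with a short sketch. The function names, the use of SHA-256 as the checksum, and the sample byte string are assumptions made for illustration; the source does not specify a particular checksum or threshold.

```python
import hashlib
import time

def checksum_check(code: bytes, expected_hex: str) -> bool:
    # True when the code section still matches its recorded checksum.
    return hashlib.sha256(code).hexdigest() == expected_hex

def timing_check(work, threshold_s: float) -> bool:
    # True when the instruction sequence finishes within the preset threshold.
    # Single-stepping in a debugger would typically exceed it.
    start = time.perf_counter()
    work()
    return (time.perf_counter() - start) <= threshold_s

code_section = b"\x55\x89\xe5\x5d\xc3"  # hypothetical bytes of a code section
expected = hashlib.sha256(code_section).hexdigest()
```

In a real deployment the threshold would have to be calibrated per platform, since a slow machine can look like a debugger to a timing check.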
Upon connection to a network, the final key kn and the log are transmitted back to the clearinghouse. The clearinghouse performs the same key evolution to verify the integrity check values and the final key kn. The tampering verification/detection process can then be a simple comparison between the returned log and the correct information that the clearinghouse has computed using the same evolving key. If the untrusted user substitutes an old valid log, the keys will not be correct. If the user submits a truncated log, then the next time the log is transmitted, the keys will not be correct.
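A minimal sketch of the clearinghouse-side verification follows, assuming the FIG. 1 style of key evolution with SHA-256 and HMAC-SHA256 as stand-ins for f1 and f2. The names client_run and verify_log and the set of expected check values are hypothetical.

```python
import hashlib
import hmac

def f1(key: bytes) -> bytes:
    # Assumed one-way key step k_{i+1} = f1(k_i).
    return hashlib.sha256(b"evolve" + key).digest()

def f2(key: bytes, value: bytes) -> bytes:
    # Assumed entry function e_{i+1} = f2(k_{i+1}, v_{i+1}).
    return hmac.new(key, value, hashlib.sha256).digest()

def client_run(k0: bytes, values):
    # Client side: build the log and return it with the final key k_n.
    key, log = k0, []
    for v in values:
        key = f1(key)
        log.append((v, f2(key, v)))
    return log, key

def verify_log(k0: bytes, log, final_key: bytes, expected_values):
    # Clearinghouse side: replay the key evolution from k0.
    # Returns (chain_ok, indices of entries whose check value is unexpected).
    key, failed = k0, []
    for i, (value, entry) in enumerate(log):
        key = f1(key)
        if not hmac.compare_digest(entry, f2(key, value)):
            return False, failed  # entry was not produced with the right key
        if value not in expected_values:
            failed.append(i)      # a genuine entry recording a failed check
    return hmac.compare_digest(key, final_key), failed
```

A truncated log replays to an earlier key than the submitted k_n, so the final comparison fails; an entry whose recorded value was edited fails the per-entry comparison.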
The Event-Log Based invention thus applies the “forward security” property to integrity checks during software execution to proactively detect the software tampering process. Any truncation of the log, deletion of the log, or substitution of an old valid log is easily detected when the user connects to get new content. It is assumed that before the hacker completely understands a specific software program, he or she will have triggered many “integrity check” failures that will have been logged. A hacker is therefore probably unable to reverse-engineer the software without being detected.
The Event-Log Based invention can respond to detected tampering in many ways, including revoking a software user's device keys, increasing the number and variety of types of integrity checks in software code or content sent to a user, increasing the frequency of periodic transmission of the audit log and final key, and advising a system administrator regarding said detected tampering. The administrator can choose to merely pay more attention to the same user the next time or choose to give a warning to the user. Alternately, when enough tampering evidence is accumulated, the user can be disconnected from the network and disallowed from receiving new content distribution in the future.
As with any detection mechanism based on an audit log, a main concern is the size of the log. In the two examples presented above, the entire log must be transmitted to the clearing house. It is not hard to imagine applications in which the log will become very large between connections. The main reason that the entire log must be transmitted to the clearing house is that even though the correct values for the integrity checks are known, the particular order in which they will be executed is not known. To address this issue, Jin and Lotspiech proposed an alternate means of incorporating forward security. Instead of embedding the integrity checks throughout the program, their locations are restricted to points that are encountered along all execution paths through the program. In this scheme, the clearing house knows exactly which integrity checks are invoked and in which order. With this knowledge, the clearing house can evolve the key using the initial key; thus, only the final key k_n needs to be transmitted. See prior art FIG. 3. The log entries may be wrapped around to save space, or may be designed to consistently produce only a single value upon proper software execution. The use of secure tamper-resistant hardware for key storage can further strengthen the scheme.
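The final-key-only variant can be sketched as below, assuming the key is evolved as k_{i+1} = f(k_i, v_i) with SHA-256 standing in for f. The function names and the sample check values are hypothetical.

```python
import hashlib

def f(key: bytes, value: bytes) -> bytes:
    # Assumed one-way function k_{i+1} = f(k_i, v_i).
    return hashlib.sha256(key + value).digest()

def evolve(k0: bytes, values) -> bytes:
    # Fold the sequence of integrity check values into the key.
    key = k0
    for v in values:
        key = f(key, v)
    return key

def clearinghouse_accepts(k0: bytes, expected_values, submitted_key: bytes) -> bool:
    # The clearinghouse knows the checks and their order, so it can
    # recompute k_n itself and compare it with the submitted key.
    return evolve(k0, expected_values) == submitted_key

k0 = b"\x02" * 32                    # hypothetical initial key
expected = [b"c1", b"c2", b"c3"]     # hypothetical correct check values
clean_key = evolve(k0, expected)
tampered_key = evolve(k0, [b"c1", b"XX", b"c3"])
```

A single unexpected check value anywhere in the sequence changes every later key, so one comparison of k_n covers the whole run.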
The Event Log-Based technique has three major identified limitations. The first limitation is the need for a periodic online connection. Such an approach may be suitable for a business scenario in which the customer wishes to maintain an ongoing relationship with the service provider. For example, consider a customer who buys a music player and also wishes to buy music from an associated online music store. When the customer makes the connection to the music store, the log can be transmitted piggyback to the clearing house. Tampering can be detected at this time, and access to the music store can be prevented. However, an attacker who does not care about using the associated music store can simply prevent the connection, and tampering will go undetected. This makes it possible for an attacker to disable the integrity checks and distribute an unprotected version of the software.
The second limitation relates to an assumption made when designing the protection mechanism. This assumption is that the attacker will trigger at least one integrity check value which is detected by the clearing house before the software is completely understood. In reality, it is difficult to achieve forward security on a client machine that is under the complete control of an attacker. Such security may be sufficient for casual attackers, but more sophisticated attackers are equipped with extensive computing resources and skills. They can perform attacks using simulated and instrumented environments and can completely or partially replicate the state of program execution to another machine.
Finally, the placement of the integrity check values introduces the third limitation. The authors acknowledged that unrestricted placement has the potential to produce prohibitively large log files. However, restricting the placement to points which are guaranteed to be executed on every execution of the program is also prohibitive for two reasons. First, the set of such points may be rather small, and second, for general programs, identification of a deterministic path through a program is NP-hard.
Branch-Based Tamper Resistance
The '934 Branch-Based invention, incorporated by reference above and described in [10], details a scheme wherein the software itself detects the tampering and causes the program to fail, thus protecting itself from malicious attacks without requiring a clearinghouse and related network connection. In this offline tamper resistance scheme, the same key evolving mechanism is used, but instead of merely recording the values, they are used to regulate proper program execution. The Branch-Based invention incorporates tamper detection capabilities into a protected copy of a software application by disassembling a statically linked binary of the application, modifying some of the instructions in the application, and then rewriting all of the modified and unmodified instructions to a new executable file, a protected copy. At least some of the branch instructions in the application are converted to call or control transfer instructions. The '934 invention determines a target displacement (stored in a displacement table) for each of the branch instructions replaced by a call instruction.
A call instruction is used to call a branch function, which could be an integrity check branch function or a fingerprint branch function depending on the situation. Within the branch function, tampering is detected through the computation of the return address; if the program has been altered the integrity check will yield an incorrect value, which will lead to the wrong slot in the displacement table being accessed. If the branch function is an integrity check branch function, alteration of the call location will yield an incorrect slot in the displacement table. If the branch function is a fingerprint branch function and if the current key has been altered, an incorrect slot in the displacement table will similarly be accessed. In each case, the wrong displacement will be added to the return address. Upon function return, an incorrect instruction executes, eventually leading to program failure.
The Branch-Based system also allows watermarking, comprising both an author mark and a fingerprint mark in the protected copy. A watermarking module evolves and stores a fingerprint key. The order of the target displacements within the displacement table is customized to a particular generation of the fingerprint key; consequently, the application only executes with the specific user key. The '934 system thus emulates the performance of a dongle without the drawback of dongle distribution. Furthermore, a fingerprint key does not have to be stored in the application; rather, a fingerprint key can be distributed with the program and be required when the program is executed. For example, during an online registration process, the system ties a fingerprint mark in the application with the purchaser by embedding some software in the program that enables the fingerprint capability functionality of the application.
The watermarking module comprises a conversion of branch instructions to a call instruction that calls a specifically constructed fingerprint branch function. This branch function not only computes the fingerprint of the program but also regulates execution of the program. Consequently, if an attacker attempts to remove the watermark embedded in an application by the '934 system, the resulting software is non-functional. The '934 system utilizes an embedding technique for the watermark in which a section of code is added to the application. This code calculates the fingerprint mark as the program executes and contributes to the proper execution of the program.
The Branch-Based algorithm is built on the use of a branch function similar to the one proposed by Linn and Debray to disrupt static disassembly of native executables [13]. The original branch function was designed simply to transfer execution control to the intended target instruction by replacing unconditional branch instructions with call instructions. Prior art FIG. 4 illustrates the general idea of the branch function. Such a transformation eliminates some of the obvious control flow making static analysis more difficult and provides minimal tamper resistance.
In order to provide tamper resistance for the entire application, Jin et al. enhanced the branch function to incorporate an integrity check and key evolution into the target instruction computation. The so called integrity check branch function (ICBF) performs the following tasks.
1. An integrity check producing the value v_i.
2. Computation of the new key k_{i+1} using v_i and the current key k_i, k_{i+1} = f(k_i, v_i).
3. Identification of the displacement to the target via d_{i+1} = T[h(k_{i+1})], where T is a table containing displacement values and h is a perfect hash function.
4. Computation of the intended target (the return location) by adding the displacement d_{i+1} to the return address on the stack.
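The four steps above can be simulated in a few lines. This is a sketch under stated assumptions: SHA-256 stands in for both the integrity check and f, a dictionary stands in for the perfect hash h, and a missing slot (returned as None) models the wrong-target failure that a real perfect hash would produce. The names build_table and icbf are hypothetical.

```python
import hashlib

def integrity_check(code: bytes) -> bytes:
    # Step 1: produce v_i (here, a checksum over a code region).
    return hashlib.sha256(code).digest()

def f(key: bytes, value: bytes) -> bytes:
    # Step 2: k_{i+1} = f(k_i, v_i).
    return hashlib.sha256(key + value).digest()

def build_table(k0: bytes, code: bytes, displacements):
    # Protection time: evolve the key over the untampered code and store
    # each displacement in the slot selected by the key at that point.
    h, T, key = {}, [], k0
    v = integrity_check(code)
    for d in displacements:
        key = f(key, v)
        h[key] = len(T)  # dict as an assumed stand-in for the perfect hash h
        T.append(d)
    return h, T

def icbf(key: bytes, code: bytes, return_address: int, h, T):
    v = integrity_check(code)          # step 1
    new_key = f(key, v)                # step 2
    slot = h.get(new_key)              # step 3: slot h(k_{i+1})
    if slot is None:
        return new_key, None           # models a jump to a wrong target
    return new_key, return_address + T[slot]  # step 4
```

With an untampered code region the keys evolve exactly as at protection time and every call resolves to its intended target; a single changed byte alters v_i, the key, and therefore the slot.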
Tampering is dynamically detected as the program executes through the computation of ki. When the program is altered, at least one integrity check will produce an unexpected value. This will lead to incorrect key evolution and calculation of the wrong target instruction. Execution of an improper instruction will ultimately lead to program failure, which is a desired outcome for tamper resistant software.
Through the enhancements and the use of multiple ICBFs, tamper resistance can be established for the entire program. For example, the check system can be configured such that one integrity check verifies that another has not been modified or removed.
There are three identified limitations to the Branch-Based technique. The first also occurs in the Event Log-Based algorithm and is associated with a design assumption. Jin, et al. assume that an attacker's analysis tools and program modifications will be detected by an integrity check. Furthermore, such detection is assumed to cause the program to terminate prior to the attacker gaining knowledge that will aid in the circumvention of the ICBFs. Once again, on a client machine that is under complete control of the attacker, achieving such a level of forward security is extremely difficult.
The second limitation relates to the need for an initial key k_0. Because the key evolution is linked to proper program behavior, the same initial key is required each time the program starts. To preserve the property of forward security provided by the one-way function, special care must be taken to prevent the discovery of the initial key. If the initial key goes unprotected, the attacker will be able to eventually discover it. Once discovered, it can be used to generate future keys and therefore unravel the protection mechanism. Jin, et al. suggest the use of secure computing devices such as the Trusted Platform Module (TPM) as a solution to this issue. Despite this suggestion, repeated use of the same initial key is a major weakness and provides a point of attack not present in the Event Log-Based scheme.
The third limitation is analogous to the integrity check placement limitation present in the Event Log-Based technique. In the Event Log-Based algorithm the size of the log is minimized by restricting the placement of the integrity checks to points guaranteed to be encountered on every execution of the program. A similar restriction is also required in the Branch-Based algorithm. Because the key evolution is linked to proper program behavior, the key evolution must be regular. This limits the set of branch instructions eligible for conversion to those which reside on a deterministic path. However, the restriction is not as severe as in the Event Log-Based technique. Any branch which resides on a deterministic path through a chosen function is a candidate for replacement, as opposed to a deterministic path through the entire program.
The present invention builds upon the Event-Log Based invention and the Branch-Based invention to provide software tamper-resistance without a clearinghouse, and without constraints on key evolution points. The Branch-Based system is therefore now described in detail.
FIG. 5 portrays an exemplary overall environment in which the '934 system, method, and service for detecting improper manipulation of an application may be used. The system comprises a software programming code or a computer program product that is typically embedded within, or installed on, a host server. Alternatively, the system can be saved on a suitable storage medium such as a diskette, a CD, a hard drive, or similar devices. A client owns an application. While described in terms of the application, the system can be applied to any executable programming code. The client wishes to acquire piracy and tamper protection for the application through a transformation process executed by the system. The client contracts to utilize the system on the application by means of a purchase or a service through a network. Purchase allows the client to purchase the system for operation on a client's server. In this case, the server is owned or otherwise operated by the client. Alternatively, the client can acquire protection for the application through subscribing to a service. In this case, the server is owned or otherwise operated by the owner of the system or an agent of the owner of the system.
The '934 system analyzes and transforms the software application, producing a protected copy. The server transfers the protected copy to a client distribution via a network for distribution of the protected copy to users. The server further transfers the protected copy to a distribution center. The distribution center comprises, for example, a store in which a user may purchase a pre-packaged version of the protected copy. The client distribution comprises, for example, a web site operated by the client at which a user may purchase and download the protected copy.
FIG. 6 illustrates a high-level hierarchy of the system, including an integrity check processor and a watermarking processor. The protected copy comprises branch function(s), an integrity check module, a watermarking module, and a displacement table T.
The integrity check module comprises integrity check branch function(s). The integrity check processor transforms one or more branch instructions in the application to calls to integrity check branch functions in the protected copy. Each of the integrity check branch functions accesses a cell or “slot” in the table T to locate a target instruction of the branch instruction. Consequently, control flow is obfuscated by routing control flow through the integrity check module.
The watermarking module comprises fingerprint branch function(s). The watermarking processor embeds a watermark by disassembling a statically linked binary of the application, modifying the instructions of the application to generate the fingerprint branch functions, and then rewriting the instructions to a new executable file, the protected copy. The watermarking module further comprises a fingerprint mark, one or more fingerprint keys, and an authorship mark.
The branch function is a special function used as part of an obfuscation technique to disrupt static disassembly. This obfuscation technique converts unconditional branch instructions to a call to the branch function that is inserted in the protected copy. The purpose of the branch function is to transfer the control of execution to the instruction that was the target of the unconditional branch instruction (further referenced herein as the target instruction). The branch function can be designed to handle any number of unconditional or conditional branches.
In general, the branch function is responsible for choosing the correct target instruction based on the call location. There are a variety of ways to accomplish such a task. FIG. 7 illustrates a method of generating the branch functions and the table. The system executes the application. During execution of the application, the system selects one or more branch instructions in the application and constructs a mapping between the locations of the selected branch instructions (j_1, . . . , j_n) and their target instructions (t_1, . . . , t_n):
Theta = {j_1 → t_1, j_2 → t_2, . . . , j_n → t_n}
The system uses a hash function h to assign a unique identifier to the location of each of the selected branch instructions:
h: {j_1, j_2, . . . , j_n} → {1, 2, . . . , n}
The system constructs a displacement table T in a data section of the protected copy that lists the displacement for each (j_i, t_i) pair; the displacement is the difference in location in the code of the application from a selected branch instruction to the target instruction of the selected branch instruction. The displacements are stored in the displacement table T such that T[h(j_i)] = t_i − j_i. The system writes the branch functions with the executable code of the application to the protected copy.
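The construction of the mapping, the hash function, and the displacement table can be sketched as follows. The addresses are hypothetical, a dictionary stands in for the hash function h, and assigning identifiers in sorted address order is an assumption. For simplicity the sketch uses the branch location j_i itself as the base address when resolving the target.

```python
def build_branch_table(theta):
    # theta maps each selected branch location j_i to its target t_i.
    # h assigns the identifiers 1..n to the branch locations.
    h = {j: i for i, j in enumerate(sorted(theta), start=1)}
    # T[h(j_i)] = t_i - j_i
    T = {h[j]: t - j for j, t in theta.items()}
    return h, T

def branch_function(call_location, h, T):
    # Recover the target from the call location and the stored displacement.
    return call_location + T[h[call_location]]

theta = {0x10: 0x40, 0x24: 0x08, 0x30: 0x50}  # hypothetical j_i -> t_i
h, T = build_branch_table(theta)
```

Here the backward branch at 0x24 gets a negative displacement, and branch_function(0x24, h, T) returns 0x08.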
The use of the branch function provides tamper detection. Any transformation applied to a branch function that alters the displacement between the branch instruction and the target instruction causes the branch function to return to an incorrect target instruction. Through the use of the branch function, obvious control flow is removed from the protected copy.
FIG. 8 (FIGS. 8A, 8B) illustrates a method of operation of the protected copy. A user executes the protected copy. The protected copy selects an instruction to execute. If the selected instruction is not a call to a branch function, the protected copy executes the instruction. While the method is illustrated using a call instruction, any branch instruction may be used that transfers execution control to one of the branch functions. If no additional instructions remain in the protected copy for execution, the protected copy exits execution. If additional instructions remain, the protected copy selects a next instruction and returns.
If the selected instruction is a call to a branch function, the protected copy calls the branch function. An integrity check is performed to generate a value v_i. A value x_i is produced using v_i and either the previous key, k_{i−1}, or the branch location b_i. The called branch function applies a hash function to the generated value x_i to compute h(x_i). The term h(x_i) corresponds to a slot in the displacement table T. The branch function accesses the slot h(x_i) in the displacement table T and obtains the displacement to a target instruction. The branch function adds the displacement to the return address of the call instruction (corresponding to the original branch instruction replaced by the call instruction).
The protected copy goes to the target instruction and executes the instruction. If no additional instructions remain in the protected copy for execution, the protected copy exits execution. If additional instructions remain, the protected copy selects a next instruction and returns.
FIG. 9 illustrates conversion of a branch instruction to a call to the branch function, using an exemplary application in the x86 instruction set. In this example, instructions such as jmp, call, and jcc are converted to call instructions. Each of the call instructions calls a single branch function. The branch function directs control flow to target locations. FIG. 9 demonstrates how control flow is interpreted in an application after transformation by the system. For example, an instruction in the exemplary application before transformation is:
j1:jump t1
The system replaces this instruction with the call instruction:
j1:call b
where the instruction “call b” references a call to the branch function. The branch function returns execution control to the target instruction at target location t1.
To provide further tamper detection for the protected copy, the integrity check processor transforms one or more branch instructions into branch functions that incorporate an integrity check, referenced herein as integrity check branch functions. One or more integrity check branch functions are incorporated in the protected copy to develop a self-monitoring check system for the protected copy.
The integrity check processor inserts the integrity check module in the protected copy. The integrity check module incorporates an integrity check into the computation of the location of a target instruction. FIG. 10 illustrates a method of operation of the integrity check module. The integrity check module performs an integrity check that produces a value v_i. The integrity check module computes a value a_i using v_i and a branch instruction location b_i, a_i = g(b_i, v_i). The integrity check module identifies a displacement to a target instruction from a selected branch instruction via d_i = T[h(a_i)], where the displacement table T is stored in the data section of the protected copy and h is a hash function. The integrity check module computes a return location of the target instruction by adding the displacement d_i to the return address of the selected call instruction.
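The run-time computation just described can be sketched as follows, assuming SHA-256 for both the integrity check and the combining function g, and a dictionary as a stand-in for the hash function h. The addresses, the code bytes, and the name resolve_target are hypothetical; a missing slot (None) models the wrong-target failure.

```python
import hashlib

def g(b: int, v: bytes) -> bytes:
    # a_i = g(b_i, v_i): bind the branch location to the check value.
    return hashlib.sha256(b.to_bytes(8, "little") + v).digest()

def resolve_target(b: int, return_address: int, code: bytes, h, T):
    v = hashlib.sha256(code).digest()  # integrity check value v_i
    a = g(b, v)                        # a_i = g(b_i, v_i)
    slot = h.get(a)                    # h(a_i); dict is an assumed stand-in
    if slot is None:
        return None                    # models access to a wrong slot
    return return_address + T[slot]    # add d_i to the return address

code = b"\x90" * 16                    # hypothetical protected code region
v_ok = hashlib.sha256(code).digest()
h = {g(0x400, v_ok): 0}                # table prepared at protection time
T = [0x2C]
```

Either a modified code region or a call issued from a different location changes a_i, so the lookup no longer reaches the intended slot.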
Through the incorporation of an integrity check in the integrity check branch function, the system provides tamper detection for the entire program of the protected copy as opposed to only those instructions between a branch instruction and a target instruction. The integrity check module is an inserted section of code used to verify the integrity of the protected copy. Any of a variety of techniques can be used by system to perform an integrity check such as, for example, a checksum over a block of code. Through the use of integrity checks, the protected copy can identify, for example, whether it has been subjected to semantics-preserving transformations or whether a debugger is present during execution.
FIG. 11 (FIGS. 11A, 11B) illustrates a method in which a tamper detection mechanism is incorporated into the protected copy by injecting an integrity check module into the protected copy and converting selected branch instructions to calls to the integrity check branch functions. The system executes the application. The integrity check processor selects a set of branch instructions, {b1, . . . , bn}, for conversion into call instructions.
The number and location of the selected branch instructions are based on a trade-off between robustness and performance overhead. The robustness against reverse engineering is partially based on the number of converted branch instructions. However, as the number of converted branch instructions increases, so does the overhead of repeated execution of the integrity check branch functions. A criterion used for selecting branch instructions in performance-sensitive applications is to avoid program hotspots (sections of code that are repeatedly executed). Otherwise, branch instructions can be selected in a random fashion. The system can select any, some, or all of the branch instructions for transformation to call instructions that call the integrity check branch functions.
The integrity check processor constructs a mapping between the selected branch instructions and the integrity check branch functions: Theta = {b1, . . . , bn} → {IntegrityCheck1, . . . , IntegrityCheckn}, where the term IntegrityCheck refers to an integrity check branch function. The integrity check processor uses this mapping to replace the selected branch instructions with calls to the appropriate integrity check branch functions.
The integrity check processor constructs a displacement table T. For each of the selected branch instructions replaced in the application, a mapping is maintained between the calculated value ai and the displacement di between the selected branch instruction and its related target instruction. This mapping is described as: Phi = {a1 → d1, . . . , an → dn}
The integrity check processor uses Phi to construct the displacement table T. To fill the displacement table T, the integrity check processor constructs a hash function, h, such that each value ai maps to a unique slot in the displacement table T: h: {a1, . . . , an} → {1, . . . , m}, n ≤ m, where {1, . . . , m} are the slots in the displacement table T to which the values {a1, . . . , an} are mapped. By using a perfect hash function, the size of the displacement table T can be minimized. Based on h, the displacement table T is added to the data section of the protected copy: T[h(ai)] = di. The system writes the integrity check module, the integrity check branch functions, and remaining executable code from the application to the protected copy.
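One way to picture the construction of the displacement table is a brute-force search for a collision-free hash. The sketch below is an assumption-laden illustration, not the method used by the described system: the seeded SHA-256 hash, the seed search, the function names, and the sample values in Phi are all hypothetical, and slots are numbered from 0 rather than 1.

```python
import hashlib

def h(a: int, seed: int, m: int) -> int:
    # Seeded hash into m slots (hypothetical construction for illustration).
    digest = hashlib.sha256(f"{seed}:{a}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % m

def build_table(phi: dict, m: int, max_seed: int = 100_000):
    """Find a seed for which h is collision-free on the keys of phi,
    then fill T so that T[h(a_i)] = d_i."""
    if m < len(phi):
        raise ValueError("the table needs at least as many slots as keys (n <= m)")
    for seed in range(max_seed):
        slots = {a: h(a, seed, m) for a in phi}
        if len(set(slots.values())) == len(phi):   # every a_i has its own slot
            table = [None] * m
            for a, d in phi.items():
                table[slots[a]] = d
            return seed, table
    raise RuntimeError("no collision-free seed found")

# Phi = {a_i -> d_i} with made-up values; m = n gives a minimal table.
phi = {0x1A2B: 12, 0x3C4D: -8, 0x5E6F: 40, 0x7081: 4}
seed, table = build_table(phi, m=len(phi))
```

Choosing m equal to n corresponds to the minimal table mentioned above; a larger m makes a collision-free seed easier to find at the cost of unused slots.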
FIG. 12 illustrates an exemplary code transformation from original code to protected code incorporating the integrity check module. Branch instruction jmp L2 is transformed to call_branch_function_1 820. Branch instruction jge L3 is transformed to call_branch_function_2 830. Branch instruction call printf is transformed to call_branch_function_3 840. Branch instruction jmp L1 is transformed to call_branch_function_1 850.
FIG. 13 illustrates an exemplary control flow graph representing the original code and a protected control flow graph representing protected code. The original control flow graph and the protected control flow graph demonstrate how control flow is interpreted before and after the transformation of the application to the protected copy by the integrity check processor.
In one embodiment, the system further enhances the tamper-detection strength of the protected copy through the use of indirection. Added levels of indirection increase the amount of analysis required by an attacker to understand the protected copy. Further indirection can be incorporated by rerouting all calls to the integrity check branch functions through a single super branch function that transfers execution to the proper branch function.
A goal of any tamper detection technique is to prevent an adversary from altering or reverse engineering the program. One of the most common forms of automated attack is code obfuscation. Through the use of the integrity check modules, the protected copy is able to self-detect semantics-preserving transformations. A variety of transformations were applied to an exemplary protected copy to verify that the protected copy behaved as expected. In each case, the protected copy failed to function correctly after the obfuscation had been applied, as desired.
A common manual attack is to inspect the code of an application to locate and remove a license check. Successful removal of the license check in the protected copy requires an attacker to “unravel” the displacement table T and replace all of the calls with the correct branch instruction and displacement; otherwise the alteration is detected by the protected copy. This attack requires extensive dynamic analysis that in many cases is prevented by the integrity check modules installed in the protected copy by the system. For example, the use of a debugger can be self-detected by the protected copy, leading to incorrect program behavior or failure, as desired.
The system further inhibits the ability of an adversary to reverse engineer the protected copy. By replacing conditional and unconditional jumps, the obvious control flow of the protected copy has been removed. Advantageously, the protected copy detects an attack based on information available only at runtime, eliminating the use of static analysis tools. To completely reverse engineer the program, an attacker has to dynamically analyze the protected copy. The integrity check module installed in the protected copy by the system significantly inhibits dynamic analysis.
The system improves the level of protection of the protected copy by intertwining tamper detection with software watermarking such as author marking and fingerprinting performed by the watermarking module. The system provides another protection mechanism for the protected copy and protects the watermark from being damaged or removed.
Fingerprinting can be accomplished through the use of a key generating branch function, the fingerprint branch function. The watermark processor embeds the fingerprint mark in the protected copy by selecting a specific set of functions that call the fingerprint branch function. The fingerprint mark is a composition of one or more final keys produced for each of the selected functions.
Each time the fingerprint branch function executes in the protected copy, the fingerprint key is evolved in a key generation cycle. The generation of the fingerprint key is based on a one-way function that takes as input the integrity check value and a previous fingerprint key, ki−1: ki = g(ki−1, vi). The newly generated fingerprint key ki is used in the displacement look-up. The displacement is found in slot h(ki) in the displacement table T.
The displacement is mapped to a specific fingerprint key in the key generation cycle. Consequently, the system uses a subset of the branch instructions in the application that are on a deterministic path; i.e., the branch instructions are executed each time the protected copy executes. If a branch instruction is not executed each time the protected copy executes, the fingerprint key does not evolve correctly. Therefore, the branch instructions used for computation of the fingerprint key in the watermarking module are required to reside on a deterministic path through the protected copy.
FIG. 14 is an exemplary control flow graph of a function illustrating deterministic and non-deterministic branch instructions. The control flow to block 1010 and block 1015 from block 1005 represents a “conditional” branch instruction where either block 1010 or block 1015 is executed. A branch instruction 1030 in block 1005 and a call instruction 1035 in block 1020 are deterministic branch instructions and can be used by the watermarking processor for fingerprinting the protected copy. However, the watermarking processor cannot use a branch instruction jmp 1040 in block 1010; the branch instruction jmp 1040 is part of the else branch instruction and is not guaranteed execution every time the function 1000 is executed. Furthermore, branch instructions selected for use by the watermarking processor cannot be part of a non-deterministic loop because a new fingerprint key is generated in the protected copy each time one of the fingerprint branch functions is executed.
A set of deterministic branch instructions useable by the system to generate a fingerprint is identified through data-flow analysis of the application. Each of the selected deterministic branch instructions is replaced with a call to one of the fingerprint branch functions. When the protected copy comprises both an integrity check module and a watermarking module, the system uses for the integrity check modules those non-deterministic branch instructions not used to generate the fingerprint key for the watermarking module.
The watermarking processor comprises an embed function and a recognize function. The embed function for the system can be described with the following inputs and outputs: Embed(P, AM, keyAM, keyFM) → P′, FM, where P is the input program (the application), AM is the authorship mark, keyAM is the secret authorship key for the authorship mark, FM is the fingerprint mark, keyFM is a secret fingerprint key for the fingerprint mark, and P′ is the output program (the protected copy).
The system concurrently embeds the authorship mark and the fingerprint mark. Consequently, two secret keys are required: the authorship key keyAM and the fingerprint key keyFM. In contrast, a conventional fingerprinting system uses a single key. The authorship key keyAM is tied to the authorship mark and is the same for every copy of the protected copy. The fingerprint key keyFM is required for the fingerprint mark and is unique for each copy of the protected copy. An initial key value for the fingerprint key keyFM is optionally assigned to the protected copy as part of the process of registering the protected copy. The fingerprint mark for a particular instance of the protected copy is based on the fingerprint key keyFM and execution of the protected copy. The fingerprint mark is generated during embedding and is an output of the embed function.
The recognize function for the watermarking processor can be described with the following inputs and outputs: recognize(P′, keyAM, keyFM) → AM, FM. The recognition technique of the watermarking processor is blind; i.e., the authorship mark and the fingerprint mark are obtained from the protected copy by providing the secret keys: the authorship key keyAM and the fingerprint key keyFM.
The watermarking module of the system is dynamic; i.e., the authorship key keyAM is an input to the protected copy. By executing the protected copy with the secret input of the authorship key keyAM, a trace comprising a set of deterministic branch instructions is identified. The set of deterministic branch instructions comprises those functions that participate in a calculation of the fingerprint mark. The authorship key keyAM serves to provide a stronger argument for the validity of both the authorship mark and the fingerprint mark. Further, the authorship key keyAM makes recognition more reliable. When the protected copy is executed with the authorship key keyAM (i.e., the secret input), the owner of the application knows that the fingerprint mark is produced and knows where to locate the fingerprint mark.
The watermarking module generates the fingerprint mark as the protected copy executes through the use of the fingerprint branch function and the fingerprint key keyFM. The original branch instruction in the application simply transferred execution control to the branch instruction target. In addition to the transfer of execution control, the fingerprint branch function is responsible for evolving the fingerprint key keyFM.
FIG. 15 illustrates a method of the fingerprint branch function in evolving the fingerprint key keyFM. Each time the fingerprint branch function is called, a new fingerprint key ki is produced and the return location of the target instruction is identified with the aid of the fingerprint key ki. The fingerprint branch function performs an integrity check that produces a value, vi. The fingerprint branch function evolves the next fingerprint key, ki, through the use of a one-way function, ki = g(ki−1, vi). The fingerprint branch function identifies the slot si in the displacement table T where the displacement to the target instruction is located: si = h(ki).
The fingerprint branch function identifies a displacement to the next instruction via di = T[si], where the displacement table T is stored in the data section of the protected copy and h is a perfect hash function. The fingerprint branch function computes the return location of the target instruction by adding the displacement di to the return address of the call instruction that called the fingerprint branch function. Execution control is returned to the target instruction at the computed return location.
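The stateful behavior of the fingerprint branch function can be illustrated with a small simulation. Everything concrete here is an assumption made for the example: SHA-256 stands in for the one-way function g, a dictionary stands in for the perfect hash h, and the class name, initial key, check values, displacements, and return address are hypothetical.

```python
import hashlib

def g(k_prev: bytes, v: int) -> bytes:
    # Stand-in one-way function k_i = g(k_{i-1}, v_i).
    return hashlib.sha256(k_prev + v.to_bytes(4, "big")).digest()

def embed(initial_key: bytes, checks: list, displacements: list):
    """Precompute the expected key sequence and lay out the table T."""
    slot_of, table, k = {}, [], initial_key
    for v, d in zip(checks, displacements):
        k = g(k, v)
        slot_of[k] = len(table)   # dictionary used in place of a perfect hash h
        table.append(d)
    return slot_of, table

class FingerprintBranch:
    def __init__(self, initial_key: bytes, slot_of: dict, table: list):
        self.key = initial_key    # initial key, unique to each copy
        self.slot_of = slot_of
        self.table = table

    def call(self, v: int, return_address: int) -> int:
        self.key = g(self.key, v)         # evolve k_i from k_{i-1} and v_i
        s = self.slot_of[self.key]        # s_i = h(k_i)
        d = self.table[s]                 # d_i = T[s_i]
        return return_address + d         # location of the target instruction

checks = [11, 22, 33]                     # expected integrity check values
displacements = [0x10, -0x08, 0x40]
slot_of, table = embed(b"copy-0001", checks, displacements)

fb = FingerprintBranch(b"copy-0001", slot_of, table)
targets = [fb.call(v, 0x2000) for v in checks]
```

In this simulation an unexpected integrity value produces a key that is missing from the dictionary, so the lookup raises KeyError. With a real hash function the evolved key would instead select an unintended slot, and every later key in the cycle would also be wrong.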
A variation in the fingerprint mark is obtained through the fingerprint key, keyFM, that is unique for each copy of the protected copy. The system uses an initial key in the generation of the fingerprint key, keyFM. The system obtains each function in the set of functions comprising a deterministic branch instruction by executing the protected copy with the secret input, producing a function key. Each of these function keys is combined in a commutative way (e.g., by adding the values) to produce the fingerprint mark for the protected copy.
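As a minimal sketch of the commutative combination, the function keys can be added modulo a fixed width. The 160-bit width, the byte-string representation of the keys, and the function name are assumptions made for the example.

```python
def fingerprint_mark(function_keys, bits: int = 160) -> int:
    """Combine per-function keys by addition modulo 2**bits.
    Addition is commutative, so the order of the keys does not matter."""
    mark = 0
    for key in function_keys:
        mark = (mark + int.from_bytes(key, "big")) % (1 << bits)
    return mark

keys = [b"\x01", b"\x02", b"\x03"]        # toy function keys
mark = fingerprint_mark(keys)
same_mark = fingerprint_mark(list(reversed(keys)))
```

Because the combination is order-independent, the mark does not depend on the order in which the fingerprinted functions happen to be visited during recognition.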
Within the fingerprint branch function, the authorship mark and tamper detection can be incorporated. An ideal authorship mark possesses some mathematical property that allows for a strong argument that it was intentionally placed in the program and that its discovery is not accidental. An example of such a mathematical property is AM = pq, where p and q are very large primes. Factoring the authorship mark into p and q is a difficult problem; only the person who embedded such a watermark is able to identify the factors p and q. To encode the authorship mark in the fingerprint branch function, the system uses a one-way function such that one of the variables is the authorship mark. An exemplary one-way function is: ki = SHA1[(ki−1 XOR AM) ∥ vi]. As used herein, the term SHA1 refers to a specific cryptographic hash function that can be used in conjunction with the '934 invention.
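The exemplary one-way function can be written out directly with a standard SHA-1 implementation. The encoding choices below are assumptions, since the text does not specify them: keys are treated as 20-byte big-endian values, vi as a 4-byte value, and the two Mersenne primes are small toy stand-ins for the very large primes p and q.

```python
import hashlib

P, Q = 2**31 - 1, 2**61 - 1     # toy primes; real marks would use far larger ones
AM = P * Q                      # authorship mark AM = pq

def evolve_key(k_prev: bytes, am: int, v: int) -> bytes:
    """k_i = SHA1[(k_{i-1} XOR AM) || v_i] with assumed fixed-width encodings."""
    mixed = int.from_bytes(k_prev, "big") ^ am
    return hashlib.sha1(mixed.to_bytes(20, "big") + v.to_bytes(4, "big")).digest()

def run(k0: bytes, checks, am: int = AM) -> bytes:
    # Evolve the key once per fingerprint branch function call.
    k = k0
    for v in checks:
        k = evolve_key(k, am, v)
    return k

k0 = hashlib.sha1(b"copy-0001").digest()      # hypothetical 20-byte initial key
final_key = run(k0, [11, 22, 33])
tampered_key = run(k0, [11, 99, 33])          # one integrity value differs
other_mark_key = run(k0, [11, 22, 33], am=AM + 2)
```

Because AM enters every key evolution, changing either the authorship mark or any integrity value changes all subsequent keys, and with them the slots that are consulted in the displacement table.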
The system provides tamper detection with the branch function. The displacement table T stores a displacement; therefore, any transformation applied to the branch function that alters the displacement between a branch instruction and the target instruction of the branch instruction causes the branch function to return to an incorrect instruction. When utilizing the fingerprint branch function, the system incorporates an integrity check that provides tamper detection throughout the protected copy. An integrity check is a section of code inserted in the protected copy to verify the integrity of the protected copy. One such example of an integrity check is a checksum over a block of code.
The integrity check identifies whether the protected copy has been subjected to semantics-preserving transformations or whether a debugger is present during execution of the protected copy. The integrity check produces some value vi that is used as an additional input to the one-way function responsible for the generation of the fingerprint key. By basing the generation of the fingerprint key on ki−1 and vi, the system is able to cause failure of the protected copy if either the fingerprint key or the code of the protected copy has been altered.
The system embeds the fingerprint mark and the authorship mark by injecting the fingerprint branch function into the application. The system further embeds the fingerprint mark and the authorship mark by converting branch instructions to calls to the fingerprint branch function. FIG. 16 illustrates a method of the embedding process of the watermarking processor. The watermarking processor executes the application using the secret input to obtain a trace of the application.
The trace identifies a set of functions comprising deterministic branch instructions through which execution passes; the selected deterministic branch instructions reside on a path through a function that is traversed each time the function is executed. To identify the deterministic path through the function in the application, the watermarking processor computes a dominator set for the exit block in a function control flow graph. The dominator set may comprise blocks that are part of a non-deterministic loop, such as the loop header. Any such block is removed from the path.
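The dominator computation can be sketched with the standard iterative algorithm. The block numbers below follow FIG. 14, but the edges from blocks 1010 and 1015 to block 1020 are assumed for the example, and the loop filter is a conservative simplification that drops every block lying on any cycle rather than only those in non-deterministic loops.

```python
def dominators(cfg: dict, entry) -> dict:
    """Iterative dominator sets; cfg maps each block to its successor list."""
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if preds[n]:
                new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            else:
                new = {n}
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def in_loop(cfg: dict, n) -> bool:
    # True if block n can reach itself, i.e. it lies on some cycle.
    stack, seen = list(cfg[n]), set()
    while stack:
        x = stack.pop()
        if x == n:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(cfg[x])
    return False

def deterministic_blocks(cfg: dict, entry, exit_block) -> set:
    # Blocks that dominate the exit, minus any block that is part of a loop.
    return {n for n in dominators(cfg, entry)[exit_block] if not in_loop(cfg, n)}

fig14 = {1005: [1010, 1015], 1010: [1020], 1015: [1020], 1020: []}
looped = {"A": ["H"], "H": ["B", "X"], "B": ["H"], "X": []}
```

For the FIG. 14 style graph only blocks 1005 and 1020 dominate the exit, which matches the observation that a branch in block 1010 cannot be used. In the second graph the loop header H dominates the exit but is removed by the loop filter.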
The watermarking processor replaces each of the selected deterministic branch instructions with calls to the fingerprint branch function. For each branch instruction replaced by the watermarking processor, the watermarking module maintains a mapping between a calculated key and the displacement to the target instruction: Theta = {k1 → d1, k2 → d2, . . . , kn → dn}. The watermarking processor uses Theta to construct the displacement table T. The watermarking processor constructs the perfect hash function such that each fingerprint key maps to a unique slot in the displacement table T and the size of the displacement table T is minimized: h: {k1, k2, . . . , kn} → {1, 2, . . . , m}, n ≤ m. The watermarking processor adds the displacement table T to the data section of the protected copy: T[h(ki)] = di
FIG. 17 illustrates a method of operation in recognizing an embedded authorship mark or an embedded fingerprint mark. The system executes the protected copy using the secret input. Executing the protected copy identifies a set of functions comprising deterministic branch instructions that have been fingerprinted. Executing the protected copy further identifies the fingerprint branch function(s). Once the fingerprint branch function has been identified, the system isolates the one-way function to extract the authorship mark. To extract the fingerprint mark, the system accesses the location where the evolved fingerprint key is stored for each of the functions comprising deterministic branch instructions. The evolved fingerprint key can be stored in the stack, in memory, etc. The system combines the evolved fingerprint keys to form the fingerprint mark.
The protected copy is successful in defending against attacks such as, for example, an additive attack, a distortive attack, a collusive attack, or a subtractive attack. In an additive attack, an adversary embeds an additional watermark in the protected copy so as to cast doubt on the origin of the intellectual property. An attacker is successful even if the original mark remains intact; however, the attacker considers it more desirable to damage the original mark. The watermarking module is successful at thwarting an additive attack even if a different watermarking scheme is used to embed the second watermark.
The protected copy is successful at defending against the additive attack because of the use of the fingerprint branch function, the integrity check branch function, or the branch function. The displacement table T stores a displacement to the next instruction; therefore, any transformation applied to a function that alters the displacement between a branch instruction and its target instruction without updating the displacement table T causes the fingerprint branch function, the integrity check branch function, or the branch function to return to an incorrect instruction. Consequently, any modification made to the protected copy by inserting additional watermark code that alters the displacements yields a non-functional program.
Furthermore, the protected copy is successful at defending against the additive attack because of the use of the integrity check module. The integrity check module monitors execution of the protected copy, thus detecting any modifications made by embedding an additional watermark in the protected copy.
In a distortive attack, an attacker applies a series of semantics-preserving transformations to the protected copy in an attempt to render a watermark such as the authorship mark or the fingerprint mark useless. The attacker wishes to distort the protected copy in such a way that the watermark becomes unrecoverable while the functionality and performance of the protected copy remain intact. As with the additive attack, a distortive attack cannot succeed in yielding a fully functional protected copy that no longer contains the watermark. Any change in the code of the protected copy either trips an integrity check or alters a displacement to a target instruction, causing the protected copy to produce incorrect results.
A collusive attack occurs when an adversary obtains more than one copy of the protected copy, each with a different fingerprint mark. The attacker compares the copies in an attempt to isolate the fingerprint mark. With conventional watermarking algorithms, prevention of a collusive attack is addressed through the use of code obfuscation. Code obfuscation applies different sets of obfuscations to the fingerprinted program, making the code different throughout the fingerprinted program rather than just at the fingerprint mark. While this is a viable option for thwarting a collusive attack, code obfuscation incurs a noticeable performance overhead and increases the size of the fingerprinted program.
The protected copy is highly resistant to the collusive attack without the use of obfuscation. The only difference between two differently fingerprinted copies of the protected copy is the order of the values in the displacement table, T. Consequently, an attacker has to carefully examine the data section of each of the differently fingerprinted copies of the protected copy to identify a difference.
Collusive attacks can be further thwarted through the use of the integrity check module. The integrity check module recognizes the use of a debugger and causes the program to fail in the case of an attack. In a dynamic attack, the only difference the adversary can detect is the value of the fingerprint key that is evolved to yield a different slot in the displacement table, T. If an adversary is able to launch a successful collusive attack, extensive manual analysis in the form of a subtractive attack is required to remove the fingerprint mark.
In a subtractive attack, an attacker attempts to completely remove a watermark such as the authorship mark or the fingerprint mark from the disassembled or decompiled code of the protected copy. If the attacker is able to identify which sections of code in the protected copy are generating the fingerprint mark, the attacker then has to manually analyze the protected copy to identify all of the call instructions that are converted branch instructions. The attacker then has to identify the correct target instruction and replace the call with the correct branch instruction and displacement.
If the attacker only converts those branch instructions responsible for generation of the fingerprint mark and does not also convert the other branch instructions, the protected copy fails to execute properly. The protected copy fails because the decoy branch functions are designed as a check and guard system. One of the duties of the check and guard system is to verify that the fingerprint branch function has not been altered or removed. Consequently, removal of the fingerprint branch function also requires removal of the decoy branch functions. The manual analysis required to accomplish such a task is extensive.
The '936 invention is described in relation to an application, but is applicable as well to, for example, any executable software code, such as Java bytecode. More specifically, each programming language places different restrictions on the capabilities of the language, the structure of the produced code, and the functionality. An alternative embodiment of the protection mechanism relies on the use of an interface and explicitly thrown exceptions. The main difference between the previously described protection mechanism and this alternative embodiment is in the manner in which the fingerprint branch function transfers execution control. Previously, the table stored displacements. In the Java version, the table stores objects. The Java fingerprint branch function looks up an object in an array and then calls that object's branch method. The purpose of the branch method is to throw a unique exception. Once the exception is thrown, it is propagated up to the method that invoked the fingerprint branch function. When this occurs, the invoking method finds the exception in its exception table and transfers control to the instruction specified. This instruction is the target of the converted branch.
The Java fingerprint branch function performs the following:
    An integrity check producing a value vi.
    Generation of the next method key, ki, through the use of a one-way function, ki = g(ki−1, vi).
    Object look-up through the use of a table, the key, and a hash function, A a = T[h(ki)].
    A call to the method branch using the object a, a.branch().
The main idea of the fingerprint branch function is similar to what has been described earlier. The function still performs an integrity check, evolves a key, and eventually transfers execution back to the calling function, but the means for accomplishing this are different.
In order to perform the fingerprint calculation, an interface, A, is added to the program. A must specify at least one branch method. Additionally, n classes A1, A2, . . . , An are added to the program which each implement the interface A. The branch method in each of the Ai classes will throw a unique exception. A table of objects which are subclasses of A is also inserted, so a combination of objects A1, . . . , An exists. This table is inserted in the form of an array T. The order of the objects in T is determined in the same manner as the displacement table in the previously described protection mechanism. A key aspect of this fingerprint branch function is the use of the interface. Because the object is being instantiated as type A, which is an interface, the method lookup for branch will be dynamic and occur at runtime.
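The exception-based transfer of control can be approximated outside Java. The following Python analogue is only a sketch: Python has no exception table, so try/except clauses play that role, SHA-256 stands in for the one-way function g, a dictionary stands in for the hash h, and all class names, keys, and check values are hypothetical.

```python
import hashlib

class A:                                  # plays the role of the interface A
    def branch(self):
        raise NotImplementedError

class Target1(Exception): pass            # one unique exception per class
class Target2(Exception): pass

class A1(A):
    def branch(self):
        raise Target1()

class A2(A):
    def branch(self):
        raise Target2()

def g(k_prev: bytes, v: int) -> bytes:
    # Stand-in one-way function k_i = g(k_{i-1}, v_i).
    return hashlib.sha256(k_prev + v.to_bytes(4, "big")).digest()

class FingerprintBranch:
    def __init__(self, k0: bytes, slot_of: dict, table: list):
        self.key, self.slot_of, self.table = k0, slot_of, table

    def call(self, v: int):
        self.key = g(self.key, v)                     # evolve the key
        a = self.table[self.slot_of[self.key]]        # A a = T[h(k_i)]
        a.branch()                                    # always throws

def method(fb: FingerprintBranch) -> list:
    trace = []
    try:
        fb.call(11)                 # first converted branch
    except Target1:
        trace.append("L1")          # real target of the first branch
    except Target2:
        trace.append("decoy")       # decoy handler
    try:
        fb.call(22)                 # second converted branch
    except Target2:
        trace.append("L2")          # real target of the second branch
    except Target1:
        trace.append("decoy")
    return trace

k0 = b"copy-0001"
k1 = g(k0, 11)
k2 = g(k1, 22)
table = [A2(), A1()]                # object order would vary from copy to copy
slot_of = {k1: 1, k2: 0}
trace = method(FingerprintBranch(k0, slot_of, table))
```

Each converted branch is surrounded by a handler for every exception type, so the handlers alone do not reveal which one is the real target; only the evolved key selects the object whose exception reaches the intended handler.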
In the previous protection mechanism, all conditional and unconditional branch instructions are replaced. In this alternative embodiment, only the goto and conditional branches are replaced; the method call instructions (the invoke instructions in Java) are not replaced, because the target listed in the exception table must be an instruction within the method. Only those branches on the deterministic path can be used to call the fingerprint branch function. This restriction is necessary for the same reason as with the previously described protection mechanism.
Another important aspect of the Java Branch-Based watermark is that for each converted branch, n entries must be added to the exception table. One of the entries is the correct target, and n−1 are decoys. If the decoy exception entries are omitted, then the branch-target pairs become obvious. During the Java verification process, exception edges are considered a possible path when checking, for example, that the stack height is consistent and that local variables have been initialized. Thus, the targets of the decoy exceptions must be chosen such that the bytecode will still pass the Java verifier.
While the '934 Branch-Based invention avoids the '130 Event-Log Based invention's need for a clearinghouse and thus enables offline tamper-resistance, as a practical matter the '130 and '934 inventions are both constrained to allow key evolution to occur only along the common execution path. An invention that provides software tamper-resistance without a clearinghouse, and without such a constraint on key evolution points is therefore needed.