The market for computer software in all of its various forms is recognized to be very large and is growing every day. In industrialized nations, hardly a business exists that does not rely on computers and software, either directly or indirectly, in its daily operations. As well, with the expansion of powerful communication networks such as the Internet, the ease with which computer software may be exchanged, copied and distributed is also growing daily.
With this growth of computing power and communication networks, it is becoming easier and easier for a user to obtain and run unauthorized or unlicensed software, and a practical means of protecting such computer software has yet to be devised.
Computer software is generally written by software developers in a high-level language which must be compiled into low-level object code in order to execute on a computer or other processor.
High-level computer languages use command wording that closely mirrors plain language, so they can be easily read by one skilled in the art. Object code generally refers to machine-executable code, which is the output of a software compiler that translates source code from human-readable form into machine-executable form.
The low-level structure of object code refers to the actual details of how the program works. Low-level analysis usually focuses on, or at least begins with, one routine at a time. This routine may be, for example, a procedure, function or method. Analysis of individual routines may be followed by analyses of wider scope in some compilation tool sets.
The low-level structure of a software program is usually described in terms of its data flow and control flow. Data-flow is a description of the variables together with the operations performed on them. Control-flow is a description of how control jumps from place to place in the program during execution, and the tests that are performed to determine those jumps.
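The distinction between data flow and control flow can be illustrated with a minimal sketch. The routine `clamp_sum` and the lists `assignments` and `jumps` are hypothetical names introduced only for this illustration, and the Python `ast` module stands in for the far deeper analyses a real compilation tool set would perform.

```python
import ast

# Hypothetical routine used only for illustration.
SOURCE = '''
def clamp_sum(values, limit):
    total = 0
    for v in values:
        total = total + v
        if total > limit:
            return limit
    return total
'''

tree = ast.parse(SOURCE)

# Data flow, very roughly: each variable that is written, paired with the
# kind of expression that produces its new value.
assignments = [
    (node.targets[0].id, type(node.value).__name__)
    for node in ast.walk(tree)
    if isinstance(node, ast.Assign)
]

# Control flow, very roughly: the constructs that make control jump from
# place to place, together with the tests that guard those jumps.
jumps = [
    type(node).__name__
    for node in ast.walk(tree)
    if isinstance(node, (ast.For, ast.While, ast.If, ast.Return))
]

print("data flow:", assignments)
print("control flow:", jumps)
```

Here the variable `total` and the additions performed on it belong to the data flow, while the loop, the test `total > limit` and the two returns belong to the control flow.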
Tampering refers to changing computer software in a manner that is against the wishes of the original author. Traditionally, computer software programs have had limitations encoded into them, such as requiring password access, preventing copying, or allowing the software to execute only a predetermined number of times or for a certain duration. However, because the user has complete access to the software code, methods have been found to identify the code administering these limitations. Once this coding has been identified, the user is able to overcome these programmed limitations by modifying the software code.
Since a piece of computer software is simply a listing of data bits, ultimately, one cannot prevent attackers from making copies and making arbitrary changes. As well, there is no way to prevent users from monitoring the computer software as it executes. This allows the user to obtain the complete data flow and control flow, so it was traditionally thought that the user could identify and undo any protection. This theory seemed to be supported in practice. This was the essence of the copy-protection versus hacking war that was common on Apple-II and early PC software, and has resulted in these copy-protection efforts being generally abandoned.
Since then, a number of attempts have been made to prevent attacks by “obfuscating” the software code, that is, by making its organisation more confusing and hence more difficult to modify. Software is commercially available to “obfuscate” source code in manners such as:
    globally replacing variable names with random character strings. For example, each occurrence of the variable name “SecurityCode” could be replaced with the character string “1xcd385mxc” so that it is more difficult for an attacker to identify the variables he is looking for;
    deleting comments and other documentation; and
    removing source-level structural indentations, such as the indentation of loop bodies, to make the loops more difficult to read.
While these techniques obscure the source code, they do not make any attempts to deter modification. These methods produce superficial changes, but the information exposed by deeper analyses employed by optimizing compilers and similar sophisticated tools is changed very little. The data flow and control flow information exposed by such analyses is either not affected at all, or is only slightly affected, by the above methods of obfuscation. Once the attacker has figured out how the code operates, he is free to modify it as he chooses.
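The superficial nature of such changes can be seen in a minimal sketch, written here in Python under stated assumptions. The helpers `cosmetic_obfuscate` and `opcode_shape` and the routine `check` are hypothetical, the replacement name is prefixed with `v_` because a Python identifier cannot begin with a digit, and indentation is left in place because Python requires it. The sketch renames a variable, deletes comments, and then compares the compiled opcode sequences of the two versions.

```python
import dis
import io
import tokenize


def cosmetic_obfuscate(source: str, renames: dict) -> str:
    """Rename identifiers and drop comments; exact spacing is not preserved."""
    kept = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            continue  # delete comments and other documentation
        text = tok.string
        if tok.type == tokenize.NAME:
            text = renames.get(text, text)  # global variable renaming
        kept.append((tok.type, text))
    return tokenize.untokenize(kept)


def opcode_shape(source: str, name: str) -> list:
    """Compile the source and list the opcode names of one function."""
    namespace = {}
    exec(source, namespace)
    return [ins.opname for ins in dis.get_instructions(namespace[name])]


# Hypothetical routine used only for illustration.
ORIGINAL = '''
def check(entered):
    # compare against the stored code
    SecurityCode = 1234
    return entered == SecurityCode
'''

obfuscated = cosmetic_obfuscate(ORIGINAL, {"SecurityCode": "v_1xcd385mxc"})
same_shape = opcode_shape(ORIGINAL, "check") == opcode_shape(obfuscated, "check")

print(obfuscated)
print("opcode sequence unchanged:", same_shape)
```

The source text looks different, but the compiled routine performs the same operations in the same order, so an attacker working from the executable form loses nothing.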
A more complex approach to obfuscation is presented in issued U.S. Pat. No. 5,748,741 which describes a method of obfuscating computer software by artificially constructing a “complex wall”. This “complex wall” is preferably a “cascade” structure, where each output is dependent on all inputs. The original program is protected by merging it with this cascade, intertwining the two. The intention is to make it very difficult for the attacker to separate the original program from the complex wall again, which is necessary to alter the original program. This system suffers from several major problems:
    large code expansion, exceeding a hundredfold, required to create a sufficiently elaborate complex wall, and to accommodate its intertwining with the original code; and
    low security since the obfuscated program may be divided into manageable blocks which may be decoded individually, allowing the protection to be removed one operation at a time.
Other researchers are beginning to explore the potential for obfuscation in ways far more effective than what is achieved by current commercial code obfuscators, though still inferior to the obfuscation of issued U.S. Pat. No. 5,748,741. For example, in their paper “Manufacturing cheap, resilient, and stealthy opaque constructs”, Conference on Principles of Programming Languages (POPL), 1998 [ACM 0-89791-979-3/98/01], pp. 184-196, C. Collberg, C. Thomborson, and D. Low propose a number of ways of obscuring a computer program. In particular, Collberg et al. disclose obscuring the decision process in the program, that is, obscuring those computations on which binary or multiway conditional branches determine their branch targets. Clearly, there are major deficiencies to this approach, including:
    because only control flow is being addressed, domain transforms are not used and data obfuscation is weak; and
    there is no effort to provide tamper-resistance. In fact, Collberg et al. do not appear to recognize the distinction between tamper-resistance and obfuscation, and as a result, do not provide any tamper-proofing at all.
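One way to picture the obscuring of a branch decision is an opaque predicate, that is, a test whose outcome is fixed but not obvious from the code. The sketch below is an assumption made for illustration and is not taken from the cited paper; the names `opaquely_true`, `check_password` and `noise` are hypothetical. It relies only on the fact that the product of two consecutive integers is always even.

```python
def opaquely_true(x: int) -> bool:
    # x * (x + 1) is the product of two consecutive integers, so it is
    # always even and this test always succeeds, whatever x is.
    return (x * (x + 1)) % 2 == 0


def check_password(entered: str, stored: str, noise: int) -> bool:
    if opaquely_true(noise):
        # Real decision, always reached.
        return entered == stored
    # Decoy path that is never executed; it only clutters the control flow.
    return len(entered) > len(stored)
```

The extra branch complicates the control-flow graph, but the data is untouched and the comparison `entered == stored` remains a single decision that can still be patched, which is the deficiency noted above.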
The approach of Collberg et al. is based on the premise that obfuscation cannot offer a complete solution to tamper protection. Collberg et al. state that: “. . . code obfuscation can never completely protect an application from malicious reverse-engineering efforts. Given enough time and determination, Bob will always be able to dissect Alice's application to retrieve its important algorithms and data structures.”
A software approach for computing with encrypted data is described by Niv Ahituv, Yeheskel Lapid, and Seev Neumann, in “Processing encrypted data”, Communications of the ACM 30(9), September 1987, pp. 777-780. This method hides the actual value of the data from the software doing the computation. However, the computations which are practical using this technique are quite restricted.
In “Breaking abstractions and unstructuring data structures”, IEEE International Conference on Computer Languages, 1998, Christian Collberg, Clark Thomborson, and Douglas Low provide more comprehensive proposals on obfuscation, together with methods for obfuscation of structured and object-oriented data.
There remains a weakness, however, in the methods proposed by Ahituv et al. and Collberg et al. Obfuscation and tamper-resistance are distinct problems, and while weak obfuscation is provided by Ahituv et al. and Collberg et al., they do not address tamper-resistance at all. For example, consider removing password protection from an application by changing the password decision branch from a conditional one to an unconditional one. Plainly, this vulnerability cannot be eliminated effectively by any amount of mere obfuscation. A patient attacker tracing the code will eventually find the “pass, friend”/“begone, foe” branch instruction. Identifying this branch instruction allows the attacker to circumvent a protection routine by simply re-coding it to an unconditional branch. Therefore, other methods are required to avoid such single points of failure.
The level of obfuscation obtained using the above techniques is plainly quite weak, since the executed code, when its control flow and data flow are analysed in graph form, is either isomorphic to, or nearly isomorphic to, the unprotected code. That is, although the details of the obfuscated code are different from the original code, the general organisation and structure have not changed.
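The single point of failure can be made concrete with a minimal sketch. The names `STORED_PASSWORD`, `protected` and `tampered` are hypothetical, and the edit is shown at source level for readability, whereas an attacker would typically patch the corresponding branch instruction in the object code.

```python
STORED_PASSWORD = "s3cret"  # hypothetical value, for illustration only


def protected(entered: str) -> str:
    # The entire protection rests on this one conditional branch.
    if entered == STORED_PASSWORD:
        return "pass, friend"
    return "begone, foe"


def tampered(entered: str) -> str:
    # The attacker's single edit: the test is replaced so that the branch
    # is effectively unconditional and the password is never consulted.
    if True:
        return "pass, friend"
    return "begone, foe"


print(protected("guess"))  # rejected by the original routine
print(tampered("guess"))   # accepted after the one-branch change
```

However heavily the surrounding code is disguised, the protection is defeated as soon as this one branch is located and re-coded.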
As noted above, it is desirable to prevent users from making small, meaningful changes to computer programs, such as overriding copy protection and timeouts in demonstration software. It is also necessary to protect computer software against reverse engineering which might be used to identify valuable intellectual property contained within a software algorithm or model. In hardware design, for example, vendors of application specific integrated circuit (ASIC) cell libraries often provide precise software models corresponding to the hardware, so that users can perform accurate system simulations. Because such a disclosure usually provides sufficient detail to reveal the actual cell design, it is desirable to protect the content of the software model.
In other applications, such as emerging encryption and electronic signature technologies, there is a need to hide secret keys in software programs and transmissions, so that software programs can sign, encrypt and decrypt transactions and other software modules. At the same time, these secret keys must be protected against being leaked.
There is therefore a need for a method and system of making computer software resistant to tampering and reverse engineering. Such a design must take into account the processing power and real-time delay required to execute the protected software code, and the memory required to store it.