The market for computer software in all of its various forms is recognized to be very large and is growing every day. In industrialized nations, hardly a business exists that does not rely on computers and software, either directly or indirectly, in its daily operations. In addition, with the expansion of powerful communication networks such as the Internet, the ease with which computer software may be exchanged, copied and distributed is also growing daily.
With this growth of computing power and communication networks, it is becoming easier and easier for a user to obtain and run unauthorized or unlicensed software, and a practical means of protecting such computer software has yet to be devised.
Computer software is generally written by software developers in a high-level language which must be compiled into low-level object code in order to execute on a computer or other processor.
High-level computer languages use command wording that closely mirrors plain language, so they can be easily read by one skilled in the art. Typically, source code files have a suffix that identifies the corresponding language. For example, Java™ is a currently popular high-level language and its source code typically carries a name such as “prog1.java”. High-level structure refers to, for example, the class hierarchy of object oriented programs, or the module structure in Ada™ programs.
Object code generally refers to machine-executable code, which is the output of a software compiler that translates source code from human-readable to machine-executable code. In the case of Java™, there is one file per class and the files have names such as “className.class”, where “className” is the name of the class. Such files are generally called “.class files”.
The low-level structure of object code refers to the actual details of how the program works. Low-level analysis usually focuses on, or at least begins with, one routine at a time. This routine may be, for example, a procedure, function or method. Analysis of individual routines may be followed by analyses of wider scope in some compilation tool sets.
The low-level structure of a software program is usually described in terms of its data flow and control flow. Data-flow is a description of the variables together with the operations performed on them. Control-flow is a description of how control jumps from place to place in the program during execution, and the tests that are performed to determine those jumps.
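As a rough illustration of these two views, the following sketch lists the assignments (data-flow) and the branch tests (control-flow) of one small routine. The routine `remaining_runs` and the helper `summarize` are hypothetical names chosen for this example, and the summary is deliberately crude: it relies on Python's standard `ast` module (Python 3.9 or later for `ast.unparse`) and is not a full analysis of the kind a compilation tool set would perform.

```python
import ast

# A hypothetical routine to analyse; it is parsed, never executed.
SOURCE = """
def remaining_runs(used, limit):
    left = limit - used
    if left < 0:
        left = 0
    return left
"""

def summarize(source):
    """Return (data_flow, control_flow) descriptions for the given source."""
    tree = ast.parse(source)
    data_flow = []     # variables together with the operations that define them
    control_flow = []  # tests that decide where control goes next
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            targets = ", ".join(ast.unparse(t) for t in node.targets)
            data_flow.append(f"{targets} <- {ast.unparse(node.value)}")
        elif isinstance(node, (ast.If, ast.While)):
            control_flow.append(f"branch on: {ast.unparse(node.test)}")
    return data_flow, control_flow

if __name__ == "__main__":
    data, control = summarize(SOURCE)
    print("data-flow:", data)
    print("control-flow:", control)
```

An attacker who can recover both lists for a routine knows which values it computes and which tests steer it, which is the information needed to locate and alter a protection check.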
Tampering refers to changing computer software in a manner that is against the wishes of the original author. Traditionally, computer software programs have had limitations encoded into them, such as requiring password access, preventing copying, or allowing the software to execute only a predetermined number of times or for a certain duration. However, because the user has complete access to the software code, methods have been found to identify the code administering these limitations. Once this coding has been identified, the user is able to overcome these programmed limitations by modifying the software code.
Because a piece of computer software is ultimately just a sequence of data bits, one cannot prevent attackers from making copies and making arbitrary changes. Likewise, there is no way to prevent users from monitoring the computer software as it executes. Such monitoring allows the user to obtain the complete data-flow and control-flow, so it was traditionally thought that the user could identify and undo any protection. This theory seemed to be supported in practice: it was the essence of the copy-protection versus hacking war that was common with Apple-II and early PC software, and which resulted in these copy-protection efforts being generally abandoned.
Since then, a number of attempts have been made to prevent attacks by “obfuscating”, that is, making the organisation of the software code more confusing and hence more difficult to modify. Software is commercially available to “obfuscate” source code in manners such as:
globally replacing variable names with random character strings. For example, each occurrence of the variable name “SecurityCode” could be replaced with the character string “1xcd385mxc” so that it is more difficult for an attacker to identify the variables he is looking for;
deleting comments and other documentation; and
removing source-level structural indentations, such as the indentation of loop bodies, to make the loops more difficult to read.
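The first two techniques listed above can be sketched in a few lines. The example below is an assumption-laden toy, not a description of any commercial obfuscator: the `Renamer` class, the `obfuscate` helper and the sample `check` routine are hypothetical, opaque names are generated sequentially (`v0`, `v1`, ...) rather than randomly so that the result is reproducible, and only parameters and assigned variables of simple code are handled. Comments disappear because Python's parser discards them. The third technique is not shown, because indentation is part of Python's syntax and cannot be removed.

```python
import ast

class Renamer(ast.NodeTransformer):
    """Replace parameter and variable names with opaque ones (toy version)."""

    def __init__(self):
        self.mapping = {}  # original name -> opaque name

    def _opaque(self, name):
        if name not in self.mapping:
            self.mapping[name] = f"v{len(self.mapping)}"
        return self.mapping[name]

    def visit_arg(self, node):
        # Function parameters are renamed where they are declared.
        node.arg = self._opaque(node.arg)
        return node

    def visit_Name(self, node):
        # Rename assignment targets and any name already in the mapping;
        # names never defined by the program (e.g. builtins) are left alone.
        if isinstance(node.ctx, ast.Store) or node.id in self.mapping:
            node.id = self._opaque(node.id)
        return node

# Hypothetical routine to be obscured.
SOURCE = '''
def check(security_code, entered):
    # Compare the entered value with the stored code.
    matches = security_code == entered
    return matches
'''

def obfuscate(source):
    """Parse (dropping comments), rename, and emit source text again."""
    tree = Renamer().visit(ast.parse(source))
    return ast.unparse(tree)

if __name__ == "__main__":
    print(obfuscate(SOURCE))
```

The output behaves exactly like the input, which illustrates the limitation discussed next: the code is harder to read, but nothing stops an attacker from editing it once it is understood.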
While these techniques obscure the source code, they make no attempt to deter modification. Once the attacker has figured out how the code operates, he is free to modify it as he chooses.
A more complex approach to obfuscation is presented in issued U.S. Pat. No. 5,748,741, which describes a method of obfuscating computer software by artificially constructing a “complex wall”. This “complex wall” is preferably a “cascade” structure, where each output is dependent on all inputs. The original program is protected by merging it with this cascade, by intertwining the two. The intention is to make it very difficult for the attacker to separate the original program from the complex wall again, which is necessary to alter the original program. This system suffers from several major problems:
large code expansion, exceeding a hundredfold, required to create a sufficiently elaborate complex wall, and to accommodate its intertwining with the original code; and
low security since the obfuscated program may be divided into manageable blocks which may be decoded individually, allowing the protection to be removed one operation at a time.
Other researchers are beginning to explore the potential for obfuscation in ways far more effective than what is achieved by current commercial code obfuscators, though still inferior to the obfuscation of issued U.S. Pat. No. 5,748,741. For example, in their paper “Manufacturing cheap, resilient, and stealthy opaque constructs”, Conference on Principles of Programming Languages (POPL), 1998 [ACM 0-89791-979-3/98/01], pp. 184-196, C. Collberg, C. Thomborson, and D. Low propose a number of ways of obscuring a computer program. In particular, Collberg et al. disclose obscuring the decision process in the program, that is, obscuring those computations on which binary or multiway conditional branches determine their branch targets. Clearly, there are major deficiencies to this approach, including:
because only control-flow is being addressed, domain transforms are not used and data obfuscation is weak; and
there is no effort to provide tamper-resistance. In fact, Collberg et al. do not appear to recognize the distinction between tamper-resistance and obfuscation, and as a result, do not provide any tamper-proofing at all.
The approach of Collberg et al. is based on the premise that obfuscation cannot offer a complete solution to tamper protection. Collberg et al. state that: “. . . code obfuscation can never completely protect an application from malicious reverse engineering efforts. Given enough time and determination, Bob will always be able to dissect Alice's application to retrieve its important algorithms and data structures.”
As noted above, it is desirable to prevent users from making small, meaningful changes to computer programs, such as overriding copy protection and timeouts in demonstration software. It is also necessary to protect computer software against reverse engineering which might be used to identify valuable intellectual property contained within a software algorithm or model. In hardware design, for example, vendors of application specific integrated circuit (ASIC) cell libraries often provide precise software models corresponding to the hardware, so that users can perform accurate system simulations. Because such a disclosure usually provides sufficient detail to reveal the actual cell design, it is desirable to protect the content of the software model.
In other applications, such as emerging encryption and electronic signature technologies, there is a need to hide secret keys in software programs and transmissions, so that software programs can sign, encrypt and decrypt transactions and other software modules. At the same time, these secret keys must be protected against being leaked.
There is therefore a need for a method and system of making computer software resistant to tampering and reverse engineering. This design must be provided with consideration for the necessary processing power and real time delay to execute the protected software code, and the memory required to store it.
It is therefore an object of the invention to provide a method and system of making computer software resistant to tampering and reverse engineering which addresses the problems outlined above.
The method and system of the invention recognize that attackers cannot be prevented from making copies and making arbitrary changes. However, the most significant problem is “useful tampering”, which refers to making small changes in behaviour. For example, if trial software is designed to stop working after ten invocations, tampering that changes the “ten” to “hundred” is a concern, but tampering that crashes the program totally is not a priority since the attacker gains no benefit.
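The distinction can be made concrete with a small sketch. Everything here is hypothetical and chosen only for illustration: the unprotected trial check `may_run`, held as source text so that it can be edited the way an attacker would edit it, and the helpers `load` and `crashes`.

```python
# Unprotected trial check, kept as text so that it can be "tampered" with.
ORIGINAL = "def may_run(count):\n    return count < 10\n"

def load(source):
    """Execute the source text and return its may_run function."""
    namespace = {}
    exec(source, namespace)
    return namespace["may_run"]

def crashes(source):
    """Return True if the tampered program no longer loads at all."""
    try:
        load(source)
    except SyntaxError:
        return True
    return False

# Useful tampering: one small, targeted change from ten to a hundred.
USEFUL = ORIGINAL.replace("< 10", "< 100")

# Destructive tampering: the program breaks, so the attacker gains nothing.
BROKEN = ORIGINAL.replace("return", "retrun")

if __name__ == "__main__":
    print(load(ORIGINAL)(50))  # the fiftieth invocation is refused
    print(load(USEFUL)(50))    # the same invocation is now allowed
    print(crashes(BROKEN))
```

The useful edit is possible only because the limit appears in the code as a recognizable constant next to a recognizable comparison.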
Data-flow describes the variables together with operations performed on them. The invention increases the complexity of the data-flow by orders of magnitude, allowing “secrets” to be hidden in the program, or the algorithm itself to be hidden. “Obscuring” the software coding in the fashion of known code obfuscators is not the primary focus of the invention. Obscurity is necessary for, but not sufficient for, achieving the prime objective of the invention, which is tamper-proofing.
One aspect of the invention is broadly defined as a method of increasing the tamper-resistance and obscurity of computer software code comprising the steps of transforming the data flow in the computer software code to dissociate the observable operation of the transformed computer software code from the intent of the original software code.
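To give a feel for what transforming data-flow can mean, the sketch below keeps a counter in an encoded form throughout a computation. The linear encoding x' = A·x + B, the constants `A` and `B`, and all function names are assumptions introduced for this example only; the passage above does not specify the transforms the invention uses, and a single linear map is far simpler than an increase in complexity by orders of magnitude.

```python
# Hypothetical encoding constants; A must be positive so that the
# ordering of values (and hence the comparison) is preserved.
A, B = 7, 13

def encode(x):
    return A * x + B

def decode(x_encoded):
    return (x_encoded - B) // A

def step_plain(count):
    """Original data-flow: increment the counter, then test it against ten."""
    count = count + 1
    return count, count < 10

def step_encoded(count_encoded):
    """Same computation on the encoded counter, without ever decoding it.

    Adding 1 in the plain domain is adding A = 7 in the encoded domain,
    and the limit 10 becomes A * 10 + B = 83.
    """
    count_encoded = count_encoded + 7
    return count_encoded, count_encoded < 83

if __name__ == "__main__":
    for c in range(12):
        plain = step_plain(c)
        encoded = step_encoded(encode(c))
        print(c, plain, encoded, decode(encoded[0]))
```

In the encoded routine neither the increment of one nor the limit of ten appears literally, so the observable operations no longer state the intent of the original code directly. This toy shows only the principle and is not by itself claimed to resist a determined attacker.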
A second aspect of the invention is broadly defined as a method of increasing the tamper-resistance and obscurity of computer software code comprising the steps of encoding the computer software code into a domain which does not have a corresponding semantic structure, to increase the tamper-resistance and obscurity of the computer software code.
A further aspect of the invention is defined as a computer readable memory medium, storing computer software code executable to perform the steps of: compiling the computer software program from source code into a corresponding set of intermediate computer software code; encoding the intermediate computer software code into tamper-resistant intermediate computer software code having a domain which does not have a corresponding semantic structure, to increase the tamper-resistance and obscurity of the computer software code; and compiling the tamper-resistant intermediate computer software code into tamper-resistant computer software object code.
An additional aspect of the invention is defined as a computer data signal embodied in a carrier wave, the computer data signal comprising a set of machine executable code being executable by a computer to perform the steps of: compiling the computer software program from source code into a corresponding set of intermediate computer software code; encoding the intermediate computer software code into tamper-resistant intermediate computer software code having a domain which does not have a corresponding semantic structure, to increase the tamper-resistance and obscurity of the computer software code; and compiling the tamper-resistant intermediate computer software code into tamper-resistant computer software object code.
Another aspect of the invention is defined as an apparatus for increasing the tamper-resistance and obscurity of computer software code, comprising: front end compiler means for compiling the computer software program from source code into a corresponding set of intermediate computer software code; encoding means for encoding the intermediate computer software code into tamper-resistant intermediate computer software code having a domain which does not have a corresponding semantic structure, to increase the tamper-resistance and obscurity of the computer software code; and back end compiler means for compiling the tamper-resistant intermediate computer software code into tamper-resistant computer software object code.