The market for computer software in all of its various forms is recognized to be very large and is growing every day. In industrialized nations, hardly a business exists that does not rely on computers and software, either directly or indirectly, in its daily operations. As well, with the expansion of powerful communication networks such as the Internet, the ease with which computer software may be exchanged, copied and distributed is also growing daily.
With this growth of computing power and communication networks, it is becoming easier and easier for a user to obtain and run unauthorized or unlicensed software, and a practical means of protecting such computer software has yet to be devised.
Computer software is generally written by software developers in a high-level language which must be compiled into low-level object code in order to execute on a computer or other processor.
High-level computer languages use command wording that closely mirrors plain language, so they can be easily read by one skilled in the art. Typically, source code files have a suffix that identifies the corresponding language. For example, Java™ is a currently popular high-level language and its source code typically carries a name such as “prog1.java”. High-level structure refers to, for example, the class hierarchy of object oriented programs, or the module structure in Ada™ programs.
Object code generally refers to machine-executable code, which is the output of a software compiler that translates human-readable source code into machine-executable form. In the case of Java™, there is one file per class and the files have names such as “className.class”, where “className” is the name of the class. Such files are generally called “.class files”.
The low-level structure of object code refers to the actual details of how the program works. Low-level analysis usually focuses on, or at least begins with, one routine at a time. This routine may be, for example, a procedure, function or method. Analysis of individual routines may be followed by analyses of wider scope in some compilation tool sets.
The low-level structure of a software program is usually described in terms of its data flow and control flow. Data-flow is a description of the variables together with the operations performed on them. Control-flow is a description of how control jumps from place to place in the program during execution, and the tests that are performed to determine those jumps.
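As a minimal illustration of these two views, the following sketch annotates a single routine with its data flow and control flow. The routine `clamp_sum` and its parameters are hypothetical names chosen for this example only.

```python
def clamp_sum(values, limit):
    """Sum the non-negative entries of values, stopping once limit is exceeded."""
    total = 0                  # data flow: 'total' is defined from a constant
    for v in values:           # control flow: loop test (is there another item?)
        if v < 0:              # control flow: test on the data value 'v'
            continue           # control flow: jump back to the loop test
        total = total + v      # data flow: new 'total' depends on old 'total' and 'v'
        if total > limit:      # control flow: test on 'total' and 'limit'
            return limit       # control flow: early exit from the routine
    return total               # data flow: the result is the last value of 'total'


print(clamp_sum([1, -2, 3], 10))
print(clamp_sum([5, 6], 10))
```

An observer who recovers both the operations applied to `total` and the tests that select each jump has, in effect, recovered how the routine works.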
Tampering refers to changing computer software in a manner that is against the wishes of the original author. Traditionally, computer software programs have had limitations encoded into them, such as requiring password access, preventing copying, or allowing the software to execute only a predetermined number of times or for a certain duration. However, because the user has complete access to the software code, methods have been found to identify the code administering these limitations. Once this coding has been identified, the user is able to overcome these programmed limitations by modifying the software code.
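The following sketch shows, under simplifying assumptions, why such a limitation is fragile once the code administering it has been located. The names `MAX_RUNS`, `may_run` and `load` are hypothetical, and the "program" is held as source text only so that the modification can be shown in a few lines.

```python
# A toy "protected" program: it may execute only a predetermined number of times.
ORIGINAL_SOURCE = '''
MAX_RUNS = 3

def may_run(run_count):
    return run_count < MAX_RUNS
'''


def load(source):
    """Execute the program text and return its limitation check."""
    namespace = {}
    exec(source, namespace)
    return namespace["may_run"]


original = load(ORIGINAL_SOURCE)

# The attacker has identified the test that enforces the limit and
# replaces it with a constant, leaving the rest of the program intact.
patched = load(ORIGINAL_SOURCE.replace("run_count < MAX_RUNS", "True"))

print(original(3))   # limit reached in the unmodified program
print(patched(3))    # the same call after tampering
```

A single small, meaningful change is enough here; nothing in the program detects or resists the edit.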
Since a piece of computer software is simply a listing of data bits, ultimately, one cannot prevent attackers from making copies and making arbitrary changes. As well, there is no way to prevent users from monitoring the computer software as it executes. This allows the user to obtain the complete data-flow and control-flow, so it was traditionally thought that the user could identify and undo any protection. This theory seemed to be supported in practice. This was the essence of the contest between copy protection and hacking that was common with Apple-II and early PC software, and it resulted in these copy-protection efforts being generally abandoned.
Since then, a number of attempts have been made to prevent attacks by “obfuscating” or making the organisation of the software code more confusing and hence, more difficult to modify. Software is commercially available to “obfuscate” source code in ways such as:
    globally replacing variable names with random character strings. For example, each occurrence of the variable name “SecurityCode” could be replaced with the character string “1xcd385mxc” so that it is more difficult for an attacker to identify the variables he is looking for;
    deleting comments and other documentation; and
    removing source-level structural indentations, such as the indentation of loop bodies, to make the loops more difficult to read.
While these techniques obscure the source code, they make no attempt to deter modification. Once the attacker has figured out how the code operates, he is free to modify it as he chooses.
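The first two transformations in this list can be sketched with Python's standard `ast` module (the sketch assumes Python 3.9 or later for `ast.unparse`). This is only an illustration, not a description of any commercial obfuscator. The class `RenameVariables`, the function `obfuscate` and the sample routine `check` are hypothetical, and the replacement string carries a leading underscore because a Python identifier cannot begin with a digit.

```python
import ast


class RenameVariables(ast.NodeTransformer):
    """Globally replace selected variable names (no scope analysis is attempted)."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Every read or write of a variable appears as a Name node.
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        # Function parameters are 'arg' nodes rather than Name nodes.
        self.generic_visit(node)
        node.arg = self.mapping.get(node.arg, node.arg)
        return node


def obfuscate(source, mapping):
    # Parsing discards comments, so regenerating the text also deletes them.
    tree = RenameVariables(mapping).visit(ast.parse(source))
    return ast.unparse(tree)


SOURCE = '''
def check(entered):
    # Compare the entered value with the stored code.
    SecurityCode = 1234
    return entered == SecurityCode
'''

obfuscated = obfuscate(SOURCE, {"SecurityCode": "_1xcd385mxc"})
print(obfuscated)
```

The regenerated text no longer contains the descriptive name or the comment, yet it computes exactly what the original did, so an attacker who works out its operation can still edit it freely.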
A more complex approach to obfuscation is presented in issued U.S. Pat. No. 5,748,741 which describes a method of obfuscating computer software by artificially constructing a “complex wall”. This “complex wall” is preferably a “cascade” structure, where each output is dependent on all inputs. The original program is protected by merging it with this cascade, by intertwining the two. The intention is to make it very difficult for the attacker to separate the original program from the complex wall again, which is necessary to alter the original program. This system suffers from several major problems:
    large code expansion, exceeding a hundredfold, required to create a sufficiently elaborate complex wall, and to accommodate its intertwining with the original code; and
    low security since the obfuscated program may be divided into manageable blocks which may be decoded individually, allowing the protection to be removed one operation at a time.
Other researchers are beginning to explore the potential for obfuscation in ways far more effective than what is achieved by current commercial code obfuscators, though still inferior to the obfuscation of issued U.S. Pat. No. 5,748,741. For example, in their paper “Manufacturing cheap, resilient, and stealthy opaque constructs”, Conference on Principles of Programming Languages (POPL), 1998 [ACM 0-89791-979-3/98/01], pp. 184-196, C. Collberg, C. Thomborson, and D. Low propose a number of ways of obscuring a computer program. In particular, Collberg et al. disclose obscuring the decision process in the program, that is, obscuring those computations on which binary or multiway conditional branches determine their branch targets. Clearly, there are major deficiencies to this approach, including:
    because only control-flow is being addressed, domain transforms are not used and data obfuscation is weak; and
    there is no effort to provide tamper-resistance. In fact, Collberg et al. do not appear to recognize the distinction between tamper-resistance and obfuscation, and as a result, do not provide any tamper-proofing at all.
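To make the idea of obscuring a branch decision concrete, the sketch below guards a check with a test whose outcome is fixed but not obvious from its form. It is a generic illustration of this kind of construct, not the specific constructions of the cited paper; the function names and the key value 42 are hypothetical.

```python
def opaque_true(x: int) -> bool:
    """Always True: x * (x + 1) is a product of consecutive integers, hence even."""
    return (x * (x + 1)) % 2 == 0


def licensed_original(key: int) -> bool:
    return key == 42


def licensed_obscured(key: int, x: int) -> bool:
    # The branch appears to depend on the run-time value x, but the
    # else arm is dead code that only serves to mislead the reader.
    if opaque_true(x):
        return key == 42
    else:
        return key != 42


print(licensed_obscured(42, 7), licensed_obscured(41, 7))
```

Only the control flow is disguised here. The compared value still appears in the clear, and nothing prevents an attacker from changing the comparison once it is found, which corresponds to the two deficiencies listed above.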
The approach of Collberg et al. is based on the premise that obfuscation cannot offer a complete solution to tamper protection. Collberg et al. state that: “. . . code obfuscation can never completely protect an application from malicious reverse-engineering efforts. Given enough time and determination, Bob will always be able to dissect Alice's application to retrieve its important algorithms and data structures.”
As noted above, it is desirable to prevent users from making small, meaningful changes to computer programs, such as overriding copy protection and timeouts in demonstration software. It is also necessary to protect computer software against reverse engineering which might be used to identify valuable intellectual property contained within a software algorithm or model. In hardware design, for example, vendors of application specific integrated circuit (ASIC) cell libraries often provide precise software models corresponding to the hardware, so that users can perform accurate system simulations. Because such a disclosure usually provides sufficient detail to reveal the actual cell design, it is desirable to protect the content of the software model.
In other applications, such as emerging encryption and electronic signature technologies, there is a need to hide secret keys in software programs and transmissions, so that software programs can sign, encrypt and decrypt transactions and other software modules. At the same time, these secret keys must be protected against being leaked.
There is therefore a need for a method and system of making computer software resistant to tampering and reverse engineering. Such a design must take into account the processing power and real-time delay needed to execute the protected software code, and the memory required to store it.