Software piracy costs the software industry billions of dollars in lost revenue every year. One attack that often results in considerable lost revenue occurs when an adversary is able to remove a license check. Once the license check has been circumvented, the attacker is able to freely redistribute the software. Of further concern is protecting client-side software running on a potentially hostile host. Since the host has full control over the execution of the software, a sufficiently determined attacker can completely break any piece of software given sufficient time, effort, or resources.
The problem of protecting software from malicious tampering and reverse engineering has been the focus of considerable research. A variety of techniques have been proposed, such as software watermarking, code obfuscation, and tamper detection. Each of these techniques addresses the issue of piracy in a different way, and the techniques can often be combined to provide an even stronger defense. For example, software watermarking is generally vulnerable to semantics-preserving transformations. Incorporating tamper-proofing techniques can thwart such an attack. Certain tamper-proofing techniques require outside resources to detect that a violation has occurred. In the case of software watermarking, a suspected illegal copy is obtained and a recognizing tool is applied to the software to extract the watermark.
The issue of software protection can be addressed through either a software-based or a hardware-based approach. Hardware-based techniques generally offer a higher level of protection, but at the cost of additional expense for the developer and inconvenience for the user. Additionally, software is often purchased and distributed over the Internet, making the use of hardware-based techniques (e.g., dongles or smartcards) infeasible. The use of tamperproof CPUs is another hardware-based solution. However, this type of hardware is not widely in use.
Software-based approaches address the issues of cost and user convenience. However, an adversary can often easily circumvent software-based approaches to protection. Two of the best-known software-based techniques are code obfuscation and software watermarking. Code obfuscation transforms the code of a software product in such a way that it is harder for the attacker to understand and reverse engineer. Software watermarking discourages the adversary from illegally copying and redistributing the software by embedding a unique identifier in the code of the software product. Depending on the type of identifier embedded, the watermark can further be used as proof of authorship or proof of purchase. The proof of purchase is used as a “fingerprint” to uniquely associate a purchaser with one copy of a software product. Embedding a proof of purchase, i.e., a fingerprint, has the advantage that the source of the illegal distribution can be identified.
Software-based techniques further comprise tamper detection and tamper proofing. Many conventional tamper-detection or tamper-proofing techniques require accessing outside resources such as a periodic connection to a clearinghouse. One conventional technique makes use of an event log that is periodically transmitted to the clearinghouse. Integrity checks are embedded throughout an application or program. The event log records the results of the integrity checks. When the clearinghouse detects tampering, the user can be blocked from receiving future content such as updates. While this technique has proven to be effective, the use of the clearinghouse is awkward for the user and can be circumvented. Detection of tampering requires the attacker to contact the clearinghouse. Further, even if the tampering is detected, the attacker may still have a functioning application.
Other conventional tamper-detection or tamper-proofing systems comprise self-contained software-based tamper proofing. One conventional technique comprises an algorithm based on integrity verification kernels (IVK). Integrity verification kernels are units of code responsible for performing critical program functions. Each integrity verification kernel is split into smaller blocks of code that are individually encrypted. Execution of each block of code comprises the following steps: decrypting the block of code, executing the block of code, then re-encrypting the block of code. To detect tampering, the sum of the hash of all previously executed blocks of code is checked at each step to verify that the blocks of code were executed correctly and in proper order.
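The decrypt, execute, and re-encrypt cycle described above can be illustrated with a minimal Python sketch. It is not the conventional algorithm itself: the XOR "cipher", the use of SHA-256, the function names (build_kernel, run_kernel), and the representation of each block as a short Python statement are all assumptions made for illustration. As a further simplification, the running sum here includes the block about to be executed, so that a corrupted block is rejected before it runs.

```python
import hashlib

MOD = 2**256  # keep the running sum of hashes within a fixed width


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for a real cipher: XOR with a repeating key (self-inverse)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def digest_int(data: bytes) -> int:
    """SHA-256 of a block, interpreted as an integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def build_kernel(sources, key):
    """Encrypt each block and precompute the expected running hash sum per step."""
    encrypted = [xor_bytes(s.encode(), key) for s in sources]
    expected, total = [], 0
    for s in sources:
        total = (total + digest_int(s.encode())) % MOD
        expected.append(total)
    return encrypted, expected


def run_kernel(encrypted, expected, key, env):
    """Run the blocks in order, checking the running hash sum at every step."""
    total = 0
    for i in range(len(encrypted)):
        plain = xor_bytes(encrypted[i], key)        # decrypt the block
        total = (total + digest_int(plain)) % MOD   # sum of hashes so far
        if total != expected[i]:
            raise RuntimeError(f"tampering detected at block {i}")
        exec(plain.decode(), {}, env)               # execute the block
        encrypted[i] = xor_bytes(plain, key)        # re-encrypt the block
    return env
```

Because the sum is compared against a precomputed value at every step, both a modified block and two blocks executed in the wrong order cause a mismatch in this sketch. A production scheme would use a real cipher and would have to protect the key and the expected values, which this illustration leaves exposed.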
Another conventional technique establishes a network of guards. These guards establish a check and balance system such that each guard monitors or repairs a different section of code. For example, one guard verifies the integrity of a block of code while another guard repairs the block of code when the block of code has been compromised.
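The division of labor between a verifying guard and a repairing guard can be sketched as follows. This is a simplified illustration under stated assumptions: program code is modeled as a dictionary of named byte strings, CRC-32 stands in for whatever checksum a real guard would compute, and the function names (checksum_guard, repair_guard, run_guards) are hypothetical.

```python
import zlib


def checksum_guard(memory, name, expected_crc):
    """Return True if the named block still matches its recorded checksum."""
    return zlib.crc32(memory[name]) == expected_crc


def repair_guard(memory, name, pristine_copy):
    """Overwrite the named block with a known-good copy."""
    memory[name] = pristine_copy


def run_guards(memory, checksums, backups):
    """Verify every guarded block and repair those that fail verification.

    Returns the names of the blocks that had to be repaired.
    """
    repaired = []
    for name, crc in checksums.items():
        if not checksum_guard(memory, name, crc):
            repair_guard(memory, name, backups[name])
            repaired.append(name)
    return repaired
```

In this sketch the guards are invoked from a single loop for brevity. In a guard network as described above, the guards would instead be scattered through the application and could protect one another, so that removing a single guard is itself detected.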
The growing concern regarding software piracy can be attributed to a variety of factors, such as the distribution of software in architecture-neutral formats that are easy to manipulate and the ease of sharing over the Internet. In previous years, piracy was limited by the necessity to physically transfer a piece of software on a floppy disk or CD-ROM. With increases in Internet bandwidth, physical transfer is no longer necessary.
In the event that a software program is illegally redistributed or an important algorithmic secret is stolen, an owner may wish to take action against the theft. This requires demonstration of ownership or identification of the source of the piracy through the use of techniques such as software watermarking. Software watermarking is used to embed a unique identifier in a piece of software to encode identifying information. While this technique does not prevent piracy, it does provide a way to prove ownership of pirated software and, in some cases, identify the original purchaser prior to the piracy. However, software watermarking is required to be resilient against a variety of attacks such as semantics-preserving code transformations and program analysis tools in order to be useful.
Another conventional approach to limiting piracy and theft of software is fingerprinting, i.e., providing within the software a proof of purchase uniquely tied to a purchaser. However, fingerprinting software to trace piracy has not been a viable option for many software developers. One drawback to fingerprinting is that current techniques require companies to alter their distribution methods. With conventional watermarking techniques, a fingerprint mark cannot be tied to a purchaser when the software is distributed as pre-packaged software.
Software watermarking discourages piracy through the attachment of an identifying mark. An authorship mark is a watermark embedded in every copy of the application to identify the author. A software developer uses the authorship mark to prove ownership of pirated software. Both the fingerprint mark and the authorship mark are required to be robust against tampering in order to be effective. However, only the fingerprint mark requires invisibility. In specific instances, it may be desirable for the authorship mark to be visible, e.g., when the authorship mark conveys a level of quality. In instances where the mark is used in a potentially hostile environment to protect a secret or to trace piracy, invisibility may increase the strength of the mark.
Conventional watermarking techniques comprise any one or more of: a blind watermarking algorithm, an informed watermarking algorithm, a static watermarking algorithm, and a dynamic watermarking algorithm. For both blind and informed watermarking algorithms, the watermarked program and a secret key are required to extract the watermark. However, with an informed watermarking algorithm, either a version of the program that is not watermarked or the embedded mark itself is also required to extract the watermark. A static watermarking algorithm uses the static code and data of the program to embed and recognize the watermark. A dynamic watermarking algorithm makes use of information gathered from the execution of the program to embed and recognize the watermark.
Embedding techniques used in conventional software watermarking can be categorized based on how the application is manipulated to encode the watermark. In one embedding technique, semantics-preserving transformations are applied to reorder the code. The particular order chosen represents the watermark. In another embedding technique, the watermark is encoded in a section of code injected in the application that does not contribute to the functionality of the application. In a further embedding technique, the frequency of instructions is altered to encode the watermark.
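The first of these embedding techniques, in which the chosen ordering represents the watermark, can be sketched in Python as follows. The sketch assumes that the reorderable units (for example, independent function definitions) are distinct, sortable names whose order does not affect program behavior, and it encodes an integer watermark as a permutation using the factorial number system. The function names embed and extract are hypothetical and are not drawn from any particular conventional system.

```python
from math import factorial


def embed(items, watermark):
    """Reorder `items` so that the chosen permutation encodes `watermark`."""
    pool = sorted(items)  # canonical order shared by embedder and recognizer
    n = len(pool)
    if not 0 <= watermark < factorial(n):
        raise ValueError("watermark does not fit in a permutation of this size")
    ordered = []
    for i in range(n, 0, -1):
        # Each factorial-base digit selects one of the remaining items.
        index, watermark = divmod(watermark, factorial(i - 1))
        ordered.append(pool.pop(index))
    return ordered


def extract(ordered):
    """Recover the watermark from the order in which the items appear."""
    pool = sorted(ordered)
    watermark = 0
    for i, item in enumerate(ordered):
        index = pool.index(item)
        watermark += index * factorial(len(ordered) - 1 - i)
        pool.pop(index)
    return watermark
```

With n reorderable units, this encoding can represent n! distinct watermark values. It also illustrates the fragility of the approach: any transformation that reorders the same units again, which is semantics-preserving by construction, replaces the watermark with a different value.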
One conventional static watermarking technique embeds a watermark through an extension to a control flow graph. The watermark is encoded in a sub-graph that is incorporated into the original control flow graph. Another conventional static watermarking technique modifies the instruction frequencies of the original program to embed the watermark. A further conventional static watermarking technique comprises a very stealthy, but fragile, algorithm that makes use of a graph-coloring problem to embed the watermark in a register allocation. However, the watermark of conventional static watermarking techniques can typically be destroyed by basic code optimization or obfuscation techniques.
One conventional dynamic watermarking technique embeds a watermark in the structure of a graph that is built on the heap at runtime as the application program executes on a particular input. Another conventional dynamic watermarking technique makes use of an abstract interpretation framework to embed a watermark in the values assigned to integer local variables during program execution. A further dynamic watermarking technique is path-based and relies on the dynamic branching behavior of the program. To embed the watermark, the sequence of branch instructions taken and not taken on a particular input is modified. Variations for this algorithm were developed to target the varied capabilities of Java bytecode and native executables. Yet another dynamic watermarking technique leverages the ability to execute blocks of code on different threads. The watermark is encoded in the choice of blocks executed on the same thread.
Some conventional watermarking techniques protect software watermarks through tamper proofing. One conventional approach includes a checksum with the watermark. However, an adversary who is able to discover the checksum algorithm can easily attack this technique. Another conventional approach protects a static watermark by encoding a portion of the code of the application in the static watermark. This encoded static watermark is then stored in some portion of the data of the program such as an image. Consequently, an attacker risks altering behavior of the application if the watermark is damaged.
Another conventional approach utilizes constant encoding to tamper-proof dynamic data structure watermarks. Constants in the application are replaced with a function dependent on a dynamic data structure that encodes the watermark. A further conventional approach utilizes error-correcting codes to repair minor damage to the watermark incurred through semantics-preserving transformations.
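Constant encoding can be illustrated with a short Python sketch. The details are assumptions chosen for illustration only: the watermark is modeled as a linked list of decimal digits built at runtime, the protected constant is the value 60, and the names Node, build_watermark, structure_value, and seconds_per_minute are hypothetical.

```python
class Node:
    """One node of the heap-allocated structure that carries the watermark."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def build_watermark(digits):
    """Build the watermark as a linked list, one decimal digit per node."""
    head = None
    for d in reversed(digits):
        head = Node(d, head)
    return head


def structure_value(head):
    """Derive an integer from the structure by walking it node by node."""
    total = 0
    node = head
    while node is not None:
        total = total * 10 + node.value
        node = node.next
    return total


# Computed once at embedding time for the watermark digits [4, 2, 7]:
# the literal constant 60 no longer appears at its point of use.
_OFFSET = 60 - 427


def seconds_per_minute(head):
    """Stands in for the constant 60; correct only if the structure is intact."""
    return structure_value(head) + _OFFSET


def to_seconds(minutes, head):
    """Application logic that now depends on the watermark structure."""
    return minutes * seconds_per_minute(head)
```

With the intact structure, to_seconds(3, build_watermark([4, 2, 7])) yields 180. If an attacker removes or alters a node, the derived constant changes and the application computes incorrect results, which is the dependency that constant encoding is intended to create.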
Although conventional tamper detection and watermarking technology has proven to be useful, it would be desirable to present additional improvements. Conventional software tamper detection techniques often require the use of special hardware or a periodic connection to a clearinghouse. Conventional watermarking techniques can be easily attacked through simple semantics-preserving transformations. Conventional watermarking techniques further require a software developer to choose between providing proof of ownership or tracing the source of the illegal redistribution. Further, a fingerprint mark cannot be tied to a purchaser using conventional watermarking techniques and pre-packaged software.
What is needed is a tamper detection technique that allows an application or program to self-diagnose improper manipulation. Further, a tamper detection technique is desired that causes an application to fail once a license check, watermark, or fingerprint has been removed or the application has been improperly manipulated in any other way. Furthermore, a watermarking technique is desired that allows a developer to concurrently prove authorship and fingerprint an application by tying the application to an individual purchaser through any distribution method, such as pre-packaged software or distribution over the Internet. What is therefore needed is a system, a service, a computer program product, and an associated method for detecting improper manipulation of an application. The need for such a solution has heretofore remained unsatisfied.