1. Field of the Invention
The invention relates to a technique, including both apparatus and an accompanying method, for forming and embedding a hidden, highly tamper-resistant cryptographic identifier, i.e., a watermark, within non-marked computer executable code, e.g., an application program, to generate a “watermarked” version of that code. This technique can also be used to tightly integrate, in a highly tamper-resistant manner, other pre-defined executable code, such as security code, as part of the watermark, into the non-marked code in order to form the watermarked code.
2. Description of the Prior Art
Over the past decade or so, personal computers (PCs) have become rather ubiquitous with PC hardware and software sales experiencing significant growth. However, coincident with an ever widening market for PCs, unauthorized copying of PC software, whether it be application programs or operating systems, continues to expand to rather significant proportions. Given that in certain countries sales lost to such copying can significantly exceed legitimate sales, over the years software manufacturers have attempted to drastically reduce the incidence of unauthorized copying though, practically speaking, with only limited success.
One such technique, probably one of the oldest used and usually rather ineffective, is simply to append copyright and other legal proprietary rights notices to object code as distributed on mass (magnetic or optical) media. The intention in doing so is to deter unauthorized copying by simply placing a third party on notice that a copy of the program, embodied by that code, is legally protected and that its owner may take legal action to enforce its rights in the program against that party to prevent such copying. These notices can be readily discovered in program code listings and simply excised by the third party prior to copying and distributing illicit copies. Other such notices can be excised by a third party adversary from the software media itself and the program packaging as well. Though these notices are often necessary in many jurisdictions to secure full legal remedies against third parties, in practice, these notices have provided little, if any, real protection against third party copying.
Another technique that has recently seen increasing use is to require a PC, on which the program is to execute, to hold a valid digital “certificate” provided by the manufacturer of the program. The certificate will typically be loaded as a separate step during manufacture of the PC. During initialization, the program will test the certificate and confirm its authenticity and validity. If the certificate is authentic and valid, the program will continue to execute; otherwise, the program will simply terminate. Unfortunately, the certificate and associated testing routines are often very loosely bound to the remainder of the program code. Currently available software analysis tools can display execution flow among program instructions in a program under test. Consequently, with such tools, a programmer, with knowledge of an operational sequence implemented by the program and by analyzing a flow pattern inherent in that program as it executes, can readily discern the program code that implements a certificate testing function. Once this code is detected, the programmer can readily excise that portion from the program itself and simply modify the remaining program code, e.g., by inclusion of appropriate jump instruction(s), to compensate for the excised portion, thus totally frustrating the protection which the certificate was intended to provide against unauthorized copying. Once having done so, a third party adversary can then produce and distribute unauthorized, but fully executable, copies of the program free of all such protection. Thus, in practice, this approach has proven to be easily compromised and hence has afforded very little, if any, real protection against illicit copying.
Other techniques have relied on using serialized hardware or other hardware-centric arrangements to limit access to a program to one or more users at one particular PC and preclude that program from being loaded onto another PC. Generally, these techniques, often referred to as “copy protect” schemes and which were popular several years ago, relied on inserting a writeable program distribution diskette, such as a floppy diskette, into a PC and then, during execution of an installation process from that diskette, having that PC store machine specific data, such as a hardware identification code, onto the diskette. Thereafter, during each subsequent installation of the program, an installation process would check the stored machine specific data on the installation diskette against that for a specific PC on which the program was then being installed. If the two pieces of data matched, installation would proceed; otherwise, it would prematurely terminate. Unfortunately, such schemes, while generally effective against unauthorized copying, often precluded legitimate archival copying as well as a legitimate installation of the program on a different PC. In view of substantial inconveniences imposed on the user community, such “copy protect” schemes quickly fell into disuse and hence were basically abandoned shortly after they first saw widespread use. Moreover, any such technique that relies on storing information on the distribution media itself during program installation is no longer practical now that software is distributed on massive read-only optical media, such as CDROM or, soon, digital video disk (DVD).
Therefore, given the drawbacks associated with copy protect and certificate based schemes, one would think that embedding an identifier of some sort into a program, during its manufacture and/or installation and subsequently testing for that identifier during subsequent execution of an installed version of that program at a user PC, would hold promise.
However, for such an identifier based approach to be feasible, a need exists in the art for an identifier, such as a watermark, that can be tightly integrated into a program itself such that the watermark would be extremely difficult, if not effectively impossible, for a third party to discern, such as through flow analysis, and then circumvent, such as by removal.
In particular, such a watermark could be embedded in some fashion into a non-marked program. Then, subsequently, at runtime of an installed version of that program at a user PC, a “secret” key(s) based cryptographic process could be used to reveal the presence of and test the watermark. The key(s) would be separately stored on the PC as software value(s). If the correct watermark were then detected, execution of the installed program would continue; else, execution would halt. Fortunately, such an approach would likely impose essentially no burden on, and preferably be totally transparent to, the user, and not frustrate legitimate copying.
If such an identifier could be made sufficiently impervious to third party detection and tampering, then advantageously its use, with, for example, such an approach, may well prove effective, in practice, at reducing unauthorized third party copying.
Our present invention advantageously satisfies this need and overcomes the deficiencies in the art through a watermark, containing, e.g., a relatively large number of executable routines, that is tightly integrated into a flow pattern of non-marked executable code, e.g., an application program, through randomly establishing additional control (execution) flows in the executable code and inserting a selected one of the routines along each such flow. Since a resulting flow pattern of the watermark is highly intertwined with (tightly spliced into) the flow pattern of the non-marked code, the watermark is effectively impossible to either remove from the code or circumvent. Furthermore, the code for the routines themselves is added in such a manner that the flow pattern of resulting “watermarked” code is not substantially different from that of the non-marked code. Hence, the watermark is also extremely difficult for a third party adversary to discern using, e.g., standard flow analysis tools and human inspection.
Advantageously, to enhance tamper-resistance of the watermarked code, each routine, that constitutes a portion of the watermark, can provide a pre-defined function such that, if that routine were to be removed from the marked code by, e.g., the third party adversary, then the marked code will prematurely terminate its execution.
In accordance with our specific inventive teachings, unmarked executable code which forms, e.g., an application program that is to be watermarked is first converted, using a conventional software flow analysis tool, into its corresponding flow graph. Predefined security code, typically constituting specific predetermined executable software code, is also converted, through use of the same tool, into its corresponding flow graph. The security code can itself constitute, for example: specific “watermark” code, i.e., executable code having as its primary, if not sole, purpose to form a portion of a watermark (i.e., distinct from the application program itself); a complete image of the entire application program itself; or just a portion of that program. In that regard, the unmarked executable code and the security code can each be formed of a different half of a common application program.
Thereafter, each of the flow graphs is k-partitioned to yield cluster flow graphs G′ and H′, respectively (where k is a pre-defined integer, such as illustratively 1000 for a large application program then being watermarked), each having k clusters of nodes (each being a partition). M edges (links) (where M is typically a large pre-defined integer, such as illustratively 500,000 for that application program) are collectively inserted between corresponding pairs of randomly selected nodes in: (a) graphs G′ and H′; and, where desired, (b) different clusters solely in graph G′, and/or (c) different clusters solely in graph H′. For each edge, a routine is selected from a pre-defined library, based on, e.g., minimizing adverse effects on program flow, and its designation is inserted along that edge in the flow graph(s), specifically at one of the nodes associated with that edge. All the edges collectively and effectively splice clustered flow graphs G′ and H′ together into a single combined flow graph. Executable code is then produced which corresponds to that depicted in the single combined flow graph. The watermark is collectively defined by the routines and edges that have been inserted into the unmarked code.
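As a rough illustration of the k-partitioning step only, the Python sketch below splits the nodes of a flow graph into k non-empty clusters. The text does not state how the clusters are chosen, so the shuffle-and-deal assignment, the function name `k_partition`, and the `seed` parameter are all assumptions made for illustration; a practical partitioner would presumably take the structure of the flow graph into account.

```python
import random

def k_partition(nodes, k, seed=None):
    """Split `nodes` into k non-empty clusters (hypothetical helper).

    Assumption: clusters are formed by shuffling the nodes and dealing
    them out round-robin; the source does not specify a method.
    """
    nodes = list(nodes)
    if not 1 <= k <= len(nodes):
        raise ValueError("k must be between 1 and the number of nodes")
    rng = random.Random(seed)
    rng.shuffle(nodes)
    # Cluster i receives every k-th node starting at offset i.
    return [nodes[i::k] for i in range(k)]

clusters = k_partition([f"n{i}" for i in range(10)], k=3, seed=7)
```

Every node lands in exactly one cluster, which is the only property of a partition that the later edge-insertion step relies on in this sketch.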
One illustrative heuristic for selecting each specific pair of nodes, in, e.g., cluster graphs G′ and H′, that are to be joined by an edge is as follows. First, randomly pick a node, U, in graph G′. With α being pre-defined as equaling (a number of edges that are to transit between G′ and H′)/(a number of edges connected to U), then, with a probability of 1−α, randomly choose a node, Y, in graph H′. Then, with a probability of α, randomly choose a node, Z other than U, in graph G′. Finally, provide, as output, designations of nodes Y and Z as a nodal pair.
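The heuristic above can be sketched in Python under one possible reading. The text does not spell out how the two probabilistic choices combine, so this sketch assumes they are mutually exclusive alternatives for the partner of U: a node in the other graph with probability 1 − α, or a different node in the same graph with probability α. The function name `select_node_pair`, the returned `kind` label, and the treatment of α as a caller-supplied probability in [0, 1] are assumptions, not taken from the source.

```python
import random

def select_node_pair(g_nodes, h_nodes, alpha, rng):
    """Pick a pair of nodes to be joined by one inserted edge.

    Assumed reading: U is drawn from G'; its partner is a node of G'
    other than U with probability alpha, else a node of H'.
    """
    if len(g_nodes) < 2 or not h_nodes:
        raise ValueError("need at least two nodes in G' and one in H'")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is treated here as a probability")
    u = rng.choice(g_nodes)
    if rng.random() < alpha:
        # Edge stays inside G': choose any node other than U.
        partner = rng.choice([n for n in g_nodes if n != u])
        return u, partner, "intra"
    # Edge crosses from G' to H'.
    return u, rng.choice(h_nodes), "cross"

rng = random.Random(0)
pair = select_node_pair(["g0", "g1", "g2"], ["h0", "h1"], 0.25, rng)
```

Passing an explicit `random.Random` instance keeps the selection reproducible for testing; a real embedding would presumably draw from a secret source of randomness.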
During subsequent edge insertion, connect the nodes for that edge together, e.g., nodes Y and Z, so as to insert an edge extending between cluster flow graphs G′ and H′. Based on proper program flow, insert an appropriate routine from the library along that edge and at an appropriate node, in a graph, for that pair. Repeat these node selection and insertion steps until all M edges and designations for associated routines are collectively added to cluster graphs G′ and H′ so as to fully splice both graphs and the associated routines into a single combined flow graph. Parameters k, M and α are preferably kept secret.
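The repeated insertion step can be pictured with the short Python sketch below, which splices two small flow graphs by adding M cross edges, each tagged with a routine designation. It is a simplified illustration: only G′-to-H′ edges are inserted, clustering is omitted, and the routine is chosen at random, whereas the text selects it based on proper program flow. The function name `splice`, the tuple layout `(source, target, routine)`, and the sample routine names are hypothetical.

```python
import random

def splice(g_nodes, g_edges, h_nodes, h_edges, m, library, seed=None):
    """Return the edge list of a combined flow graph.

    Original edges carry no routine (None). Each of the m inserted
    edges joins a random node of G' to a random node of H' and carries
    the designation of one library routine (random pick: an assumption).
    """
    rng = random.Random(seed)
    combined = [(a, b, None) for a, b in list(g_edges) + list(h_edges)]
    for _ in range(m):
        src = rng.choice(g_nodes)
        dst = rng.choice(h_nodes)
        combined.append((src, dst, rng.choice(library)))
    return combined

combined = splice(
    ["g0", "g1", "g2"], [("g0", "g1"), ("g1", "g2")],
    ["h0", "h1"], [("h0", "h1")],
    m=5, library=["derive_key", "decrypt_var"], seed=1,
)
```

In this representation the watermark is exactly the set of entries whose third field is not `None`, mirroring the statement that the watermark is defined by the inserted routines and edges.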
Each of these routines is predetermined, usually quite compact, requires relatively little execution time and executes a pre-defined, often self-contained operation, such as, e.g., computing a cryptographic key for use in printing or decoding a variable, or decrypting a ciphered variable. Each of the operations is chosen so as not to require much processing time; thus, not noticeably degrading execution of the watermarked program. Collectively, the routines that are inserted are such that, for proper execution of the watermarked program, they must all be executed and, to a certain extent, in a given sequence. In that regard, if any one or more of these routines is removed from the watermarked program, such as by a third party adversary, that program will gracefully terminate its execution.
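To illustrate why removing or reordering a routine can stop the program, the Python sketch below chains three small routines so that each one transforms a key state produced by the one before it, and the final state is needed to decrypt a variable the program depends on. This is only a toy model under stated assumptions: the SHA-256 chaining, the XOR cipher, the names `make_routine`, `derive_key`, and `run_program`, and the sample variable are all invented for illustration and are not described in the source.

```python
import hashlib

def make_routine(tag: bytes):
    """Build one compact routine that advances the key state."""
    def routine(state: bytes) -> bytes:
        return hashlib.sha256(state + tag).digest()
    return routine

def derive_key(routines, seed=b"seed"):
    # Each routine consumes the previous routine's output, so both the
    # set of routines and their order determine the final key.
    state = seed
    for routine in routines:
        state = routine(state)
    return state

def xor_bytes(data: bytes, key: bytes) -> bytes:
    return bytes(d ^ k for d, k in zip(data, key))

ROUTINES = [make_routine(t) for t in (b"r1", b"r2", b"r3")]
PREFIX = b"loop-bound="
# Done once at build time in this toy model: cipher a program variable.
CIPHERED = xor_bytes(PREFIX + b"0042", derive_key(ROUTINES))

def run_program(routines):
    value = xor_bytes(CIPHERED, derive_key(routines))
    if not value.startswith(PREFIX):
        # A wrong key yields garbage, so the program cannot continue.
        raise SystemExit("required variable could not be decrypted")
    return int(value[len(PREFIX):].decode("ascii"))
```

Calling `run_program(ROUTINES)` recovers the variable, while dropping a routine or running the routines in a different order produces a different key and the program exits. Unlike this toy, which concentrates the dependency in one function, the scheme described above scatters such dependencies throughout the program.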
To further frustrate its detection, the code for all the inserted routines is collectively scattered approximately uniformly throughout the “watermarked” program as that program is being constructed from its combined flow graph. In this manner, the routines will not be centralized in any one portion of the watermarked program. Furthermore, each of these routines is written with standard code “obfuscation” techniques to further camouflage their functionality.
Advantageously, as a feature, the present invention can securely watermark any executable code, whether it forms, e.g., an application program, an operating system (O/S) or a software module.