1. Field of the Invention
The invention described herein is directed to encoding individual computer instructions with supplemental information. More specifically, the invention locates bit patterns in processing instructions of computer program code so as to determine a compression strategy for the processing instructions that would allow at least one supplemental bit of information to be encoded therein.
2. Description of the Prior Art
Machine instructions in a compiled computer program, as specified by the Instruction Set Architecture (ISA) of a processor, are the primary means for exchanging information between a programmer and the computer hardware. ISAs are many and varied and include architectures of variable-width instructions and fixed-width instructions. Exemplary ISAs implementing fixed-width instructions are those of the Reduced Instruction Set Computing (RISC) architecture employed by several modern processors. Fixed-width architectures allow the processor to easily fetch, decode and execute the instructions. However, the encoding space of fixed-width RISC instruction sets typically cannot be expanded so as to add more information to the existing instructions, or to add more instructions to an existing ISA. Several traditional techniques have been attempted in efforts to overcome these shortcomings.
In the prior art, instruction abbreviation systems have been developed that apply entropy-bounded encoding to the ISA of embedded Digital Signal Processors (DSPs). However, such systems involve variable-size instructions, which frequently occur in DSP architectures.
Traditional software watermarking techniques have been proposed, such as those that have been used for intellectual property protection. Early software watermarking schemes re-organize basic blocks in compiled code to embed a hidden mark. In more recent systems, the mark is embedded by inserting extra instructions and re-structuring existing instructions in a given program, so that the watermark is defined by the control flow of the program. Dynamic path-based software watermarking has also been attempted so as to use the run-time trace of a program and a particular program input (the secret key) to carry hidden information. An analogous approach was proposed to watermark Hardware Description Language (HDL) code for Application-Specific Integrated Circuit (ASIC) and Field Programmable Gate Array (FPGA) design. However, in most software watermarking schemes, embedding the watermark increases the number of instructions in the program and slows its execution. This cost is sometimes acceptable, but it limits the applicability of such techniques when the goal is to improve system performance.
Various techniques for reversible data embedding and lossless compression exist, such as those used for multimedia data. Certain of these algorithms use additive spread spectrum techniques while others embed supplemental data by modifying selected features, such as the Least Significant Bit (LSB) of the host signal. These techniques cannot be directly applied to program binaries because of the inherent differences between multimedia data and program instruction data. Modification of program data by compression techniques such as the Lempel-Ziv and arithmetic coding schemes requires a priori knowledge of the execution order of the instructions, and is therefore not suitable for supplementing system functionality where instruction ordering cannot be relied upon.
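By way of illustration only, the LSB modification mentioned above may be sketched as follows. The sketch shows plain LSB substitution on a multimedia-style host signal and omits the additional bookkeeping a reversible scheme would need to restore the original samples. The function names, the sample values and the instruction word are hypothetical and are not taken from any particular prior art system or ISA.

```python
def embed_lsb(samples, bits):
    """Overwrite the least significant bit of each host sample with one payload bit."""
    if len(bits) > len(samples):
        raise ValueError("payload is longer than the host signal")
    marked = list(samples)
    for i, bit in enumerate(bits):
        # Clear the LSB, then set it to the payload bit.
        marked[i] = (marked[i] & ~1) | (bit & 1)
    return marked


def extract_lsb(samples, count):
    """Read back the first `count` embedded bits."""
    return [s & 1 for s in samples[:count]]


# Hypothetical 8-bit host samples (e.g. pixel intensities) and payload.
pixels = [200, 13, 97, 54]
payload = [1, 0, 0, 1]
marked = embed_lsb(pixels, payload)

# Each sample moves by at most one quantization step, which is
# tolerable for multimedia data.
max_change = max(abs(a - b) for a, b in zip(pixels, marked))

# The same operation on a hypothetical 32-bit instruction word yields a
# different bit pattern, i.e. a different encoded instruction or operand.
instruction_word = 0x00000010
marked_word = embed_lsb([instruction_word], [1])[0]
instruction_changed = marked_word != instruction_word
```

A small perturbation of a pixel value is imperceptible, whereas any change to an instruction word alters what the processor decodes, which is one way to see why such techniques do not carry over directly to program binaries.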
The present invention advantageously applies information hiding techniques to program binaries of fixed-width instruction set processors, whereby extra information is transparently embedded. The supplemental information can be extracted by the processor at very low cost in a manner that supplements computer system functionality. The invention stores and extracts the additional information in computer programs in an ISA-independent way and without inserting extra instructions. The embedded data may be used in a variety of ways, such as for value and branch prediction in pipelined instruction execution, as well as to validate the integrity and origin of a program in trusted computing environments.