Generally, computer applications run by executing object code. The object code controls the actions of the computer systems on which it is run. Such code may be made public or otherwise made accessible by its authors, for example by publishing the original source code that was compiled to create the object code. The original authors may also choose to make the code more usable by other programmers by including “debug symbols,” data files that describe the structure of the object code so that users of the code can debug their own programs. However, for some uses, it is advisable to protect code from examination by possible adversaries. For example, where the code represents the best available implementation of a particular algorithm, the code itself may represent a trade secret. In another example, where code is used to secure content, it may be useful to protect the code in order to ensure the security of the content from an adversary. In order to protect users of an application from unauthorized tampering with the code, a number of security precautions may be utilized.
Some of these security measures are physical. For example, a user purchasing software on a CD-ROM may be able to verify that the CD-ROM is a legitimate copy of the software by inspecting holograms or other security devices on the packaging.
Module authentication, in which the integrity and security of software are protected against tampering, provides a level of protection against malicious changes to the software such as code patching, redirection, and software breakpoints.
One form of module authentication is to ensure that read-only content contained in the software module is unchanged. This may be done via static module authentication. Static module authentication is the process of verifying the persistently stored image of the module, which in some cases can be thought of as the “on-disk” module. For example, the on-disk module may be checked by hashing the file and comparing the resulting hash value with a pre-computed hash value of the file that has been signed by a trusted signatory.
The process of hashing (also known as computing a digest) is a standard cryptographic technique for identifying data with a representation that is, for practical purposes, unique but substantially smaller than the original data. The algorithm can be performed on a binary source of arbitrary length, in this case a file, and the result of the hashing computation is a smaller, usually fixed-size piece of binary data known as a hash, hash value, or digest. For example, FIPS SHA-1 (Federal Information Processing Standards Secure Hash Algorithm 1) produces a 20-byte hash regardless of the amount of data that is processed. A good hashing algorithm, like SHA-1, will produce significantly different hash values even for minute changes in the source data (in this case, the binary file).
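The two properties described above (a fixed 20-byte output and a large change in the hash for a minute change in the input) can be illustrated with a minimal sketch using Python's standard `hashlib` module. The byte strings `original` and `modified` are hypothetical stand-ins for the contents of a binary file.

```python
import hashlib

# Hypothetical stand-in for the bytes of a binary file of arbitrary length.
original = b"\x4d\x5a" + b"example module contents" * 1000
# Flip a single bit in the first byte to simulate a minute change.
modified = bytes([original[0] ^ 0x01]) + original[1:]

digest_original = hashlib.sha1(original).digest()
digest_modified = hashlib.sha1(modified).digest()

# A SHA-1 digest is 20 bytes long, whether the input is large or empty.
digest_length = len(digest_original)
empty_digest_length = len(hashlib.sha1(b"").digest())

# Count how many of the 160 digest bits differ after the one-bit change.
differing_bits = sum(
    bin(a ^ b).count("1") for a, b in zip(digest_original, digest_modified)
)

print(digest_length, empty_digest_length, differing_bits)
```

For a well-behaved hash function, a one-bit change in the input is expected to alter roughly half of the output bits on average, which is why the two digests bear no visible relationship to each other.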
For a cryptographically strong hash function, there is essentially no feasible way to predict what changes could be made to a file while still producing the same hash value. It is therefore infeasible to insert malicious changes into a file while maintaining the same hash for the modified file, and the hash of a file can be compared to a stored hash in order to validate that no modifications have been made. In order to prevent an adversary from changing the stored pre-computed hash along with the module being validated, the validity of the stored hash must itself be verifiable. For example, the table of stored hashes may be signed by a trusted signatory.
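The following is a minimal sketch of static module authentication against an authenticated table of stored hashes. It rests on a loudly stated assumption: a keyed HMAC from Python's standard library stands in for the signature of a trusted signatory, whereas a real system would use a public-key signature so that the verifier holds no signing secret. The names `SIGNATORY_KEY`, `sign_table`, `authenticate_module`, and `module.dll` are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical key. ASSUMPTION: an HMAC stands in for a real digital signature.
SIGNATORY_KEY = b"demo-key-not-for-production"


def sign_table(table: dict) -> bytes:
    """Produce an authentication tag over a canonical encoding of the hash table."""
    payload = json.dumps(table, sort_keys=True).encode()
    return hmac.new(SIGNATORY_KEY, payload, hashlib.sha1).digest()


def authenticate_module(name: str, image: bytes, table: dict, tag: bytes) -> bool:
    """Return True only if the table is authentic and the image hash matches it."""
    # Step 1: reject the table itself if its tag does not verify.
    if not hmac.compare_digest(sign_table(table), tag):
        return False
    # Step 2: hash the stored image and compare with the pre-computed value.
    expected = table.get(name)
    actual = hashlib.sha1(image).hexdigest()
    return expected is not None and hmac.compare_digest(expected, actual)


image = b"trusted module bytes"
table = {"module.dll": hashlib.sha1(image).hexdigest()}
tag = sign_table(table)

patched_image = image + b"\x90"
forged_table = {"module.dll": hashlib.sha1(patched_image).hexdigest()}

untouched_ok = authenticate_module("module.dll", image, table, tag)
patched_ok = authenticate_module("module.dll", patched_image, table, tag)
forged_ok = authenticate_module("module.dll", patched_image, forged_table, tag)
print(untouched_ok, patched_ok, forged_ok)
```

The third case shows why the stored hash must be verifiable: an adversary who replaces both the module and its table entry is still rejected, because the forged table no longer matches the tag issued over the original table.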
However, many software modules use functionality in other software modules known as dynamic link libraries or DLLs. In order to run, a software module that references functions contained in other software modules includes an import address table (IAT). The IAT is a table of addresses for functions that are imported by a module. The “on-disk” initial values of the IAT are updated by the operating system (OS) loader once the module is loaded into memory, when function addresses are resolved against DLL export tables so that the IAT entries point to the locations of functions in other modules.
Thus, dynamic linking of external DLLs is implemented through the IAT. This process is referred to as “binding”. References to functions implicitly linked in the software module are specified in the module's import data. At load-time, the operating system loader refers to the import table to determine which external functions in which DLLs must be bound. These references to external functions are centralized in the IAT so that binding is efficient. The table of addresses that is the IAT is used to provide a level of indirection between the calls made within a module to external functions and the locations of those functions in other modules. That is, the IAT represents a single point of modification for the loader for all external references from a module. In other words, the OS loader need only update the addresses in the IAT instead of modifying every reference to each imported function spread throughout the software module, of which there can be several for each imported function. At load time, the loader will determine where each imported function is located (e.g. a DLL file), load the file containing the function into memory if necessary, compute the location of the external function in the file, and place the address for the function in the IAT of the calling module. The file containing the external function contains an export table which is consulted by the loader to determine the proper address for the external function.
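The binding process described above can be sketched as a toy model. This is not how an actual OS loader or executable file format is implemented: Python callables stand in for function addresses, dictionaries stand in for the export table and the IAT, and every name (`helper.dll`, `FunctionA`, `FunctionB`, `Module`, `bind`) is hypothetical.

```python
def function_a(x):
    return x * 2


def function_b(x):
    return x + 1


# Export table of a hypothetical DLL: function name -> "address" (a callable here).
EXPORT_TABLES = {"helper.dll": {"FunctionA": function_a, "FunctionB": function_b}}


class Module:
    def __init__(self, imports):
        # Import data: which functions in which DLLs must be bound.
        self.imports = imports
        # "On-disk" IAT: one slot per imported function, not yet bound.
        self.iat = {name: None for _, name in imports}

    def call_import(self, name, *args):
        # Every call site reaches the external function through its IAT slot.
        return self.iat[name](*args)


def bind(module, export_tables, loaded_dlls):
    """Toy loader: resolve each import and write its address into the IAT."""
    for dll, name in module.imports:
        loaded_dlls.add(dll)  # load the containing file if necessary
        module.iat[name] = export_tables[dll][name]  # consult the export table


sm = Module([("helper.dll", "FunctionA"), ("helper.dll", "FunctionB")])
loaded = set()
bind(sm, EXPORT_TABLES, loaded)

# Many call sites, one table: the loader only ever touched the IAT slots.
results = [sm.call_import("FunctionA", 2), sm.call_import("FunctionA", 5),
           sm.call_import("FunctionB", 2)]
print(results)
```

The model makes the single point of modification visible: however many times `call_import` is used, `bind` writes each address exactly once, into the table.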
Because the IAT changes at load-time to include the actual addresses for functions which will be needed by the software module, and the locations of those functions can only be determined at run-time after the containing modules have been loaded into memory, the in-memory IAT cannot be authenticated by trivial comparison with the version of the IAT in the on-disk image. However, if the IAT is not authenticated, an adversary may “detour” calls to external functions, exposing potentially sensitive data. For example, a software module SM may call function A. The OS loader loads the DLL file containing function A and inserts the proper address for function A into the IAT of SM. An adversary may write a function FAKEA and substitute its address for that of function A in the IAT of SM. FAKEA passes the data from each call by software module SM on to function A and returns any data returned by function A to the software module SM. At this point, the adversary has the ability to examine all data both passed to and returned from function A. In addition, the adversary has the ability to manipulate that data in order to possibly change the behavior of the function call, leading to unintended program behavior. This might, for example, be used to subvert security measures and access checks. This is known as “IAT detouring” and, when used as a reverse engineering technique, is in some cases referred to as a “man-in-the-middle” attack.
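The detour of function A through FAKEA can be illustrated with a toy model in which a dictionary stands in for the IAT of SM and Python callables stand in for function addresses. This is only a conceptual sketch, not a working instrumentation technique, and the names `iat`, `sm_run`, `fake_a`, and `captured` are hypothetical.

```python
def function_a(data):
    """Stand-in for the legitimate external function A."""
    return data.upper()


# IAT of software module SM after the loader has bound function A.
iat = {"A": function_a}


def sm_run(secret):
    # SM calls function A indirectly, through its IAT slot.
    return iat["A"](secret)


captured = []  # everything the adversary observes
original_target = iat["A"]


def fake_a(data):
    """Adversary's FAKEA: records the traffic and forwards it to function A."""
    captured.append(("argument", data))
    result = original_target(data)
    captured.append(("result", result))
    return result


before = sm_run("license key")

# The detour: overwrite one IAT slot. SM's own code is left untouched.
iat["A"] = fake_a

after = sm_run("license key")
print(before == after, captured)
```

From SM's point of view nothing has changed, since the call returns the same value before and after the detour, yet both the argument and the result are now visible to the adversary. FAKEA could equally alter either value before passing it along.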
Several well-known and publicly available software programs, such as the “Detours” program, provide simplified mechanisms for program instrumentation through IAT modification. In addition, there are other programs that perform such IAT modification for legitimate purposes. For example, virus checkers and accessibility tools may make use of IAT detouring to insert themselves into the code path of important function calls. Thus, in the case of a virus checker, a call to a function may allow the file containing the function to be scanned for viruses. In the case of an accessibility tool, data to be displayed on a monitor may be enlarged or otherwise used in order to provide greater accessibility.
However, IAT detouring can expose sensitive information and jeopardize the security of sensitive code. In some cases, such as in the digital rights management context, it may be important either to prohibit such IAT detouring or to limit the modules which can perform it to authorized modules.
In addition to the IAT, a delay load IAT may be present. Such a table performs a function similar to that of the IAT by storing addresses for imported functions. However, binding for the delay load IAT occurs only when an imported function is called for the first time by a module. This late binding can sometimes result in a performance improvement when loading the module because the binding process is deferred at load time. Thus, the cost of binding an imported function is only incurred if the function is actually called by the application module. The performance benefit is most noticeable in cases where many functions are imported but few are actually used during a particular session of the process. For example, the cost of binding spell checking functions in a word processor program is unnecessary if the spell checker is never used during a particular word processing session. The delay load IAT is subject to the same possible detouring described with reference to the IAT.
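Delay-load binding can be sketched with the same kind of toy model, using the spell checker example. The class `DelayLoadIAT`, its `bind_count` counter, and the function names are hypothetical, and a dictionary of callables again stands in for a DLL export table.

```python
def check_spelling(word):
    return word in {"module", "loader", "table"}


def suggest(word):
    return ["module"] if word == "modul" else []


# Export table of a hypothetical spell checking DLL.
SPELL_EXPORTS = {"check_spelling": check_spelling, "suggest": suggest}


class DelayLoadIAT:
    """Toy delay load IAT: a slot is bound only on the first call through it."""

    def __init__(self, exports):
        self._exports = exports
        self._slots = {}  # nothing is bound at load time
        self.bind_count = 0  # how many bindings have actually been paid for

    def call(self, name, *args):
        if name not in self._slots:
            # First call: resolve the function now and remember its address.
            self._slots[name] = self._exports[name]
            self.bind_count += 1
        return self._slots[name](*args)


delay_iat = DelayLoadIAT(SPELL_EXPORTS)
binds_at_load = delay_iat.bind_count

first = delay_iat.call("check_spelling", "loader")
second = delay_iat.call("check_spelling", "lodaer")
binds_after_use = delay_iat.bind_count
print(binds_at_load, first, second, binds_after_use)
```

In this session `suggest` is never called, so it is never bound. The slots written on first use are ordinary table entries, which is why they are open to the same detouring as the entries of the regular IAT.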
In view of the foregoing, there is a need for a system that overcomes the drawbacks of the prior art.