1. Field of the Invention
This invention relates to a method of protecting computer program code; it can, for example, be used to defeat an attack, such as the ‘MMU attack’, in which code and data fetches to the same logical address are routed to different physical addresses.
2. Description of the Prior Art
In the field of software anti-tampering, a technique called “self-checking” is often used to ensure that a program's code has not been modified in any way. Self-checking systems typically perform runtime checksums (or hashes) on the code in memory and compare the results to the expected checksums, as obtained from an unmodified version of the program. A checksum or hash is an instance of a ‘digest’. So, in the generic case, a self-checking system generates a digest of the code in memory and compares the result to a digest of the unmodified version of the code. A digest is a (typically small) representation of a set of data in which small changes to the data cause large changes in the digest; a typical digest is between 32 and 2048 bits in size. FIG. 1 shows a self-checking system, with code checking other code.
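The self-checking principle may be illustrated with a short sketch (Python is used purely for illustration; the function names, and the use of SHA-256 over a function's compiled bytecode, are illustrative stand-ins for digesting the program's machine code in memory):

```python
import hashlib

def digest_of(func):
    """Digest a function's compiled bytecode and constants.

    The bytecode stands in for the program's in-memory machine code;
    a real self-checking system would read the code segment directly
    (via data fetches) rather than inspect Python code objects.
    """
    code = func.__code__
    return hashlib.sha256(code.co_code + repr(code.co_consts).encode()).hexdigest()

def licensed_feature():
    return "feature enabled"

# Digest recorded from the unmodified program, e.g. at build time.
EXPECTED = digest_of(licensed_feature)

def self_check():
    """Re-digest the code at runtime and compare to the expected value."""
    return digest_of(licensed_feature) == EXPECTED

def hacked_feature():
    # A tampered variant: its digest no longer matches EXPECTED.
    return "feature enabled".upper()

print(self_check())                           # True: code unmodified
print(digest_of(hacked_feature) == EXPECTED)  # False: tampering detected
```

Note the digest property described above: a small change to the code (here, one extra method call) produces an entirely different digest value.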
Although self-checking can be quite effective, a recent technique, referred to as an MMU (Memory Management Unit) Attack or a TLB (Translation Lookaside Buffer) Attack, shows how existing self-checking systems can be completely defeated by fooling them into checking an unmodified copy of the program's code while the computer actually executes a modified (i.e. hacked) copy. This attack uses a feature of most processors (e.g. Central Processing Units (CPUs), Graphics Processing Units (GPUs), etc.) which allows different types of memory access to be handled differently, such that data and code accesses can be distinguished from each other and re-routed to alternative physical addresses, even though they share the same logical address. This feature is usually implemented in a Memory Management Unit (MMU), but it may also be implemented in systems that are equivalent to MMUs. For example, some CPUs have simpler circuitry that is not referred to as an MMU but which provides the features described here. MMUs can also exist as co-processors in some systems, usually with older CPUs.
Processors can distinguish between an access (also known as a ‘fetch’) of data and an access or fetch of code; further details are given in Appendix 1. Self-checking systems are vulnerable because they run their checksums using data fetches and not code fetches. Note that a code fetch can only fetch code in the process of executing it, so there is no way to construct a self-checking mechanism from code fetches; the checks must always use data fetches. The MMU attack exploits this vulnerability by generating an unhacked, unmodified version of the code in the physical locations associated with data fetches. Hence, when the self-checking system runs its checksums, it performs its checksum or checksums on the unhacked version of the code. But the code that executes will be the hacked code, which is accessed solely using code fetches and not data fetches. The self-checking system is hence completely unaware of the presence of this parallel, executable version of the code, which has been modified by the hacker. The attack could also be performed the other way round in some cases, i.e. by re-routing the code fetches and leaving the data fetches alone; in fact, both could be re-routed.
The easiest way to implement the MMU attack is per the first description, i.e. to make an unmodified copy that will be checked and to modify (hack) the original code; this avoids the additional difficulties involved in relocating the code that is to be executed. However, those difficulties can be overcome to some extent, and do not even exist for certain programs and/or CPUs, so it is also feasible to modify the copy and execute it, leaving the original as the untouched version that will be checked.
In cases where either version could be a copy, it is also feasible for both to be copies, i.e. the original is no longer used, and we have an unmodified copy (to be checked) and a modified (hacked) copy to be executed.
To recap: using the CPU's (or other computational unit's) separate handling of code fetches and data fetches, the MMU attack can be implemented simply by re-routing data fetches to another memory location which contains an unmodified copy of the program's code. Since a self-checking system uses data fetches to perform checksums on the code to detect tampering, the code being checked will be this unmodified copy and not the code actually being executed; the code being executed uses code fetches, which will be routed to the modified (hacked) version of the code. FIG. 2 shows the re-routing of data fetches to thwart a self-checking system.
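The re-routing just described can be sketched with a toy model of per-fetch-type address translation (this is not a real MMU configuration, which would operate on hardware page tables via the kernel; the dictionary-based ‘page table’, the addresses and the byte values below are illustrative only):

```python
# Toy model of the MMU attack's routing logic: the same logical page
# translates to different physical pages depending on fetch type.

ORIGINAL_CODE = b"\x90\x90\xc3"   # stand-in for the unmodified program bytes
HACKED_CODE = b"\xeb\xfe\xc3"     # stand-in for the hacker's modified bytes

# Physical memory holds two copies of the code at different locations.
physical = {0x1000: HACKED_CODE, 0x2000: ORIGINAL_CODE}

# Page table keyed by (logical page, fetch type): the attack maps code
# fetches to the hacked copy and data fetches to the unmodified copy.
page_table = {
    (0x4000, "code"): 0x1000,   # execution sees the hacked copy
    (0x4000, "data"): 0x2000,   # checksums see the unmodified copy
}

def fetch(logical_page, fetch_type):
    """Translate a logical page for the given fetch type; return its bytes."""
    return physical[page_table[(logical_page, fetch_type)]]

def self_check(logical_page, expected):
    """A self-checking system reads its own code using DATA fetches."""
    return fetch(logical_page, "data") == expected

# The check passes even though execution would run the hacked copy.
print(self_check(0x4000, ORIGINAL_CODE))       # True
print(fetch(0x4000, "code") == ORIGINAL_CODE)  # False
```

As the two final lines show, the self-check succeeds while every code fetch to the same logical address returns the hacked bytes, which is precisely the situation depicted in FIG. 2.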
Although the concept of the MMU Attack is relatively simple, it is non-trivial to implement in practice, since existing Operating Systems (OSes) do not typically allow the MMU to be configured in such a manner. As a result, this type of attack usually requires a kernel modification; this imposes a deployment burden on the end-user, which may limit the usefulness of this type of attack in some contexts. However, the rise in popularity of Machine Virtualisation stands to reduce this deployment burden, since end-users can deploy the attack on a guest OS without polluting their host OS. The host OS is the primary OS running on a machine, although this distinction is being blurred by the introduction of “nested” virtualisation; the guest OS is an OS running under virtualisation.
Machine Virtualisation is often referred to as Platform Virtualisation and there are several ways of implementing this, including “Full Virtualisation”, “Hardware-Assisted Virtualisation” and “Para-virtualisation”. They typically differ in terms of the level at which they virtualise a machine. For the purposes of the present document, however, the differences between these different ways of implementing Machine Virtualisation are not particularly significant. This is because the present document deals with a hardware feature which exists at a level in the system architecture below all these ways of implementing Machine Virtualisation.
A major problem with this type of attack is that it is likely to be impossible for an anti-tampering system to directly detect that the MMU has been configured in the manner described above. Even if it were possible, a hacker will typically be able to disguise or spoof this configuration on a per-case basis such that any specific anti-tampering system will not be able to detect it. By “directly detect”, we mean querying the state of the MMU in order to determine that the memory pages used by the program code are set up to route data and code access to different physical locations. This might involve using an OS application programming interface (API) to obtain MMU access, and/or use of CPU registers, as well as accessing any in-memory data structures used to implement page-mapping by the MMU. Since this type of mapping is not generally supported by most OSes, it is likely that no OS API exists which provides the required information for the program to determine that the MMU Attack is present.
Overall, the widely held view is that no defence is possible against the MMU Attack. Reference may be made to G. Wurster, P. C. van Oorschot and A. Somayaji: “A generic attack on checksumming-based software tamper resistance”, Technical Report TR-04-09, Carleton University, November 2004; see also G. Wurster, P. C. van Oorschot and A. Somayaji: “A generic attack on checksumming-based software tamper resistance” (slides), in IEEE Symposium on Security and Privacy, May 2005, pp. 127-138; and also P. C. van Oorschot, A. Somayaji and G. Wurster: “Hardware-assisted circumvention of self-hashing software tamper resistance”, IEEE Transactions on Dependable and Secure Computing, April-June 2005.