The increasing complexity of computer systems makes them vulnerable to a variety of attacks, including malware. The dominant current approach to dealing with malware is to compare execution patterns with known malware patterns (i.e., malware signatures). Unfortunately, this has led to an “arms race,” where malware developers flood the defenders with polymorphic and ever-changing malware.
Current commodity operating systems and the majority of applications lack assurance that the secrecy and integrity of security-sensitive code and data remain intact. Absent is any guarantee of code integrity, which implies that users, administrators, and other systems must blindly trust that a given system platform or application protects its sensitive data—trust that is all too often misplaced.
Three fundamental causes of this problem are: (1) increased size and complexity of commodity operating systems over time, which effectively eliminates the possibility of precise, formal verification of the entire code base, (2) retention of object code compatibility of operating system and application APIs, which implies their immutability in time despite the presence of demonstrable design flaws, and (3) new business models for developing operating systems and applications that rely extensively on mixed-provenance code bases; i.e., code provided by different development organizations with different business interests. The third cause rules out global guarantees of security properties, since a single point of control and responsibility over a platform's code base no longer exists.
A significant source of increased size and complexity is support for “plug and play” code that changes system configuration by adding new devices and system administrative functions to the base operating system. Undoubtedly operating system ease of extensibility is a fundamental requirement for innovation. However, complexity is often further increased by removing inter-module and inter-layer boundaries to enhance system performance for these applications. Unfortunately, verification of the code base and the analysis of its penetration resistance cannot be effectively performed for code bases that are almost continuously extended in time with their modules reaching sizes that exceed one million lines of monolithic code. It is not just the Windows code base that was extended over the past 20 years: in the mid 1980s UNIX for PCs had 50 K SLOC in the Kernel (and 120 K in security-relevant components); today it has over 1 M SLOC.
Retention of object-code compatibility for applications suggests that design flaws cannot be readily eliminated without breaking an execution environment. In time, lack of object code compatibility tends to destroy the very market base sought by a system provider, such as whenever users of that environment have to recompile their code with a new set of APIs. Often users cannot perform recompilation since very seldom do they own the source code of their applications. In effect, if flawed APIs become immutable, “compatibility with previous mistakes” becomes a pervasive challenge to building trusted systems. Numerous examples of conflicts between API definitions and security properties exist, ranging from APIs that have built-in covert channels that would otherwise not exist [National Computer Security Center. A guide to understanding covert channel analysis of trusted systems. Technical Report NCSC-TG-030 Version 1, National Computer Security Center, November 1993.], to APIs that enable outright penetration of an operating system [M. Howard and D. LeBlanc. Writing Secure Code: Second Edition. Microsoft Press, 2003.], and to cryptographic APIs that enable discovery of secrets [M. D. Bond. Diagnosing and Tolerating Bugs in Deployed Systems. PhD thesis, University of Cambridge, 2008.]. The challenge of how to remove or neutralize API flaws without introducing object-code incompatibility for extant applications has been a subject of intense research (viz., over half-a-dozen workshops in the API security area).
Mixed-provenance code within an operating system or application means that code analysis, verification, and unit testing of the resulting system cannot possibly be performed, since no single party would own, and have access to, all the system source code. A code provider and a code user have different, often conflicting business interests: the code provider does not have any incentive to address the verification concerns of the code user since the provider cannot be expected to satisfy global system properties; whereas the code user may know all global system properties but has no access to the many providers' code bases to verify them. Thus, externally provided modules could only be monolithically API-tested for coarse, global properties by the code user, hardly a reassuring feature for a computing base upon which security-sensitive applications are expected to rely. The inner workings of at least some NVIDIA drivers are a mystery to Microsoft, while Microsoft's higher-level security concerns are far removed from NVIDIA's design priorities. Yet those drivers affect the overall system security as much as any other code. Similar security dilemmas appear at the application level; e.g., in financial and banking applications it is estimated that 70% of all software is of mixed provenance.
These three fundamental obstacles to the development of trusted commodity operating systems and applications suggest that we will not achieve the level of assurance necessary to run security-sensitive code and data on these platforms in the near future. Yet, commodity operating systems and applications offer unmatched incentives for use by both casual users and developers, and hence will remain a dominant presence in the marketplace. First, commodity systems have become and will continue to be the major common platforms for innovation. Hence, the latest technology advances are likely to occur on these platforms and thus they can hardly be set aside. Second, they provide a rich development environment by offering powerful application and device support. Third, they combine productivity software (e.g., office, web, mail) with entertainment (e.g., games, socialization) in a marketplace that values consolidation of computing and communication services.
Challenge 1.
The major challenge we face is not to develop new secure operating system platforms, though this remains a worthy long-term goal. Nor is it to eliminate all software flaws from an existing platform, though this too remains a worthy, if somewhat elusive, goal. Instead, the major challenge is to develop system-level techniques that enable users to run applications containing security-sensitive code and data on untrusted commodity platforms, which may be plagued by malware (e.g., rootkits, Trojan Horse programs, software key-loggers, screen-scrapers), and yet provide strong, user-verifiable assurances of secrecy and integrity selectively for the applications' security-sensitive code and data. That is, a developer should be able to specify precisely and select the security-sensitive code and data of an application, and provide guarantees, which can be verified by a user external to the untrusted platform, that the desired security properties of the selected code and data are maintained even in the presence of malware. Moreover, the verification of these guarantees should be available at any time, not just at system boot, and should be easy to perform by a casual user. Finally, the mechanisms that provide these capabilities should not impose significant, or even user-perceptible, performance degradation.
Security Properties.
A second challenge that arises in running security-sensitive code and maintaining the data of an application on an untrusted platform is that of specifying what security properties can be supported. Clearly, some security properties cannot be supported either in theory or in practice. For instance, certain code-obfuscation models that do not take advantage of any underlying system security features (i.e., are implementation-independent) can be theoretically ruled out [B. Barak, O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, S. Vadhan, and K. Yang. On the (Im)possibility of obfuscating programs. In Advances in Cryptology (CRYPTO), August 1998.]. Other properties, such as those of noninterference and other information-flow control policies (e.g., elimination or control of covert channel use), may be theoretically viable but either impractical in or insignificant to an application.
There is a need to support a very large class of security properties that fall into the class of “safety” properties [S. Gupta and V. Gligor. Towards a theory of penetration-resistant systems and its applications. In Proceedings of the Computer Security Foundations Workshop, June 1991.], [S. Gupta and V. Gligor. Experience with a penetration analysis method and tool. In Proceedings of the National Computer Security Conference, October 1992.], [V. Nguyen, D. Gries, and S. Owicki. A model and temporal proof system for networks of processes. In Proceedings of the ACM Symposium on Principles of Programming Languages (POPL), pages 121-131, 1985.], [S. Owicki and L. Lamport. Proving liveness properties of concurrent programs. ACM Trans. Program. Lang. Syst., 4(3):455-495, 1982.]. There are three important reasons for this. First, most security policies can be expressed as safety properties [F. B. Schneider. Enforceable security policies. ACM Trans. Inf. Syst. Secur., 3(1):30-50, 2000.], and while information-flow policies are a notable exception [J. McLean. A general theory of composition for a class of “possibilistic” properties. IEEE Trans. Softw. Eng., 22(1):53-67, 1996.], most of these policies can be approximated as safety properties; e.g., all mandatory access controls implemented to date are such approximations. Second, most penetration-resistance properties can also be expressed as safety properties, since they are typically represented via state-transition models [M. Bishop. Computer Security: Art and Science. Addison Wesley, 2003.]. Finally, the only “liveness” properties that concern the selected sections of application code whose security needs to be protected are automatically converted into “safety” properties in our system. That is, the protected execution of these sections is bracketed by “timeouts” so that termination of their execution is always guaranteed [F. B. Schneider. Enforceable security policies. ACM Trans. Inf. Syst. Secur., 3(1):30-50, 2000.].
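The timeout bracketing described above can be illustrated with a minimal Python sketch. It is only a model under stated assumptions: the protected section is represented as a generator that yields once per execution step, and the names run_bracketed, max_steps, short_section, and runaway_section are hypothetical and do not come from the system described here. The liveness property "the section eventually terminates" is replaced by the safety property "the section never takes more than max_steps steps," which a monitor can enforce by observing a finite prefix of the execution.

```python
from typing import Callable, Iterator, Tuple


def run_bracketed(section: Callable[[], Iterator[None]], max_steps: int) -> Tuple[bool, int]:
    """Run a protected section one step at a time under a step budget.

    Returns (completed, steps_taken). If the section tries to take more
    than max_steps steps, it is aborted and (False, max_steps) is returned.
    """
    steps = 0
    for _ in section():
        steps += 1
        if steps > max_steps:
            # Safety violation detected on a finite prefix: force termination.
            return False, max_steps
    return True, steps


def short_section() -> Iterator[None]:
    """Hypothetical protected section that finishes after three steps."""
    for _ in range(3):
        yield


def runaway_section() -> Iterator[None]:
    """Hypothetical protected section that would never terminate on its own."""
    while True:
        yield


completed_short = run_bracketed(short_section, max_steps=10)
completed_runaway = run_bracketed(runaway_section, max_steps=10)
```

In this sketch the well-behaved section completes within its budget, while the runaway section is cut off at the budget, so termination holds for both without any reasoning about what the section will "eventually" do.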
Challenge 2.
Another core challenge is that any security system that is added needs to be compatible with current applications and IT workflows. For example, end-users do not want to abandon their legacy software and OS to switch to a new system to gain better security. Thus, we need to ensure that our mechanisms are compatible with current operating environments and applications.
Related Research in Software-Based Attestation.
Genuinity is a technique that explores the problem of detecting the difference between a simulator-based computer system and an actual computer system [R. Kennell and L. H. Jamieson. Establishing the genuinity of remote computer systems. In Proceedings of the USENIX Security Symposium, 2004.]. Genuinity relies on the premise that simulator-based program execution is bound to be slower because a simulator has to simulate the CPU architectural state in software, in addition to simulating the program execution. A special checksum function computes a checksum over memory, while incorporating different elements of the architectural state into the checksum. By the above premise, the checksum function should run slower in a simulator than on an actual CPU. While this statement is probably true when the simulator runs on an architecturally different CPU than the one it is simulating, an adversary having an architecturally similar CPU can compute the Genuinity checksum within the allotted time while maintaining all the necessary architectural state in software. As an example, in their implementation on the x86, Kennell and Jamieson [R. Kennell and L. H. Jamieson. Establishing the genuinity of remote computer systems. In Proceedings of the USENIX Security Symposium, 2004.] propose to use special registers, called Model Specific Registers (MSR), that hold various pieces of the architectural state like the cache and TLB miss count. The MSRs can only be read and written using the special RDMSR and WRMSR instructions. We found that these instructions have a long latency (approximately 300 cycles). An adversary that has an x86 CPU could simulate the MSRs in software and still compute the Genuinity checksum within the allotted time, even if the CPU has a lower clock speed than what the adversary claims. Also, researchers have documented weaknesses in the Genuinity approach [U. Shankar, M. Chew, and J. Tygar. Side effects are not sufficient to authenticate software. In Proceedings of the 13th USENIX Security Symposium, 2004.].
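The acceptance rule that a timing-based attestation scheme of this kind relies on can be sketched in Python as follows. This is an illustrative model only, not the Genuinity implementation: SHA-256 over a nonce and a memory image stands in for the real checksum function (which also mixes in architectural state), and the names reference_checksum, verify_response, elapsed_s, and deadline_s are hypothetical. The sketch shows that the verifier sees only a checksum value and an elapsed time, so any party that produces the correct value within the deadline is accepted, regardless of how the value was computed.

```python
import hashlib
import hmac


def reference_checksum(memory: bytes, nonce: bytes) -> bytes:
    """Stand-in checksum: a hash over a fresh nonce and the expected memory image."""
    return hashlib.sha256(nonce + memory).digest()


def verify_response(expected_memory: bytes, nonce: bytes, response: bytes,
                    elapsed_s: float, deadline_s: float) -> bool:
    """Accept only if the checksum is correct AND it arrived within the deadline."""
    correct = hmac.compare_digest(response, reference_checksum(expected_memory, nonce))
    return correct and elapsed_s <= deadline_s


memory_image = b"expected kernel memory contents"
nonce = b"fresh-challenge-0001"
good_response = reference_checksum(memory_image, nonce)

# A genuine platform answering in time is accepted.
genuine_ok = verify_response(memory_image, nonce, good_response, elapsed_s=0.8, deadline_s=1.0)
# A slow simulator with the right answer is rejected on timing alone.
slow_sim_ok = verify_response(memory_image, nonce, good_response, elapsed_s=1.7, deadline_s=1.0)
# An adversary who emulates the needed state cheaply enough is indistinguishable
# from the genuine platform: same checksum, within the deadline.
fast_adversary_ok = verify_response(memory_image, nonce, good_response, elapsed_s=0.9, deadline_s=1.0)
# A wrong checksum is rejected no matter how fast it arrives.
wrong_ok = verify_response(memory_image, nonce, b"\x00" * 32, elapsed_s=0.1, deadline_s=1.0)
```

The third case corresponds to the weakness discussed above: if emulating the architectural state in software costs no more than the long-latency instructions it replaces, the deadline does not separate the adversary from the genuine system.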
Related Research in Hardware-Based Attestation.
The Integrity Measurement Architecture (IMA) is an SRTM-based technique that relies on the TPM chip standardized by the Trusted Computing Group [R. Sailer, X. Zhang, T. Jaeger, and L. van Doorn. Design and implementation of a TCG-based integrity measurement architecture. In Proceedings of the USENIX Security Symposium, August 2004.]. That technique enables a remote verifier to verify what software was loaded into the memory of a platform. However, a malicious peripheral could overwrite code that was just loaded into memory with a DMA-write, thereby breaking the load-time attestation guarantee. The Terra system uses a Trusted Virtual Machine Monitor (TVMM) to partition a tamper-resistant hardware platform into multiple virtual machines (VM) that are isolated from each other [T. Garfinkel, B. Pfaff, J. Chow, M. Rosenblum, and D. Boneh. Terra: A virtual machine-based platform for trusted computing. In Proceedings of the Symposium on Operating System Principles, 2003.]. CPU-based virtualization and protection are used to isolate the TVMM from the VMs and the VMs from each other. Although the authors only discuss load-time attestation using a TPM, Terra is capable of performing run-time attestation on the software stack of any of the VMs by asking the TVMM to take integrity measurements at any time. All the properties provided by Terra are based on the assumption that the TVMM is uncompromised when it is started and that it cannot be compromised subsequently. Terra uses the load-time attestation property provided by TCG to guarantee that the TVMM is uncompromised at start-up. Since this load-time attestation property can be broken as described above, none of the properties of Terra hold. Even if TCG were capable of providing the load-time attestation property, the TVMM could be compromised at run-time if there are vulnerabilities in its code. The Copilot approach relies on an add-in card connected to the PCI bus to perform periodic integrity measurements of the in-memory Linux kernel image [N. L. Petroni, Jr., T. Fraser, J. Molina, and W. A. Arbaugh. Copilot - a coprocessor-based kernel runtime integrity monitor. In Proceedings of the USENIX Security Symposium, 2004.]. These measurements are sent to the trusted verifier through a dedicated side channel. The verifier uses the measurements to detect unauthorized modifications to the kernel memory image. The Copilot PCI card cannot access CPU-based state such as the pointer to the page table and pointers to interrupt and exception handlers. Without access to such CPU state, it is impossible for the PCI card to determine exactly what resides in the memory region that the card measures. The adversary can exploit this lack of knowledge to hide malicious code from the PCI card. For instance, the PCI card assumes that the Linux kernel code begins at virtual address 0xc0000000, since it does not have access to the CPU register that holds the pointer to the page tables. While this assumption is generally true on 32-bit systems based on the Intel x86 processor, the adversary can place a correct kernel image starting at address 0xc0000000 while in fact running a malicious kernel from another memory location. (The authors of Copilot are aware of this attack.) It is not possible to prevent this attack without access to the CPU state.
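The load-time attestation limitation noted above for TPM-based measurement can be illustrated with a short Python sketch. It models a measurement register in the style of a TPM 1.2 PCR, which is extended as SHA-1(old value || measurement), and shows that a write to memory after the measurement is taken leaves the register unchanged. The helper names extend and measure_and_load, the use of a bytearray to stand for platform memory, and the direct assignment standing in for a DMA-write are all assumptions made for illustration; this is not the IMA, Terra, or Copilot code.

```python
import hashlib

PCR_SIZE = 20  # SHA-1 digest length in bytes, as in a TPM 1.2 style register


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """Fold a measurement into the register: new = SHA-1(old || measurement)."""
    return hashlib.sha1(pcr + measurement).digest()


def measure_and_load(pcr: bytes, code: bytes, memory: bytearray) -> bytes:
    """Measure code at load time, then place it in memory. Returns the new register value."""
    digest = hashlib.sha1(code).digest()
    memory[:] = code
    return extend(pcr, digest)


good_code = b"trusted kernel image"
memory = bytearray()

# Load-time measurement of the trusted image.
pcr = measure_and_load(bytes(PCR_SIZE), good_code, memory)

# What a remote verifier expects for a platform that loaded good_code.
expected = extend(bytes(PCR_SIZE), hashlib.sha1(good_code).digest())

# Hypothetical DMA-write by a malicious peripheral after the measurement was taken.
memory[:] = b"malicious kernel image"

attestation_passes = (pcr == expected)             # register still matches
memory_still_trusted = (bytes(memory) == good_code)  # but memory no longer does
```

The register records what was loaded, not what is resident now, which is why a guarantee established only at load time does not by itself cover later modification of memory.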
The Cerium [B. Chen and R. Morris. Certifying program execution with secure processors. In Proceedings of HotOS, 2003.] approach uses hardware extensions to the execution platform to provide a remote host with the guarantee of verifiable code execution. Cerium relies on a physically tamper-resistant CPU with an embedded public-private key pair and a micro-kernel that runs from the CPU cache. Unfortunately, Cerium remains a paper design and was never built. The BIND system [E. Shi, A. Perrig, and L. van Doorn. BIND: A time-of-use attestation service for secure distributed systems. In Proceedings of IEEE Symposium on Security and Privacy, May 2005.] requires that the execution platform provide support for DRTM and was designed for an early version of AMD Secure Virtual Machine (SVM) processors. The Open Secure Loader (OSLO) [B. Kauer. OSLO: Improving the security of Trusted Computing. In Proceedings of the USENIX Security Symposium, August 2007.] employs the AMD SVM SKINIT instruction to eliminate the BIOS and boot loader from the TCB and establish a DRTM for trusted boot. The Flicker system provides an approach for secure execution and externally-verifiable code execution that relies on DRTM mechanisms offered by modern AMD and Intel processors [J. M. McCune, B. Parno, A. Perrig, M. K. Reiter, and H. Isozaki. Flicker: An execution infrastructure for TCB minimization. In Proceedings of the ACM European Conference on Computer Systems (EuroSys), April 2008.].
Accordingly, there is a need for improved security for sensitive code and data, particularly for improvements that are easy to use. Those and other advantages of the present invention will be described in more detail hereinbelow.