Many computer systems routinely process sensitive data whose confidentiality and integrity need to be protected from various security risks. Although the protection of such sensitive data can be enhanced by preventing unauthorized physical access and connection to malicious input/output (“I/O”) devices, such protection is particularly challenging in cloud-computing environments, where users do not have physical control over the hardware that executes their workloads.
A software cryptoprocessor employs cryptographic techniques to provide confidentiality and integrity for an entire system, including both user-mode applications and privileged-mode system software, such as a hypervisor or an operating system. One software cryptoprocessor is described in U.S. Patent Publication No. US 20130067245 A1, entitled “Software Cryptoprocessor,” which is hereby incorporated by reference. In such a software cryptoprocessor, only the main processor needs to be trusted to operate according to its specifications; other system hardware is considered untrusted and potentially malicious. A software cryptoprocessor ensures that data (including code) is available as cleartext only within trusted portions of the processor cache but remains encrypted in main memory. To ensure the data is available in cleartext only within trusted portions of the processor cache, a software cryptoprocessor encrypts data to be stored in main memory before the data leaves the central processing unit (“CPU”) package and decrypts data loaded from main memory after being loaded into the CPU package.
A software cryptoprocessor may use techniques, such as encrypted demand paging, to transfer data securely between main memory and the processor cache in a manner that is transparent to applications. In effect, the software cryptoprocessor treats the processor cache like main memory in a conventional system and treats main memory like a conventional backing store on disk or other secondary storage. Such techniques, however, may result in degraded performance as a result of increased memory pressure, due to the relatively small amount of cache serving as “main memory.” For example, although a modern Intel x86 processor contains tens of megabytes of cache, typical systems using such processors are configured with tens of gigabytes of RAM used as main memory, that is, roughly a thousand times larger than the cache.
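The encrypted demand paging described above can be illustrated with a minimal Python sketch. It is not the implementation of the referenced publication: the class `EncryptedPager`, the 64-byte `PAGE_SIZE`, the LRU eviction policy, and the per-page version counter are all hypothetical, and the SHA-256 counter-mode keystream is only a stand-in for a real cipher. A small dictionary plays the role of the trusted cache, and a second dictionary plays the role of untrusted main memory, which only ever holds ciphertext plus an integrity tag.

```python
import hashlib
import hmac
from collections import OrderedDict

PAGE_SIZE = 64  # hypothetical page size, kept tiny for illustration


def _keystream(key: bytes, page_no: int, version: int, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode); a stand-in, not a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        seed = (key + page_no.to_bytes(8, "big")
                + version.to_bytes(8, "big") + counter.to_bytes(8, "big"))
        out.extend(hashlib.sha256(seed).digest())
        counter += 1
    return bytes(out[:length])


class EncryptedPager:
    """Cleartext pages live only in `cache`; `main_memory` holds ciphertext."""

    def __init__(self, enc_key: bytes, mac_key: bytes, cache_pages: int):
        self.enc_key = enc_key
        self.mac_key = mac_key
        self.cache_pages = cache_pages
        self.cache = OrderedDict()  # trusted: page_no -> cleartext bytearray
        self.main_memory = {}       # untrusted: page_no -> (ciphertext, tag)
        self.versions = {}          # assumed to be kept in trusted storage

    def _mac(self, page_no: int, version: int, ciphertext: bytes) -> bytes:
        msg = page_no.to_bytes(8, "big") + version.to_bytes(8, "big") + ciphertext
        return hmac.new(self.mac_key, msg, hashlib.sha256).digest()

    def _evict_one(self) -> None:
        # Encrypt the least recently used page before it leaves the "cache".
        page_no, clear = self.cache.popitem(last=False)
        version = self.versions.get(page_no, 0) + 1
        self.versions[page_no] = version
        ks = _keystream(self.enc_key, page_no, version, len(clear))
        ciphertext = bytes(a ^ b for a, b in zip(clear, ks))
        self.main_memory[page_no] = (ciphertext, self._mac(page_no, version, ciphertext))

    def _fault_in(self, page_no: int) -> None:
        if page_no in self.main_memory:
            ciphertext, tag = self.main_memory[page_no]
            version = self.versions[page_no]
            # Verify integrity before any decrypted byte is trusted.
            if not hmac.compare_digest(tag, self._mac(page_no, version, ciphertext)):
                raise ValueError("integrity check failed for page %d" % page_no)
            ks = _keystream(self.enc_key, page_no, version, len(ciphertext))
            clear = bytearray(a ^ b for a, b in zip(ciphertext, ks))
        else:
            clear = bytearray(PAGE_SIZE)  # first touch: zero-filled page
        while len(self.cache) >= self.cache_pages:
            self._evict_one()
        self.cache[page_no] = clear

    def page(self, page_no: int) -> bytearray:
        """Return the cleartext page, faulting it in transparently if needed."""
        if page_no not in self.cache:
            self._fault_in(page_no)
        self.cache.move_to_end(page_no)  # mark as most recently used
        return self.cache[page_no]
```

A caller simply writes `pager.page(0)[:6] = b"secret"` and later reads `pager.page(0)`; eviction, encryption, and verification happen behind that call. With a cache of only a few pages, nearly every access to a new page forces an eviction, which is the memory pressure the paragraph above describes.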
Although a software cryptoprocessor does not need to trust hardware devices other than the CPU, it does need to allow access to I/O devices such as those commonly used for storage and networking. Such devices typically employ direct memory access (“DMA”) to transfer data between the I/O device and main memory efficiently, without involving the main processor. Since DMA occurs at a hardware level, system software can mediate only at the start or end of a DMA but cannot mediate while the access is in progress.
Unfortunately, the inability to mediate during DMA presents problems for a software cryptoprocessor. A malicious or malfunctioning I/O device could directly access cleartext data associated with the software cryptoprocessor and its applications, making it possible to violate both confidentiality and integrity by reading or tampering with sensitive data. In a software cryptoprocessor, only data resident in the cache can be trusted, and all main memory is considered untrusted. Some software cryptoprocessors may allow portions of the cache itself to be untrusted.
Because a software cryptoprocessor does not encrypt or decrypt data that an I/O device transfers via DMA, such a transfer must target main memory whose contents are cleartext from the perspective of the software cryptoprocessor. The data may, however, be encrypted or transformed independently by other system components, such as by software prior to issuing I/O writes that are ultimately transferred to a device via DMA. For example, many storage subsystems implement file-level or block-level encryption, and network subsystems commonly implement secure protocols such as SSL and IPsec.
The use of untrusted devices also exposes software to “time of check to time of use” (“TOCTOU”) attacks, where data, such as a security credential, is changed after it has been checked but before it is used. For example, a malicious device that is able to write to main memory via DMA can modify data while the data is being processed by software. This enables the device to exploit a race condition between the time that software has finished validating data and the time that data is used throughout the system. The device can inject malicious contents that would have otherwise failed verification. To avoid such an attack, a software cryptoprocessor first copies untrusted device data into an area of memory that is not accessible to untrusted devices before the device data is validated.
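The copy-before-validate defense can be sketched as follows. This is an illustrative Python model only: the function name `receive_from_device`, the use of a SHA-256 digest as the validation step, and the `bytearray` standing in for a device-writable DMA buffer are assumptions, and in a real system the private copy would reside in memory that untrusted devices cannot reach.

```python
import hashlib
import hmac


def receive_from_device(dma_buffer: bytearray, length: int,
                        expected_digest: bytes) -> bytes:
    """Snapshot device data, validate the snapshot, and return only the snapshot."""
    # 1. Copy out of the device-writable buffer into a private, immutable
    #    object. Later device writes to dma_buffer cannot affect this copy.
    private_copy = bytes(dma_buffer[:length])

    # 2. Time of check: validate the private copy, never the shared buffer.
    actual_digest = hashlib.sha256(private_copy).digest()
    if not hmac.compare_digest(actual_digest, expected_digest):
        raise ValueError("device data failed validation")

    # 3. Time of use: callers receive the validated copy, so the bytes that
    #    were checked are exactly the bytes that are used.
    return private_copy
```

Because both the check and every later use operate on `private_copy`, a device that rewrites `dma_buffer` after step 1 changes nothing the software will consume. Validating `dma_buffer` in place and then reading it again would reopen the race window.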
Some systems, including Intel x86 platforms, implement cache-coherent DMA. Typically, cache coherency is achieved by having hardware snoop for DMA traffic on the memory interconnect, with the corresponding cache lines being invalidated or evicted to main memory. However, recent performance optimizations, such as Intel Data Direct I/O Technology (“DDIO”), allow portions of memory to be read from or written to the cache directly by a device. With such optimizations, DMA can use a certain portion of the cache, storing device data directly into the cache without first going through main memory. For example, on platforms using the Intel x86 Sandy Bridge EP processor, DDIO may allocate up to 10% of the last level cache (“LLC”). In such systems, the main memory and cache controllers may evict other resident cache lines to make room for the data of a DMA write. These systems may pessimistically evict lines from the cache even when the main memory being written by a DMA write is marked uncacheable. For example, such pessimistic behavior has been observed on the Intel x86 Sandy Bridge EP platform, with a DMA write from a device to main memory marked uncacheable (“UC”) in the appropriate memory type range registers (“MTRRs”). Such evictions represent a potential security risk for a software cryptoprocessor, which relies on keeping cleartext sensitive data in the cache and encrypting the sensitive data before it leaves the CPU package.