A trusted display service provides a protected channel that assures the confidentiality and authenticity of content output on selected screen areas. With it, users can rely on the information output by a security-sensitive application (SecApp) without worrying about undetectable screen “scraping”, where the display output is surreptitiously read, or “painting”, where the display output is surreptitiously modified, by malicious software on the computing system, such as a compromised operating system (OS) or unsecured applications (Apps).
Security architectures that isolate entire SecApps from untrusted OSes and unsecured applications (Apps) implement trusted display functions via a trusted path. That is, a user's explicit activation of the trusted path effectively removes all untrusted OS and Apps access to the display device (e.g., the video card) and assigns the device to a SecApp for the entire duration of a session. Unfortunately, the exclusive use of display devices via a trusted path does not allow untrusted OS/Apps and SecApps to output content concurrently on a user's screen. The untrusted output cannot be displayed until after the trusted path releases the screen at the end of the SecApp session. As a consequence, it would not be possible to maintain the typical multi-window user experience for applications that comprise both trusted and untrusted components and use the same display screen.
Some past approaches that allow trusted display of output with different sensitivity on the same screen concurrently have been based on encapsulating and protecting graphics cards within high-assurance security kernels. In addition to requiring changes to the OSes, adopting such an approach for the entire graphics processing unit (GPU) of a video card would not work because the complexity of modern GPU functionality (e.g., 2D/3D hardware rendering, general-purpose computing on GPU (GPGPU), and hardware video encoding/decoding) rules out maintaining a small and simple code base for the security kernel, which is a prerequisite for high assurance. For example, Intel's GPU driver for Linux 3.2.0-36.57 comprises over 57K SLoC, which is more than twice the size of a typical security kernel. Furthermore, GPU functions operate asynchronously from the central processing units (CPUs) to improve graphics performance, which would introduce concurrency control for multi-threading into the trusted code base. This would invalidate all correctness proofs that assume single-thread operation.
Full GPU virtualization can be used to enable concurrent display of both trusted and untrusted output on a user's screen without requiring OSes/Apps modification. However, full GPU virtualization, which is largely motivated by improved performance, relies on address-space sharing between different virtual machines (VMs) and the GPU without providing adequate hardware mechanisms for protecting different VMs' code and data within the GPU. Moreover, full GPU virtualization intrinsically requires a large trusted code base; e.g. supporting native GPU drivers/Apps requires emulating all accesses to all GPU configuration registers for the VMs scheduled to access the GPU. Thus, adopting full GPU virtualization for high-assurance trusted display would be impractical.
Unless explicitly mentioned or differentiated, the present invention will use the term “GPU” to refer to both the video card and the graphics processing unit, as graphics processing units are the major components in modern video cards.
CPU programs (e.g. GPU drivers and Apps) 100 control GPU execution via five types of objects (also known as programming objects), namely data 108, page tables, commands 106, and instructions 104 that are stored in GPU memory (including GPU device memory and main memory referenced by GPU address spaces 102), and GPU configuration registers 110 as shown in FIG. 1.
CPU programs 100 produce the instructions and commands that are executed by GPU hardware. For example, instructions 104 are executed on GPU processor cores, process input data, and produce results that are used by display engines. In contrast, commands 106 are executed by dedicated command processors and are used to configure the GPU with correct parameters; e.g., to specify the stack base address used by instructions. Groups of commands 106 are submitted for processing in dedicated command buffers; e.g., they are received in input (ring) buffers from drivers and (batch) buffers from both applications and drivers.
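The command-submission flow above can be sketched as follows. This is a minimal illustration with hypothetical structure and function names; real GPUs define their own command encodings, head/tail registers, and doorbell mechanisms.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical ring-buffer layout; actual drivers mirror the
 * hardware's own command-buffer format. */
struct ring_buffer {
    uint32_t *base;   /* start of command buffer in GPU memory  */
    size_t    size;   /* capacity in 32-bit entries             */
    size_t    tail;   /* next free slot, advanced by the CPU    */
};

/* Append a group of commands; the GPU's command processor
 * consumes them asynchronously from its own head pointer. */
static void ring_submit(struct ring_buffer *rb,
                        const uint32_t *cmds, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        rb->base[rb->tail] = cmds[i];
        rb->tail = (rb->tail + 1) % rb->size;
    }
    /* A real driver would now write the new tail value to a GPU
     * doorbell/tail register to kick the command processor. */
}
```

The wrap-around of the tail index reflects the ring structure noted in the text: the CPU producer and the GPU command processor operate on the same buffer asynchronously.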
As shown in FIG. 1, a GPU 130 also contains several engines, such as the processing engine 118 and display engine 116, as well as other engines 120. The processing engine 118 executes instructions on multiple GPU cores for computation acceleration. It references memory regions known as the GPU local address space via the GPU local page tables 114. The display engine 116 parses screen pixel data stored in frame buffers according to the engine's configurations, and outputs images for display. Other engines 120 perform a variety of functions such as device-wide performance monitoring and power management.
The display engine 116 defines several basic configurations for frame buffer presentation; e.g. geometry and pixel formats. Furthermore, it provides the data paths from frame buffers to external monitors 140. For example, the screen output may comprise a combination of multiple screen layers, each of which contains a separate frame buffer. In this case, GPUs support a hardware cursor as the front layer of the screen and display it over the primary image. Since a single GPU 130 may be connected to multiple screen monitors, a monitor 140 may consume the same frame buffers as another monitor 140, which implies that GPU memory protection requires a controlled sharing mechanism. Furthermore, an image presented on a screen may be torn as the result of frame-buffer updates by CPU programs during screen refreshing. To address this synchronization problem, display engines 116 of modern GPUs 130 also provide a V-Sync interrupt to notify CPU programs 100 of the time when it is safe to update a frame buffer.
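The V-Sync-driven update described above can be illustrated with a brief sketch. The flag-polling scheme and all names here are hypothetical simplifications; a real driver would block on the GPU's V-Sync interrupt rather than busy-wait.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Set by a (hypothetical) V-Sync interrupt service routine when
 * the display engine enters the vertical-blanking interval. */
static volatile bool vsync_fired;

/* Copy new pixel data into the frame buffer only inside the
 * vertical-blanking window, so the display engine never scans
 * out a half-updated image (no tearing). */
static void update_frame_buffer(uint32_t *fb, const uint32_t *pixels,
                                size_t n_pixels)
{
    while (!vsync_fired)
        ;                        /* wait for vertical blank */
    vsync_fired = false;
    memcpy(fb, pixels, n_pixels * sizeof(uint32_t));
}
```

The point of the sketch is the ordering constraint: frame-buffer writes by CPU programs 100 are deferred until the display engine 116 signals that scan-out of the current frame is complete.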
Although the GPU architecture illustrated in FIG. 1 is common to many commodity GPUs, some of these GPUs differ in how memory is accessed and managed. For example, Intel's GPUs use a global page table (GGTT) for memory access in addition to local page tables. The GGTT maps the memory region referred to as the GPU global address space, which includes frame buffers, command buffers, and the GPU memory aperture, which is shared between CPU and GPU. In contrast, AMD and Nvidia GPUs do not have a GGTT and allow direct access to the GPU physical memory address space (we consider that these GPUs use a GGTT with flat mappings, i.e., virtual addresses are identical to physical addresses, even though the GGTT does not exist in these GPUs). This implies that GPU memory access may also differ in different GPUs; e.g., the processing engine of Nvidia's GPUs can access only the local address space, whereas Intel's and AMD's can also access the global address space.
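The unifying convention above, treating GPUs without a GGTT as having a GGTT with identity (flat) mappings, can be sketched as follows. The structure and function names are hypothetical; the sketch models only page-granular translation, not the full GGTT entry format.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t gpu_addr_t;

/* Hypothetical model of a GGTT: a page-granular translation
 * table, or NULL to denote the flat-mapping case (AMD/Nvidia). */
struct ggtt {
    const gpu_addr_t *page_base;  /* per-page physical bases, or NULL */
    size_t            n_pages;
    size_t            page_size;
};

/* Translate a GPU global address to a physical address.  With no
 * table present, the mapping is the identity, which is how the
 * text folds GGTT-less GPUs into the same model. */
static gpu_addr_t ggtt_translate(const struct ggtt *t, gpu_addr_t va)
{
    if (t->page_base == NULL)          /* flat mapping */
        return va;
    size_t idx = va / t->page_size;
    if (idx >= t->n_pages)
        return (gpu_addr_t)-1;         /* out of range: fault */
    return t->page_base[idx] + va % t->page_size;
}
```

Under this convention, both the Intel case (a populated table) and the AMD/Nvidia case (identity translation) are handled by the same code path, which is why the differences need not produce a different GPU model.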
It should be noted that (1) a GPU may not provide GPU instructions; (2) a GPU may comprise only processing engines and display engines without any other engines; and (3) a GPU may not have a GGTT as described above. These differences do not constitute a different GPU model, because the present invention covers a superset of these cases.
Implementing a trusted display service on untrusted OS and hardware platforms that support SecApp isolation faces three basic challenges.
Incompatibility with Computing Platforms.
The goal of maintaining object-code compatibility with untrusted OSes (which are not designed to protect against tampering with trusted display) that directly access GPU objects in an unrestricted manner poses a dilemma. If one re-designs and reimplements GPU functions on OSes to block memory accesses that breach address-space separation, one introduces object-code incompatibility. If one does not, one forgoes trusted display. To retain compatibility, access to GPU objects by untrusted OS/Apps code must be emulated by the trusted system, which increases the trusted code base and makes high-assurance design impractical.
Inadequate GPU Hardware Protection.
The inadequacy of the hardware for memory protection is well known for Intel GPUs. An address-space separation attack by malicious GPU instructions illustrates another instance of this problem and suggests that simplistic software solutions will not work. For example, verifying address offsets of GPU instructions before execution does not work because operand addressing cannot always be unambiguously determined due to indirect branches and register-indirect memory accesses.
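The failure of static offset verification noted above can be illustrated with a toy model. The function below is a hypothetical stand-in for a GPU instruction that performs a register-indirect store; everything here is illustrative, not an actual GPU instruction format.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy model of a register-indirect GPU store: the effective
 * address is base + index, where both values live in registers
 * whose contents are known only at execution time.  A static
 * verifier inspecting the instruction's encoding sees neither
 * operand value, so it cannot bound the destination address. */
static void indirect_store(uint32_t *mem, uint32_t base_reg,
                           uint32_t index_reg, uint32_t value)
{
    mem[base_reg + index_reg] = value;   /* target computed at run time */
}
```

Because the destination depends on run-time register contents (and, analogously, because indirect branches hide which instruction executes next), pre-execution inspection of address offsets cannot guarantee address-space separation.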
Unverifiable Code Base.
Even if, hypothetically, all the OS/Apps functions that access GPU objects could be isolated and made tamper-proof, their code base would be neither small (i.e., tens of thousands of SLoC) nor simple, and hence the formal verification of their security properties would be impractical. A large number of diverse GPU instructions and commands spread throughout different drivers and application code provide access to a large number of GPU objects; e.g., a GPU can have 625 configuration registers and 335 GPU commands. Furthermore, since the underlying trusted base (e.g., micro-kernel or micro-hypervisor) must protect different SecApps on a computing platform, the functions that access GPU objects directly must be implemented within the trusted base. Hence, these functions' code would have to preserve all existing assurance of the underlying trusted base; i.e., their security properties and proofs must compose with those of the trusted base. These challenges have not been met to date.