The “meltdown” hardware security vulnerability enables unprivileged processes to read otherwise inaccessible kernel memory by exploiting speculative execution. In general terms, a malicious user can trick the CPU into speculatively accessing u[j], where u is a user array and j is private kernel data that is unknown to the user. Because the speculative access leaves u[j] in the cache, where user-space code can reach it, the user can deduce the value of j by timing accesses to the elements of u. The vulnerability affects microprocessors from Intel, IBM, and ARM released over the last two decades. Fixing it will carry a cost in real dollars that may eclipse that of the Y2K bug. Unlike Y2K, fixing meltdown will have a lasting performance impact, as patching it requires establishing barriers to speculation and isolating the kernel.
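The timing step described above can be illustrated with a toy simulation. This is only a sketch: it models the cache as a Python set and touches no real hardware. The names ToyCache, transient_access, and recover_byte, the latency constants, and the one-page stride between probe slots are assumptions made for illustration.

```python
# Toy model of a cache-timing side channel. Nothing here touches real hardware.
STRIDE = 4096                # assumed spacing so each candidate value gets its own slot
HIT_NS, MISS_NS = 10, 200    # made-up latencies for a cached vs. an uncached access


class ToyCache:
    """Tracks which indices of the user array u are currently cached."""

    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def access(self, index):
        """Return a simulated latency and leave the index cached afterwards."""
        latency = HIT_NS if index in self.lines else MISS_NS
        self.lines.add(index)
        return latency


def transient_access(cache, secret_byte):
    """Stand-in for the speculative load of u[j]: only the cache state survives."""
    cache.access(secret_byte * STRIDE)


def recover_byte(cache):
    """Time one access per candidate value and return the fastest candidate."""
    timings = {value: cache.access(value * STRIDE) for value in range(256)}
    return min(timings, key=timings.get)


def leak(secret_byte):
    cache = ToyCache()
    cache.flush()                          # attacker evicts u from the cache
    transient_access(cache, secret_byte)   # CPU speculatively touches u[j * STRIDE]
    return recover_byte(cache)             # attacker times the elements of u
```

In this model the attacker never reads the secret directly; the only signal is that exactly one of the 256 probe slots answers quickly.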
Meltdown is made possible because operating systems traditionally map the kernel's address space into the page tables of every process for efficiency; in other words, the virtual address space of each process includes the user address space for that process and the kernel address space. System designers rely on hardware protection to prevent unauthorized user access by marking the kernel memory pages as privileged. Unfortunately, on meltdown-vulnerable CPUs, a user process can speculatively access these privileged kernel pages, thereby leaking kernel data indirectly. With instruction pipelining, for example, data from an unauthorized address can be temporarily loaded into the CPU's cache during out-of-order execution. This cache presents a side-channel attack opportunity that allows an unprivileged process to bypass the normal privilege checks that prevent that process from accessing data belonging to the operating system. As a consequence, the unprivileged process can read data from any address that is mapped into the current process's virtual address space, including the kernel's address space.
The canonical defense against meltdown recommended by CPU vendors is to separate the kernel and user into two different address spaces. This technique, known as “page table isolation” (PTI), is employed in various operating systems including BSD, Linux, OS X, and Windows. Whereas current systems have a single set of page tables for each process, PTI implements two sets. One set is essentially unchanged; it includes both kernel-space and user-space addresses, but it is only used when the system is running in kernel mode. The second set contains a copy of all of the user-space mappings, but leaves out most of the kernel side. Instead, there is a minimal set of kernel-space mappings that provides the information needed to handle system calls and interrupts, but no more. Whenever a process is running in user mode, the second set of page tables is active. The bulk of the kernel's address space is thus completely hidden from the process, defeating the known hardware-based attacks. Whenever the system needs to switch to kernel mode, in response to a system call, an exception, or an interrupt, for example, it switches to the first set of page tables.
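The two sets of page tables can be sketched as a small model. This is an illustration under simplifying assumptions, not how any operating system implements PTI: page tables are represented as sets of page labels, and the names ToyProcess, KERNEL_ENTRY, and the page labels are hypothetical.

```python
KERNEL_ENTRY = "kernel:entry_stubs"   # hypothetical label for the minimal kernel mapping


class ToyProcess:
    """Models one process with a full and a reduced set of page tables."""

    def __init__(self, user_pages, kernel_pages):
        # Full set: user and kernel mappings, active only in kernel mode.
        self.kernel_mode_tables = set(user_pages) | set(kernel_pages) | {KERNEL_ENTRY}
        # Reduced set: user mappings plus the minimal entry stubs.
        self.user_mode_tables = set(user_pages) | {KERNEL_ENTRY}
        self.active = self.user_mode_tables
        self.switches = 0

    def enter_kernel(self):
        self.active = self.kernel_mode_tables
        self.switches += 1

    def return_to_user(self):
        self.active = self.user_mode_tables
        self.switches += 1

    def is_mapped(self, page):
        """True if the page is reachable through the active page tables."""
        return page in self.active

    def syscall(self, page):
        """Enter the kernel, check the page there, then return to user mode."""
        self.enter_kernel()
        try:
            return self.is_mapped(page)
        finally:
            self.return_to_user()


proc = ToyProcess(user_pages={"user:heap"}, kernel_pages={"kernel:secrets"})
hidden_in_user_mode = not proc.is_mapped("kernel:secrets")
visible_in_kernel_mode = proc.syscall("kernel:secrets")
```

The switches counter makes the cost visible: every system call in this model changes the active page tables twice, once on entry and once on return.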
PTI has been shown to reduce the performance of some workloads by 30% or more. Especially affected are workloads that frequently make system calls into the kernel and must therefore pay the PTI overhead on each transition between user mode and kernel mode. Presumably, meltdown could be fixed in future processors, potentially without a performance penalty. But because the vulnerability lies in the hardware, it would be impractical, if not impossible, to fix the billions of processors already in service. The situation is especially dire for embedded, real-time applications that use meltdown-vulnerable processors, such as avionics, railway controls, medical devices, industrial control, and other time-sensitive systems. These safety-critical systems may have been deployed with the expectation that the processor would operate in a fixed performance envelope, an assumption which may no longer hold if PTI is enabled for those systems.
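A rough way to see why system-call-heavy workloads suffer most is a simple linear cost model. The sketch below assumes that each system call adds a fixed amount of time under PTI and that nothing else changes; both function names and all input values are hypothetical and are not measurements.

```python
def added_time_fraction(syscalls_per_sec, extra_ns_per_syscall):
    """Extra time per second of original work, as a fraction (0.3 means 30% longer)."""
    return syscalls_per_sec * extra_ns_per_syscall * 1e-9


def relative_throughput(syscalls_per_sec, extra_ns_per_syscall):
    """Throughput with PTI relative to throughput without it (1.0 means no loss)."""
    return 1.0 / (1.0 + added_time_fraction(syscalls_per_sec, extra_ns_per_syscall))


# Hypothetical inputs chosen only to exercise the formula, not measured values.
light = relative_throughput(syscalls_per_sec=1_000, extra_ns_per_syscall=300)
heavy = relative_throughput(syscalls_per_sec=1_000_000, extra_ns_per_syscall=300)
```

Under these assumed inputs the light workload loses almost nothing, while the heavy one drops to roughly 0.77 of its original throughput. For a real-time system, the same added time per call would have to fit within the deadline budget that was fixed at deployment.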