Emerging fast, non-volatile memories (e.g., phase-change memories, spin-torque MRAMs, and the memristor) reduce storage access latencies by an order of magnitude compared to state-of-the-art flash-based solid-state drives (SSDs). These technologies will rewrite the rules governing how storage hardware and software interact to determine overall storage system performance. In particular, software overheads that used to contribute only marginally to latency (because storage hardware was slow) will become critical: left unaddressed, they could squander much of the performance these new memories can provide.
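The claim that formerly marginal software costs become critical follows from a simple latency ratio. Let $t_{sw}$ be the fixed per-request software latency and $t_{hw}$ the device latency. These symbols are introduced here for illustration and are not the paper's notation. The fraction of each request spent in software is $f$, and the expression below shows how $f$ changes when $t_{hw}$ shrinks by the order of magnitude mentioned above while $t_{sw}$ stays the same:

```latex
\begin{aligned}
f  &= \frac{t_{sw}}{t_{sw} + t_{hw}}, \\
f' &= \frac{t_{sw}}{t_{sw} + t_{hw}/10} = \frac{10\,t_{sw}}{10\,t_{sw} + t_{hw}}.
\end{aligned}
```

As long as $10\,t_{sw}$ remains small relative to $t_{hw}$, $f \approx t_{sw}/t_{hw}$ and $f' \approx 10\,t_{sw}/t_{hw}$. The software share of latency therefore grows roughly tenfold, even though the software itself has not changed.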
Recent work describing Moneta, a fast, next-generation storage architecture, has shown that optimizing the existing IO stack and tuning the hardware/software interface can reduce software overheads by up to 62% and increase sustained bandwidth for small accesses by up to 19×. Even with these reduced overheads, however, IO processing places large demands on a system's compute resources: sustaining peak performance on Moneta for 4 KB requests requires the dedicated attention of 9 Nehalem thread contexts. Entering the kernel, performing file system checks, and returning to user space account for 30% (8 μs) of the latency of a 4 KB request. Together, they also reduce sustained throughput by 85%. Yet simply removing those layers is not possible, because they provide essential management and protection mechanisms.
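The percentages above imply a few derived figures worth making explicit. The sketch below works only from the quoted numbers, which are themselves rounded, so the results are approximate. It computes three things: the total 4 KB latency implied by "8 μs is 30%", the latency that would remain if those costs vanished, and the throughput gain an 85% reduction implies. The last two are idealized upper bounds, since the text notes these layers cannot simply be removed. The variable names are hypothetical.

```python
# Rounded figures quoted in the text for a 4 KB request on Moneta.
overhead_us = 8.0          # kernel entry + file system checks + return to user space
overhead_fraction = 0.30   # share of total 4 KB request latency
throughput_loss = 0.85     # reduction in sustained throughput caused by those layers

# Total latency implied by "8 us is 30% of the latency".
total_latency_us = overhead_us / overhead_fraction

# Latency left if those costs were eliminated entirely (idealized bound).
remaining_latency_us = total_latency_us - overhead_us

# An 85% reduction leaves 15% of the achievable throughput, so the
# idealized gain from eliminating these costs is 1 / 0.15.
throughput_gain = 1.0 / (1.0 - throughput_loss)

print(f"implied total latency:      {total_latency_us:.1f} us")
print(f"latency without overhead:   {remaining_latency_us:.1f} us")
print(f"idealized throughput gain:  {throughput_gain:.1f}x")
```

Running it prints roughly 26.7 μs, 18.7 μs, and 6.7×. The comparison suggests why the throughput impact is so much larger than the latency impact. For small requests, throughput is likely bounded by CPU time spent in these layers rather than by device latency.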