Modern computer systems often contain storage devices that use memory such as flash memory to store data. Other memory technologies, such as Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM), phase-change memory (PCM), and resistive memory (ReRAM), can also be used in place of flash memory. All of these are slower than DRAM, which is used as the primary working memory of the computer system. These other memories supplement the DRAM in some fashion and will be called “second tier memories” herein. In some cases, the data from the second tier memory must be brought into DRAM before it is operated upon.
When data from a second tier memory device is to be written to host dynamic random access memory (DRAM), the device must have write access to the host DRAM pages. In a system with an input/output memory management unit (IOMMU) enforcing the principle of least privilege, IOMMU management can add critical latency to the process of fetching data from second tier memory and copying it to DRAM. For example, where pages are writeable by the second tier memory device only for the minimum duration the device needs, there are no long-lasting mappings to all of physical memory. Accordingly, a mapping must be created for each write operation, which adds delay.
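The per-write mapping described above can be illustrated with a minimal sketch. The `ToyIommu` class and the `write_with_least_privilege` helper are hypothetical names introduced only for illustration; they model the bookkeeping of a least-privilege policy and do not correspond to any real IOMMU driver interface.

```python
class ToyIommu:
    """Toy model (hypothetical) of an IOMMU tracking device-writable DRAM pages."""

    PAGE_SIZE = 4096  # assumed page size for illustration

    def __init__(self):
        self._writable = set()  # page-aligned physical addresses currently mapped
        self.map_calls = 0      # each mapping operation is a source of latency

    def map_page(self, pa):
        if pa % self.PAGE_SIZE != 0:
            raise ValueError("physical address must be page aligned")
        self._writable.add(pa)
        self.map_calls += 1

    def unmap_page(self, pa):
        self._writable.discard(pa)

    def device_write(self, dram, pa, data):
        # The device may write only to pages that are mapped right now.
        if pa not in self._writable:
            raise PermissionError(f"device has no write access to {pa:#x}")
        dram[pa] = bytes(data)


def write_with_least_privilege(iommu, dram, pa, data):
    """Map the page only for the duration of one device write, then unmap it."""
    iommu.map_page(pa)
    try:
        iommu.device_write(dram, pa, data)
    finally:
        iommu.unmap_page(pa)
```

Under this policy, every write pays for one map and one unmap, and no mapping outlives the write it serves.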
One type of storage device controller is an application specific integrated circuit (ASIC) that uses low-latency non-volatile NAND storage (LLNAND) flash that serves as a swap device, among other functions. Infrequently accessed DRAM pages are evicted by the operating system (OS) and put into the storage device controller. On a user mode page fault to an evicted page, the OS page fault handler determines where the swap-in data comes from. When the storage device controller is the handler for a page fault, it must fetch a page from its flash device and copy it into the host CPU's DRAM at a specified address.
The storage device controller may have a number of command buffers that can service simultaneous outstanding requests, and then do a completion writeback to a specified address. For example, some storage device controllers have 128 command buffers. Copying from the storage device controller into CPU DRAM requires either that the storage device controller have full access to CPU DRAM, or that IOMMU mappings be dynamically created as needed. IOMMU mapping creation may be a performance burden.
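As a rough illustration of a fixed set of command buffers servicing outstanding requests, consider the following sketch. The `CommandBufferPool` class, its method names, and the `completion_addr` and `completion_data` fields are assumptions made for this example; only the figure of 128 buffers comes from the text above.

```python
class CommandBufferPool:
    """Hypothetical model of a controller with a fixed number of command buffers."""

    def __init__(self, num_buffers=128):
        self._free = list(range(num_buffers))  # indices of idle command buffers
        self._in_flight = {}                   # slot index -> outstanding command

    def submit(self, command):
        """Place a command in a free buffer; return its slot, or None if all are busy."""
        if not self._free:
            return None
        slot = self._free.pop()
        self._in_flight[slot] = command
        return slot

    def complete(self, slot, host_memory):
        """Finish a command and write its completion data back to the given address."""
        command = self._in_flight.pop(slot)
        host_memory[command["completion_addr"]] = command["completion_data"]
        self._free.append(slot)

    def outstanding(self):
        return len(self._in_flight)
```

In this model, a request submitted when all buffers are busy is refused until an earlier request completes and frees its slot.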
Typically, writing data from a second tier memory, such as a flash memory, to a DRAM requires an OS page fault handler to determine that the backing storage is at the storage device controller flash address (FA), allocate a free, zero-filled 4 KB page at DRAM physical address (PA), map PA into the IOMMU, and send a command to the storage device controller containing {FA, PA, completion data}. Then the storage device controller copies bytes from flash at FA into an internal buffer. This can take a relatively long time, such as 1 μs to 100 μs. Next, the storage device controller copies bytes from the internal buffer to system DRAM PA, and sends a completion notification. Then, the OS unmaps PA and resumes the user code. This serial process results in significant latency.
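The serial sequence above can be sketched as a single function in which each step begins only after the previous one finishes. The function name `serial_swap_in`, the dictionary-based models of flash and DRAM, and the `latency_us` keys are all hypothetical; the caller supplies the per-step latencies, and no latency other than the 1 μs to 100 μs flash read range is taken from the text.

```python
PAGE_SIZE = 4096  # 4 KB page, as in the text


def serial_swap_in(fa, flash, dram, free_pages, iommu_mapped, latency_us):
    """Hypothetical model of the serial swap-in path; returns (PA, elapsed time in us)."""
    elapsed_us = 0

    # OS: allocate a free, zero-filled page at physical address PA.
    pa = free_pages.pop()
    dram[pa] = bytes(PAGE_SIZE)

    # OS: map PA into the IOMMU so the device may write to it.
    iommu_mapped.add(pa)
    elapsed_us += latency_us["iommu_map"]

    # OS: send the command {FA, PA, completion data} to the controller.
    command = {"fa": fa, "pa": pa, "completion_data": "done"}

    # Controller: copy bytes from flash at FA into an internal buffer.
    internal_buffer = flash[command["fa"]]
    elapsed_us += latency_us["flash_read"]

    # Controller: copy the internal buffer to DRAM at PA (allowed only while mapped).
    if command["pa"] not in iommu_mapped:
        raise PermissionError("PA is not mapped for device writes")
    dram[command["pa"]] = internal_buffer
    elapsed_us += latency_us["dram_copy"]

    # OS: on the completion notification, unmap PA and resume the user code.
    iommu_mapped.discard(pa)
    elapsed_us += latency_us["iommu_unmap"]

    return pa, elapsed_us
```

Because the steps run back to back, the elapsed time is the sum of the mapping, flash read, copy, and unmapping latencies, so the IOMMU work adds directly to the time the faulting user code waits.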