The architecture of most current personal computer (PC) systems, from desktop to server, may be conceptually and schematically illustrated by FIG. 1, to which reference is now made.
PC system 10 typically includes memory 20, which may be embedded within one or more processing units 12, or may be separate therefrom. Processing units 12 are typically coupled with IO devices 14[1]-14[i] via one or more IO buses 16, e.g., peripheral component interconnect (PCI) buses. Some or all of the IO devices may be coupled with an IO bridge 17, which may be coupled with IO bus 16. Optionally, in order to make the connection between processing units 12 and IO devices 14[1]-14[i] quicker, PC system 10 may also include one or more components, e.g., a north bridge unit 18, that communicate with the processing units 12 and control the interaction with memory 20, and the IO buses 16.
Processing unit 12 typically includes a Central Processing Unit (CPU) 26 that typically refers to virtual memory addresses or space, which get translated by the memory management unit (MMU) 24 into physical addresses. The physical address is typically used for cache 22 (although some processor architectures use virtual addresses for cache access) and access to memory 20. In addition to ‘virtual to physical’ translation information, the MMU 24 typically contains memory protection information used to grant memory access to its owner, e.g., to the thread or process that requested the memory access. For example, system pages may typically be read only by a privileged process, such as by an operating system, or by another privileged process, while user space processes are typically allowed to access only their own memory pages.
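The translation and protection check described above can be sketched in a few lines. This is only an illustrative model, not the behavior of any particular MMU: the 4 KiB page size, the single-level dictionary page table, and the names `translate`, `PageFault`, and `ProtectionFault` are all assumptions introduced for the example.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class PageFault(Exception):
    """No translation exists for the requested virtual page."""


class ProtectionFault(Exception):
    """The requester is not allowed to access the page."""


def translate(page_table, vaddr, requester, privileged=False):
    """Translate a virtual address to a physical address.

    page_table is a hypothetical mapping:
        virtual page number -> (physical frame number, owner, privileged_only)
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    entry = page_table.get(vpn)
    if entry is None:
        raise PageFault(f"no mapping for virtual page {vpn}")
    frame, owner, privileged_only = entry
    # System pages are reserved for privileged requesters.
    if privileged_only and not privileged:
        raise ProtectionFault(f"page {vpn} is privileged")
    # User-space requesters may touch only their own pages.
    if not privileged and owner != requester:
        raise ProtectionFault(f"page {vpn} belongs to {owner}")
    return frame * PAGE_SIZE + offset
```

In this model a DMA request that bypasses `translate` entirely is subject to neither check, which is the gap the following paragraphs describe.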
In the computer architecture described in FIG. 1, there is substantially no memory protection for Direct Memory Access (DMA) performed by a DMA-enabled IO device 14[1]-14[i], whether the IO device is directly coupled with IO bus 16 or is coupled with IO bridge 17. In both cases, IO devices 14[1]-14[i] communicate via DMA engine 28 to directly access memory 20.
As shown in FIG. 1, IO bus 16 is coupled with memory 20 through north bridge unit 18 without the involvement of CPU 26 and MMU 24. Therefore, IO devices 14[1]-14[i], which typically use physical addresses, have access to all memory space, both to privileged memory space, such as the memory space of the operating system, and to non-privileged memory space, such as the memory space of applications running on PC system 10. Any mis-configuration or hostile configuration of IO devices 14[1]-14[i] may compromise the stability and integrity of PC system 10 by allowing the DMA engines 28 of IO devices 14[1]-14[i] to freely access any region in memory 20 of system 10.
Furthermore, in DMA-based IO operations that use physical addresses, the operating system is typically required to access the operating system page tables to find the physical addresses of the pages involved in the operation. To ensure that the required pages are present in the memory, multiple page faults may be issued to retrieve them. Since a contiguous range of virtual addresses spanning multiple pages may be mapped to a non-contiguous range of physical pages, the DMA operation is often broken into multiple page-sized operations. More recent DMA engines or controllers attempt to support DMA operations to a non-contiguous range of physical pages, but in both cases, the physical pages involved in the DMA operation cannot be relocated or evicted from the memory while the operation is in progress, to avoid overwrite or read operations of pages that belong to other processes. To ensure the safe operation of the system, all pages involved in the DMA operation are required to be pinned before the operation begins and later unpinned once it is completed. For non-blocking IO operations, the pages are required to remain pinned even when the process crashes, as long as the DMA operation is still in progress.
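As an illustration of the two steps just described, the sketch below splits a virtually contiguous buffer into per-page physical segments and keeps the affected frames pinned for the duration of the transfer. It is a simplified model under stated assumptions: the 4 KiB page size, the dictionary page table (virtual page number to frame number), the pin-count dictionary, and the names `split_into_page_segments` and `pinned` are all hypothetical.

```python
from contextlib import contextmanager

PAGE_SIZE = 4096  # assumed page size, for illustration only


def split_into_page_segments(page_table, vaddr, length):
    """Return a list of (physical_address, byte_count), one per page touched.

    page_table is a hypothetical mapping: virtual page number -> frame number.
    A missing entry raises KeyError, standing in for a page fault.
    """
    segments = []
    addr, remaining = vaddr, length
    while remaining > 0:
        vpn, offset = divmod(addr, PAGE_SIZE)
        frame = page_table[vpn]
        # Never let a segment cross a page boundary.
        chunk = min(PAGE_SIZE - offset, remaining)
        segments.append((frame * PAGE_SIZE + offset, chunk))
        addr += chunk
        remaining -= chunk
    return segments


@contextmanager
def pinned(pin_counts, frames):
    """Keep the given frames pinned while the enclosed DMA is in flight."""
    for f in frames:
        pin_counts[f] = pin_counts.get(f, 0) + 1
    try:
        yield
    finally:
        # Unpin even if the code that started the transfer fails.
        for f in frames:
            pin_counts[f] -= 1
            if pin_counts[f] == 0:
                del pin_counts[f]
```

For example, with virtual pages 2 and 3 mapped to frames 7 and 1, a 200-byte buffer starting 96 bytes before the end of page 2 yields two segments in non-adjacent frames, and both frames would need to stay pinned until the last segment completes.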
When systems use a hypervisor (not shown) to manage sharing of the processor by multiple operating system instances, safe DMA operation is even harder to achieve. In such systems, each operating system instance is typically allocated a subset of the physical memory that it manages on behalf of its processes. These instances cannot be trusted to pin their pages when performing an IO operation. Therefore, when any of the instances crashes, it is substantially impossible for the hypervisor to determine which page is involved in an ongoing DMA operation and which page can be safely allocated to a new operating system instance.
The following two attempts to solve this problem are equally ineffective and have problems related to their complexity and performance. The first attempt is to perform the IO operations through the hypervisor itself, which results in a complicated design of the hypervisor, which also requires modification to the hosted operating systems. The second attempt is to perform the IO operations through a dedicated memory partition, but in this case the hypervisor is still required to manage the allocation of pages in that partition.
In more recent systems the memory is better secured using a specialized virtual address space for IO, typically referred to as the “IO address space”, and having IO devices use that space for their DMA operations. An exemplary system is illustrated in FIG. 2, to which reference is now made. System 30 includes one or more IO Memory Management Units (IOMMU) 32. Some or all of IO devices 14[1]-14[i] may include a local IOMMU 32. Alternatively, two or more IO devices 14[1]-14[i] may share a common IOMMU 32. Each IOMMU 32 typically uses translation and protection tables that hold the mapping between virtual addresses in the IO address space and their corresponding physical addresses. Before an IO operation can take place, the operating system updates the translation tables for the IOMMU, so that the operation, e.g., a DMA operation, targets the correct physical pages.
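A minimal model of this arrangement is sketched below: the operating system installs a mapping in the IO translation table before a DMA operation, and the device's accesses are translated through that table. The class `SimpleIOMMU`, its method names, the write-permission flag, and the 4 KiB page size are assumptions made for the example, not features of any specific IOMMU.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class IOTranslationFault(Exception):
    """The IO address has no valid translation or lacks write permission."""


class SimpleIOMMU:
    """Hypothetical IOMMU holding one translation and protection table."""

    def __init__(self):
        # IO virtual page number -> (physical frame number, writable)
        self.table = {}

    def map(self, io_vpn, frame, writable=False):
        """Called by the operating system before an IO operation starts."""
        self.table[io_vpn] = (frame, writable)

    def unmap(self, io_vpn):
        """Called by the operating system once the IO operation is done."""
        self.table.pop(io_vpn, None)

    def translate(self, io_addr, write=False):
        """Translate an address issued by a device's DMA engine."""
        io_vpn, offset = divmod(io_addr, PAGE_SIZE)
        entry = self.table.get(io_vpn)
        if entry is None:
            raise IOTranslationFault(f"no mapping for IO page {io_vpn}")
        frame, writable = entry
        if write and not writable:
            raise IOTranslationFault(f"IO page {io_vpn} is read-only")
        return frame * PAGE_SIZE + offset
```

Note that nothing in this model ties a table entry to a particular device, so any device sharing the table can reach any mapped frame.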
System 30 has a few problems. First, the operating system has to manage two sets of translation tables, one for the processes that are executed on the CPU, for mapping their virtual address space to physical memory (to be used by MMU 24), and the second for IO, for mapping the IO address space to physical memory (to be used by IOMMU 32). The operating system is always required to keep the tables in sync. For example, if a process's virtual-to-physical mapping is modified, then the corresponding IO translation tables are required to be modified as well so that the DMA operation will access the right set of physical pages. Keeping the tables synchronized raises difficulties such as race conditions, complicates memory management and IO handling code in the operating system, and may adversely affect the performance of IO operations. The problem is even more complicated if IOMMU 32 caches recent translation entries to speed up translation. In this case, entries in the IOMMU TLB (translation lookaside buffer, not shown in FIG. 2) are required to be invalidated whenever a process mapping is changed or a new IO operation begins. Second, the IO address space, unlike the processes' virtual address space, is not protected, which means that potentially any IO device can access any region of the IO address space and thus access any region of the physical memory which is mapped by the IO translation tables. Solutions to this problem are typically handled by complex layers of software. Third, there is no mechanism for IOMMU 32 to handle translation exceptions, e.g., when a valid translation for an IO virtual address is not found in the IO page tables. It is then the responsibility of the operating system to ensure that valid translations for ongoing IO operations exist.
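The first problem, and in particular the role of the IOMMU TLB, can be made concrete with a small model. The sketch below is hypothetical throughout: `CachingIOMMU`, `remap`, the dictionary-based IO page table, and the 4 KiB page size are assumptions chosen only to show why a cached translation must be invalidated when the underlying mapping changes.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class CachingIOMMU:
    """Hypothetical IOMMU that caches translations in a small TLB."""

    def __init__(self, io_page_table):
        # IO virtual page number -> frame number, maintained by the OS.
        self.io_page_table = io_page_table
        self.tlb = {}

    def translate(self, io_addr):
        io_vpn, offset = divmod(io_addr, PAGE_SIZE)
        if io_vpn not in self.tlb:
            # TLB miss: walk the OS-managed table and cache the result.
            self.tlb[io_vpn] = self.io_page_table[io_vpn]
        return self.tlb[io_vpn] * PAGE_SIZE + offset

    def invalidate(self, io_vpn):
        """Drop a cached entry so the next access re-reads the table."""
        self.tlb.pop(io_vpn, None)


def remap(io_page_table, iommu, io_vpn, new_frame):
    """Update the IO mapping and invalidate the cached entry together."""
    io_page_table[io_vpn] = new_frame
    iommu.invalidate(io_vpn)
```

If the table entry is changed without calling `invalidate`, `translate` keeps returning an address in the old frame, which by then may belong to another process. In a real system the update and the invalidation would also have to be ordered against in-flight DMA, which is where the race conditions mentioned above arise.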