1. Field of the Invention
The present invention is generally directed to computer systems, and more particularly, to input/output memory management units (IOMMUs).
2. Related Art
A memory management unit (MMU) can be associated with a central processing unit (CPU). For example, a CPU MMU is configured to translate virtual addresses used by the CPU into physical addresses corresponding to system memory, and the CPU MMU validates the access (present, read, write, etc.), allowing for memory over-commit, relocation, and protection in association with the CPU.
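As a minimal sketch of the translate-and-validate step described above, the following assumes 4 KiB pages and a single-level table keyed by virtual page number. The names `PageTableEntry`, `TranslationFault`, and `translate` are hypothetical and chosen for illustration only; they do not correspond to any particular CPU architecture.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12                 # assumption: 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT


@dataclass(frozen=True)
class PageTableEntry:
    """Hypothetical page table entry: a frame number plus access bits."""
    frame: int
    present: bool = True
    readable: bool = True
    writable: bool = False


class TranslationFault(Exception):
    """Raised when an access fails the present/read/write checks."""


def translate(page_table, vaddr, write=False):
    """Translate a virtual address to a physical address, validating access."""
    vpn = vaddr >> PAGE_SHIFT            # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)     # byte offset within the page
    entry = page_table.get(vpn)
    if entry is None or not entry.present:
        raise TranslationFault(f"page {vpn:#x} not present")
    if write and not entry.writable:
        raise TranslationFault(f"page {vpn:#x} is not writable")
    if not write and not entry.readable:
        raise TranslationFault(f"page {vpn:#x} is not readable")
    return (entry.frame << PAGE_SHIFT) | offset


# Virtual page 0x10 maps to physical frame 0x80, read-only.
table = {0x10: PageTableEntry(frame=0x80)}
print(hex(translate(table, 0x10ABC)))    # prints 0x80abc
```

Because every access passes through the table, system software can relocate a page by changing its frame number or over-commit memory by clearing the present bit, without the accessing program changing the addresses it uses.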
In systems relating to x86 CPUs, an Input/Output (IO) MMU associated with Input/Output peripherals has been defined relatively recently. An input/output memory management unit (IOMMU) can retrieve translation information from system memory responsive to peripheral requests associated with, e.g., virtual addresses used by the peripheral, to translate the virtual addresses to corresponding physical addresses of system memory.
The IOMMU can typically contain page-table walker logic that inspects the contents of main system memory to find the necessary translation information (i.e., performs a page-table walk). For example, when a peripheral requests information that is not cached in the IOMMU (i.e., a “miss”), the page-table walker is used to obtain the information from system memory. However, the page-table walker can be complex to implement, increasing the silicon area and power dissipation of the IOMMU chip or chip component. The IOMMU implements the page-table walker in an attempt to be locally optimal based on the limited information available to the IOMMU hardware (e.g., managing the information cached in the IOMMU based on a least-recently-used (LRU) algorithm). Such hardware-only implementations can potentially lead to excessive translation fetches (“page-table walks”) and excessive translation misses, degrading performance of the IO subsystem and leading to increased memory latency.
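The miss-then-walk behavior and its LRU limitation can be sketched as follows. This is an illustrative model only: the class name `LruIotlb`, the `walks` counter, and the flat dictionary standing in for page tables in system memory are assumptions, not a description of any actual IOMMU design.

```python
from collections import OrderedDict


class LruIotlb:
    """Toy translation cache that walks the page table on every miss."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table   # stands in for tables in system memory
        self.cache = OrderedDict()     # ordered oldest -> most recently used
        self.walks = 0                 # number of page-table walks performed

    def lookup(self, vpn):
        if vpn in self.cache:
            self.cache.move_to_end(vpn)       # hit: mark as most recently used
            return self.cache[vpn]
        self.walks += 1                       # miss: fetch from "system memory"
        frame = self.page_table[vpn]
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)    # evict the least recently used
        self.cache[vpn] = frame
        return frame


# A peripheral cycles through three pages with only two cache entries.
page_table = {0: 100, 1: 101, 2: 102}
iotlb = LruIotlb(capacity=2, page_table=page_table)
for _ in range(3):
    for vpn in (0, 1, 2):
        iotlb.lookup(vpn)
print(iotlb.walks)    # prints 9
```

In this access pattern, LRU always evicts the entry that will be needed next, so all nine lookups miss. The hardware sees only past accesses; software that knows the peripheral's access pattern could choose a better replacement.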
Additionally, the IOMMU typically is configured to read and parse information based on the format of page table entries associated with a particular architecture. This limits the IOMMU to a particular page table architecture, commits page table formats into hardware designs and, by implication, ties the IOMMU to a particular compatible processor implementation.
Software architected/managed translation look-aside buffer (TLB) caches are also known. In such designs, software manages the TLB and any page table walks are done in software. The software loads entries into the TLB, but no hardware infrastructure backs the software architected TLB. Furthermore, software architected TLBs are inflexible when it comes to loading and/or invalidation: when an entry is loaded into the TLB, the load has the effect of replacing a previous entry.
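A minimal sketch of such a software-managed TLB follows. It assumes a direct-mapped organization (slot chosen by virtual page number modulo the slot count), which is only one possible arrangement; the names `SoftwareManagedTlb`, `load`, `lookup`, and `access` are hypothetical.

```python
class SoftwareManagedTlb:
    """Toy TLB with no hardware walker: software performs every fill."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots       # each slot holds (vpn, frame)

    def load(self, vpn, frame):
        """Install a translation and return whatever entry it displaced."""
        index = vpn % len(self.slots)         # assumption: direct-mapped
        displaced = self.slots[index]
        self.slots[index] = (vpn, frame)      # the load overwrites the slot
        return displaced

    def lookup(self, vpn):
        """Return the cached frame, or None on a miss."""
        entry = self.slots[vpn % len(self.slots)]
        if entry is not None and entry[0] == vpn:
            return entry[1]
        return None


def access(tlb, page_table, vpn):
    """Software miss handler: walk the table in software, then load the TLB."""
    frame = tlb.lookup(vpn)
    if frame is None:
        frame = page_table[vpn]               # page-table walk done in software
        tlb.load(vpn, frame)
    return frame


page_table = {1: 0x40, 5: 0x50}
tlb = SoftwareManagedTlb(num_slots=4)
access(tlb, page_table, 1)
access(tlb, page_table, 5)     # pages 1 and 5 share a slot, so 1 is replaced
print(tlb.lookup(1))           # prints None
```

Under these assumptions, loading page 5 silently replaces the entry for page 1, and the next access to page 1 again requires a software walk.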
Some IO controllers or peripherals contain simple MMUs that are managed by device driver software in the operating system or hypervisor. For example, a typical graphics controller contains a “local MMU” on the graphics card. In such a case, the “local MMU” mapping hardware is controlled by system software using sophisticated algorithms, but each MMU is unique and requires a unique driver. Changes to the peripheral hardware require changes to the driver, driving up development costs and lengthening development schedules, ultimately delaying time-to-market. This also means that a vendor cannot write a general driver for a hypervisor in a virtualized system, so specific drivers must be included within the hypervisor, the selection of which depends on the precise IO peripherals present in the system. Consequently, yet another driver must be written and tested for the hypervisor in addition to the drivers for the supported operating systems, again driving up development costs and time.
An approach is needed that improves IOMMU performance and provides a standardized interface enabling software to be written once for a hypervisor and used for multiple implementations of a peripheral memory mapping architecture.