This invention relates to data transfer in a data processing system supporting virtualised input/output devices. In particular, this invention relates to an improved technique by which translation and protection of virtual memory addresses are performed in a virtualised data processing system.
High performance I/O (input/output) devices of a data processing system are typically able to access host memory directly via Direct Memory Access (DMA). That is, they are able to read the contents of memory and write data to memory directly, without mediation by host software. The addresses from which a device reads and to which it writes are typically supplied by a device driver.
With reference to FIG. 1, in some systems the addresses used by a device 103 to access host memory 101 do not correspond directly with physical memory addresses. In these systems, I/O addresses on the I/O bus 104 must be translated into machine physical addresses on the system bus 105. This translation is generally performed at a device called an I/O Memory Management Unit (IOMMU) 102. An IOMMU may perform either or both of translation services (converting I/O addresses to machine physical addresses) and protection services (limiting the range of machine physical addresses that are accessible by individual devices, or all devices on an I/O bus).
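The two services described above can be illustrated with a minimal sketch. It assumes a page-granular lookup table and a 4 KiB page size; the names `SimpleIOMMU`, `IOMMUFault`, `map_page` and `translate` are hypothetical and chosen only for illustration, and the sketch does not describe any particular IOMMU implementation.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class IOMMUFault(Exception):
    """Raised when an I/O address has no valid mapping."""


class SimpleIOMMU:
    """Hypothetical model of an IOMMU offering translation and protection."""

    def __init__(self):
        # Maps an I/O page number to a machine physical page number.
        self.page_table = {}

    def map_page(self, io_page, machine_page):
        self.page_table[io_page] = machine_page

    def translate(self, io_addr):
        io_page, offset = divmod(io_addr, PAGE_SIZE)
        if io_page not in self.page_table:
            # Protection service: unmapped I/O addresses are rejected.
            raise IOMMUFault(f"no mapping for I/O address {io_addr:#x}")
        # Translation service: swap the page number, keep the offset.
        return self.page_table[io_page] * PAGE_SIZE + offset


iommu = SimpleIOMMU()
iommu.map_page(io_page=0x10, machine_page=0x8A)
machine_addr = iommu.translate(0x10 * PAGE_SIZE + 0x24)
```

In this sketch the only machine physical addresses a device can reach are those for which a mapping has been installed, so the same table provides both translation and protection.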
To maintain the integrity of a data processing system it is essential that an I/O device only write into regions of host memory that are intended for I/O. In conventional monolithic systems, the device driver for an I/O device is a trusted component of the system that is generally supported at the kernel, and so can be relied upon to supply valid addresses for I/O buffers in host memory.
In virtualised systems, a single host supports multiple guest system images, each running an untrusted operating system instance. The host and guests are managed by a trusted hypervisor (or more generally a privileged software domain or domains). Typically only the hypervisor has direct access to I/O devices, and it forwards I/O requests from guests via a software indirection. Guests cannot be given direct access to the physical interface of conventional I/O devices for two reasons: such devices are designed to be accessed exclusively by a single trusted device driver, and a device driver of a guest operating system is not trusted to supply valid addresses for DMA.
The mediation of the hypervisor in the I/O path adds significant overhead and reduces performance through additional processing, context switches and copy operations. In order to address this inefficiency it is desirable to grant guests direct access to I/O devices. This can be achieved in paravirtualised systems whilst maintaining system integrity by arranging that:
    (i) I/O devices provide multiple virtual interfaces, at least one for each guest that will access the device directly; and
    (ii) addresses supplied to the I/O device by device drivers in untrusted guests are translated in a secure manner to machine physical addresses that correspond to memory allocated to the guest.
The second requirement is necessary to ensure that a guest cannot compromise system integrity by causing the I/O device to write to areas of host memory belonging to other guests or the hypervisor, and to prevent the guest reading from areas of memory it is not authorised to access. Translation and protection of guest I/O addresses to machine physical addresses is conventionally performed at a system IOMMU.
The PCI-IOV (Peripheral Component Interconnect—I/O Virtualisation) architecture was designed to address the problem of supporting DMA by I/O devices using addresses supplied by untrusted guests. PCI-IOV defines a standard by which a device can export multiple interfaces (one to each guest) and perform secure address translation and protection for DMA transfers on a per-guest basis. In the PCI-IOV architecture the translation and protection are performed in an IOMMU.
Under the PCI-IOV standard, each of the exported interfaces is referred to as a Virtual Function (VF). A guest is given direct access to a VF which it uses to initiate I/O requests. Each device also has at least one Physical Function (PF) that is typically used by a privileged device driver in the hypervisor to manage the device, and may also be used in some cases for control and data-path operations. A PF is the physical interface defined by the PCI Express base specification to which PCI-IOV is an extension.
Each VF and PF is assigned a unique Requester ID (RID). Each RID is therefore associated with a particular protection domain (guest or hypervisor) of a virtualised system. When an I/O device initiates an I/O operation request using an I/O address supplied by a guest (by means of a VF), it uses that guest's RID in the request so that the IOMMU (referred to as a Translation Agent (TA) under the PCI-IOV standard) can translate the guest I/O address to a machine physical address accessible to the I/O bus. Typically, the I/O addresses used by a PF are not translated (or have a simple fixed translation) and require no protection.
Under the PCI-IOV standard, a guest can invoke an I/O device via a VF to request an I/O operation, specifying one or more guest I/O addresses. In response the device issues requests on the I/O bus with the RID that corresponds to the guest. The IOMMU uses the RID to identify a mapping from guest I/O addresses to machine physical addresses of memory accessible to the guest. The IOMMU issues an error if the I/O address is not valid for the particular protection domain (guest).
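The RID-keyed lookup described above may be sketched as follows. This is an illustrative model only: the class `RidIOMMU`, the exception `TranslationError`, the 4 KiB page size and the example RID values are all assumptions made for the sketch and are not taken from the PCI-IOV standard.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class TranslationError(Exception):
    """Raised when an I/O address is not valid for the requesting RID."""


class RidIOMMU:
    """Hypothetical IOMMU holding one mapping table per Requester ID."""

    def __init__(self):
        # RID -> {guest I/O page number -> machine physical page number}
        self.domains = {}

    def map_page(self, rid, io_page, machine_page):
        self.domains.setdefault(rid, {})[io_page] = machine_page

    def translate(self, rid, io_addr):
        io_page, offset = divmod(io_addr, PAGE_SIZE)
        table = self.domains.get(rid)
        if table is None or io_page not in table:
            # The address is not valid for this protection domain.
            raise TranslationError(
                f"RID {rid:#06x}: invalid I/O address {io_addr:#x}")
        return table[io_page] * PAGE_SIZE + offset


GUEST_A_RID = 0x0101  # example values only
GUEST_B_RID = 0x0102

rid_iommu = RidIOMMU()
# Both guests use I/O page 0, but each maps to its own machine page.
rid_iommu.map_page(GUEST_A_RID, io_page=0, machine_page=0x200)
rid_iommu.map_page(GUEST_B_RID, io_page=0, machine_page=0x300)

addr_a = rid_iommu.translate(GUEST_A_RID, 0x40)
addr_b = rid_iommu.translate(GUEST_B_RID, 0x40)
```

Because the table is selected by RID before the address is looked up, the same guest I/O address resolves to different machine physical addresses for different guests, and an address that one guest has not mapped raises an error even if another guest has mapped it.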
The translation of addresses at the IOMMU can be a bottleneck for a virtualised data processing system. To address this, some I/O devices are able to request that addresses be pre-translated by the IOMMU, and the translated addresses are cached on the I/O device. The device can then issue requests with the pre-translated addresses, and when it does so it tells the IOMMU that no further translation is needed. The protocol for managing the caching of translations over PCI is formalised in the Address Translation Services (ATS) specification. However, this scheme is complex and expensive to implement in terms of hardware and software design.
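The idea of caching pre-translated addresses on the device can be sketched as below. This is a simplified model written for illustration and is not the ATS protocol itself: `HostIOMMU`, `CachingDevice`, the request counter and the `invalidate` method are hypothetical names, and the 4 KiB page size is an assumption.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class HostIOMMU:
    """Hypothetical IOMMU that answers page translation requests."""

    def __init__(self, table):
        # (RID, I/O page number) -> machine physical page number
        self.table = dict(table)
        self.translation_requests = 0

    def translate_page(self, rid, io_page):
        self.translation_requests += 1
        return self.table[(rid, io_page)]  # KeyError if unmapped


class CachingDevice:
    """Hypothetical I/O device that caches translations it has requested."""

    def __init__(self, iommu):
        self.iommu = iommu
        self.cache = {}

    def dma_address(self, rid, io_addr):
        io_page, offset = divmod(io_addr, PAGE_SIZE)
        key = (rid, io_page)
        if key not in self.cache:
            # Cache miss: ask the IOMMU to pre-translate this page.
            self.cache[key] = self.iommu.translate_page(rid, io_page)
        # The second value marks the address as already translated, so
        # the IOMMU need not translate it again.
        return self.cache[key] * PAGE_SIZE + offset, True

    def invalidate(self, rid, io_page):
        # A cached entry must be dropped when the mapping changes.
        self.cache.pop((rid, io_page), None)


host_iommu = HostIOMMU({(0x0101, 2): 7})
device = CachingDevice(host_iommu)
first, pre_translated = device.dma_address(0x0101, 2 * PAGE_SIZE + 5)
second, _ = device.dma_address(0x0101, 2 * PAGE_SIZE + 9)
```

Even in this reduced form the device has to hold a cache, keep it consistent with the IOMMU's tables and signal which addresses are pre-translated, which gives some indication of why such a scheme adds hardware and software complexity.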
There is therefore a need for an improved technique by which translation and protection are performed in a system supporting virtualised I/O devices.