As should be appreciated, computer systems have employed mechanisms for controlling access to system memory by hardware devices that perform direct memory access (DMA) on their own behalf, rather than relying on a program on a central processor to copy data between the hardware device and system memory. Such mechanisms exist to support, among other things, direct control of the hardware device by an application, a virtual machine, etc.
Among other things, a particular entity, device or construct (hereinafter, “entity”) on a computing device may require access to a resource associated with the computing device. As may be appreciated, such resource may be any sort of resource that can be associated with a computing device. For example, the resource may be a storage device to store and retrieve data, and generally for any purpose that a storage device would be employed. Likewise, the resource may be any other asset such as a network, a printer, a scanner, a network drive, a virtual drive, a server, and the like. Accordingly, whatever the resource may be, the entity may in fact be provided with access to services provided by such resource.
A typical physical operating system running on a typical physical hardware system allows applications running thereon to employ virtual memory addresses and addressing, and the physical operating system performs address translations between such virtual memory addresses and corresponding physical memory addresses of physical memory. Reasons for use of such virtual memory addressing are known and therefore need not be set forth herein in any detail. Generally, virtual memory addressing frees an application running on an operating system from being concerned with interfering with physical memory employed by another application running on the operating system, and also frees the application from being too closely tied to any particular physical memory structure. Instead, with virtual memory addressing, an address translator of the operating system is employed to perform address translations between corresponding physical and virtual memory addresses by way of an appropriate database or the like. In doing so, such an address translator also ensures that memory employed by one application is not interfered with or otherwise altered by any other application, and further ensures that each application has appropriate amounts of physical memory allocated thereto.
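By way of illustration only, the address translation described above may be sketched as follows. The class name `AddressTranslator`, the 4096-byte page size, and the simple frame allocator are assumptions made for this sketch and are not drawn from any particular operating system.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class AddressTranslator:
    """Toy translator: one table per application, virtual page -> physical frame."""

    def __init__(self):
        self._tables = {}          # application name -> {virtual page: frame}
        self._next_free_frame = 0  # naive allocator: frames are never reused

    def map_page(self, app, virtual_page):
        """Back `virtual_page` of `app` with a fresh physical frame."""
        self._tables.setdefault(app, {})[virtual_page] = self._next_free_frame
        self._next_free_frame += 1

    def translate(self, app, virtual_address):
        """Convert a virtual address of `app` into a physical address."""
        page, offset = divmod(virtual_address, PAGE_SIZE)
        frame = self._tables[app][page]  # KeyError models an unmapped access
        return frame * PAGE_SIZE + offset


translator = AddressTranslator()
translator.map_page("app_a", 0)
translator.map_page("app_b", 0)
# The same virtual address resolves to different physical memory per application.
physical_a = translator.translate("app_a", 16)
physical_b = translator.translate("app_b", 16)
```

Because each application has its own table, two applications may use the same virtual address without touching the same physical memory.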
When a hardware device that performs DMA or the like is employed by an entity to access a corresponding resource, the addresses employed by the entity are virtual addresses. It should be recognized that these virtual addresses do not directly correspond to physical addresses, so, typically, a translation mechanism is introduced between the DMA device and system memory. This translation mechanism uses a database or the like to convert the virtual addresses provided by the DMA device into physical addresses. The translation mechanism is typically referred to as DMA remapping (DMAr) or as an IO memory management unit (IOMMU).
For most systems, an IOMMU will serve multiple DMA devices. That is, DMA requests from multiple devices will be sent to the IOMMU for processing. The IOMMU will use a database that is appropriate for the specific device to translate the virtual address in the DMA request from the device into the appropriate physical address.
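As a minimal sketch of the foregoing, a single IOMMU serving several DMA devices may be modeled with one translation table per device. The names `Iommu`, `map`, and `translate`, the device identifiers, and the choice to raise `LookupError` for an unmapped address are hypothetical and chosen only for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class Iommu:
    """Toy IOMMU: selects a per-device table, then translates the DMA address."""

    def __init__(self):
        self._tables = {}  # device id -> {virtual page: physical frame}

    def map(self, device_id, virtual_page, physical_frame):
        """Record a translation in the table belonging to `device_id`."""
        self._tables.setdefault(device_id, {})[virtual_page] = physical_frame

    def translate(self, device_id, virtual_address):
        """Translate a DMA request using the table for the requesting device."""
        page, offset = divmod(virtual_address, PAGE_SIZE)
        table = self._tables.get(device_id, {})
        if page not in table:
            # Assumption: an unmapped request is rejected rather than passed through.
            raise LookupError(f"device {device_id}: page {page} is not mapped")
        return table[page] * PAGE_SIZE + offset


iommu = Iommu()
iommu.map("nic", 0, 7)
iommu.map("disk", 0, 9)
nic_address = iommu.translate("nic", 32)    # uses the table for "nic"
disk_address = iommu.translate("disk", 32)  # same virtual address, other table
```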
An IOMMU will typically have a so-called IO translation look-aside buffer (IOTLB) in which it may keep a cache of recently used translations. This IOTLB allows the IOMMU to avoid time-consuming database accesses when the same address is used for many DMA operations. Normally, a translation applies to a large portion of memory such as a page, so the same page address will be used for multiple DMA operations.
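The caching behavior described above may be sketched as a small least-recently-used cache placed in front of a slow database lookup. The class `Iotlb`, its capacity parameter, the `walk` callback, and the LRU replacement policy are assumptions of this sketch; an actual IOTLB may use a different organization and replacement policy.

```python
from collections import OrderedDict


class Iotlb:
    """Toy IOTLB: caches (device, page) -> frame and evicts the least recently used."""

    def __init__(self, capacity, walk):
        self._capacity = capacity
        self._walk = walk              # slow path: (device_id, page) -> frame
        self._entries = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def lookup(self, device_id, page):
        key = (device_id, page)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        frame = self._walk(device_id, page)  # time-consuming database access
        self._entries[key] = frame
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used entry
        return frame


# Hypothetical database walk: frame number is simply page + 100.
iotlb = Iotlb(capacity=2, walk=lambda device_id, page: page + 100)
for _ in range(4):
    iotlb.lookup("nic", 0)  # repeated DMA to the same page
```

In the usage shown, only the first of the four lookups reaches the database; the remaining three are served from the cache.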
For some entities, the behavior of the centralized IOTLB is not adequate in minimizing the latency impact on DMA accesses. For these entities, it has been proposed that they be allowed to have their own IOTLB so that they can translate the addresses locally before the request is sent to memory. The advantage of local caching of translations is that the device has better knowledge of the way in which memory is going to be accessed and, therefore, may manage its IOTLB more efficiently. The central IOTLB in the IOMMU, by contrast, is purely reactive and only responds to the immediate situation rather than being able to predict the future needs of devices.
One example of a device that benefits from having its own IOTLB (hereafter referred to as an address translation cache or “ATC”) is an isochronous device. An isochronous device has specific latency requirements so that it can maintain a steady flow of information. In order to prevent an interruption to the flow caused by an untimely, long-latency access to the IOMMU database, the isochronous device can request the translations for the addresses that will be used for a transfer before the transfer starts. Then, the ATC will be able to provide the virtual-to-physical address translation for the isochronous transfer when data is actually available.
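The prefetching just described may be sketched as follows. The class `AddressTranslationCache`, its `prefetch` and `translate` methods, and the `request_translation` callback standing in for a request to the IOMMU are hypothetical names used only for illustration.

```python
class AddressTranslationCache:
    """Toy device-side ATC that is filled before a transfer begins."""

    def __init__(self, request_translation):
        # Hypothetical callback modeling a (slow) translation request to the IOMMU.
        self._request_translation = request_translation
        self._entries = {}  # virtual page -> physical frame

    def prefetch(self, pages):
        """Request translations ahead of time for every page of a planned transfer."""
        for page in pages:
            self._entries[page] = self._request_translation(page)

    def translate(self, page):
        """Return the cached frame, or None if the page was never prefetched."""
        return self._entries.get(page)


iommu_requests = []  # records every request that reaches the IOMMU


def slow_iommu_request(page):
    iommu_requests.append(page)
    return page + 1000  # hypothetical frame number


atc = AddressTranslationCache(slow_iommu_request)
atc.prefetch([4, 5, 6])                           # before the transfer starts
frames = [atc.translate(p) for p in (4, 5, 6)]    # during the transfer
```

During the transfer itself no request is made to the IOMMU, since every translation was obtained beforehand.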
Since the ATC contains a cache of information from the IOMMU's database, the ATC must be informed when changes are made to the database that may make a value cached in the ATC no longer valid. An invalidation protocol is used to ensure this synchronization between the central IOMMU database and the ATC. When a change is made in the database to a value that may be in an ATC, an invalidate command must be sent to that ATC indicating which GPA translations have changed. The ATC would then purge its cache of any corresponding translations and send an indication back to the IOMMU that the entries have been purged. This protocol is time-consuming, and it is preferred that it not be used frivolously.
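The invalidation handshake described above may be sketched as follows. The classes `Atc` and `TranslationDatabase`, the boolean completion indication, and the synchronous call standing in for the command and its response are assumptions of this sketch; a real protocol would exchange messages over the bus.

```python
class Atc:
    """Toy device-side cache that honors invalidate commands."""

    def __init__(self):
        self.entries = {}  # page -> frame

    def invalidate(self, pages):
        """Purge the named translations and report completion."""
        for page in pages:
            self.entries.pop(page, None)  # purging an absent entry is harmless
        return True  # completion indication sent back to the IOMMU


class TranslationDatabase:
    """Toy IOMMU database that invalidates the ATC on every change."""

    def __init__(self, atc):
        self._table = {}
        self._atc = atc
        self.invalidates_sent = 0

    def update(self, page, frame):
        self._table[page] = frame
        self.invalidates_sent += 1
        completed = self._atc.invalidate([page])
        if not completed:
            raise RuntimeError("ATC did not confirm the purge")


device_atc = Atc()
device_atc.entries[5] = 50           # stale copy of an old translation
database = TranslationDatabase(device_atc)
database.update(5, 60)               # change forces a purge of page 5
```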
In using a remote ATC, many different programming models are possible. In one model, an entity may have a small buffer that is continually used for storage of commands to the device. This so-called ring buffer is established when the entity is initialized, and the addresses of the buffer do not change until the device is reset. The commands in the buffer may reference memory data buffers, with the addresses in the commands expressed as virtual addresses. While the command buffer may be reused continually, the data buffers may be infrequently reused. So, while the device may benefit from having an ATC to use for the addresses of the command buffer, local caching of data buffer translations would not be worthwhile. This model illustrates a major problem with ATCs. The command buffer addresses never change, and they are cached in the ATC. The data buffer addresses change frequently, and they are not cached in the device's ATC.
However, the manager of the IOMMU database does not know that the data buffer addresses are not cached in the device, so whenever it changes an entry in the database for a data buffer, it will send an invalidate command to the device's ATC. This results in many useless invalidates being sent to the ATC. If the software controlling the IOMMU database had knowledge of which translations could be in the device's ATC and which could not, it could avoid sending invalidations when none is required.
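The cost of the missing knowledge may be illustrated with a short count of invalidate commands. The function `count_invalidates`, the `may_be_cached` set, and the page numbers are hypothetical and serve only to make the comparison concrete; the sketch does not purport to show how such knowledge would actually be conveyed to the software.

```python
def count_invalidates(changed_pages, may_be_cached=None):
    """Count invalidate commands sent for a sequence of database changes.

    `may_be_cached` is the set of pages whose translations the device's ATC
    might hold. None models a manager that has no such knowledge and must
    therefore send an invalidate for every change.
    """
    sent = 0
    for page in changed_pages:
        if may_be_cached is None or page in may_be_cached:
            sent += 1
    return sent


COMMAND_RING_PAGE = 0                        # fixed for the life of the device
data_buffer_changes = list(range(100, 110))  # ten data-buffer remappings

without_knowledge = count_invalidates(data_buffer_changes)
with_knowledge = count_invalidates(
    data_buffer_changes, may_be_cached={COMMAND_RING_PAGE}
)
```

Under these assumptions, ten invalidates are sent without the knowledge and none are sent with it, since only the command ring page can be present in the ATC.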