The invention relates to a device and to a method for multi-stage translation of prefetch requests. Such a device may be part of an I/O (Input/Output) controller. The I/O controller may be coupled to a processing unit, e.g., a CPU, and to a memory. The I/O controller may include an I/O link interface, an address translation unit and an I/O packet processing unit.
Following the trend toward virtualization in processor cores, virtualization is increasingly being adopted in the I/O space as well. Network adapters, in turn, increasingly provide user-level-like, queue-based interfaces to their consumers. These interfaces are mainly used to give each virtual machine running on the system at least one private queue for interacting with the network device. Together, these trends make I/O virtualization support in the I/O root complex, which is usually a PCI Express root complex, increasingly important. This requires the PCI Express Host Bridge (PHB) to provide address translation capabilities, so that different physical or virtual functions of a device can safely access their own virtual address spaces. Meeting this requirement becomes increasingly challenging as PCI Express line speeds rise and as I/O devices process many requests in parallel. This parallelism leaves little spatial locality in the requests from a device and thus increases the pressure on the address translation unit of the root complex.
At the same time, the translation caches of the root complex need to be small so that multiple root complexes fit on one processor, which is required to support a large number of links with different link configurations. Nor can the caches easily be shared between PHBs, because the attached devices usually do not share the same virtual domains and therefore require their own translations and caches. In addition, as mentioned above, virtualized devices generally show little of the spatial and temporal locality that would otherwise improve the efficiency of the translation unit cache.
U.S. Pat. No. 7,487,297 B2 describes a method and an apparatus for performing just-in-time data prefetching within a data processing system comprising a processor, a cache or prefetch buffer, and at least one memory storage device. The apparatus comprises a prefetch engine having means for issuing a data prefetch request that prefetches a cache line of data from the memory storage device for use by the processor. The apparatus further comprises logic (or a utility) for dynamically adjusting the prefetch distance, i.e., the interval between the issuance of a data prefetch request by the prefetch engine and the issuance by the processor of a demand (load request) targeting the cache line returned by that prefetch request. The distance is adjusted so that the next data prefetch request, for a subsequent cache line, finishes returning that line at effectively the same time as the processor issues its demand for it.
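The idea of a dynamically adjusted prefetch distance can be illustrated with a small Python model. This is a hypothetical heuristic chosen for illustration only, not the mechanism claimed in U.S. Pat. No. 7,487,297 B2: the class name, the cycle-timestamp interface, the step size of one cache line and the `slack` window are all assumptions. The controller lengthens the distance when prefetched data arrives after the demand (the prefetch was late) and shortens it when the data sits idle for longer than `slack` cycles before the demand, so that completion tends to converge on the moment of the demand.

```python
class PrefetchDistanceController:
    """Hypothetical sketch: tune how many cache lines ahead the prefetch engine runs."""

    def __init__(self, distance=2, min_distance=1, max_distance=16):
        self.distance = distance          # current prefetch distance, in cache lines
        self.min_distance = min_distance
        self.max_distance = max_distance

    def observe(self, prefetch_done_cycle, demand_cycle, slack=10):
        """Update the distance from one prefetch/demand pair (assumed cycle timestamps)."""
        if prefetch_done_cycle > demand_cycle:
            # Data returned after the processor's demand: prefetch was late, run further ahead.
            self.distance = min(self.distance + 1, self.max_distance)
        elif demand_cycle - prefetch_done_cycle > slack:
            # Data waited too long before being demanded: pull back to save buffer space.
            self.distance = max(self.distance - 1, self.min_distance)
        # Otherwise the data arrived just in time and the distance is kept.
        return self.distance

    def next_prefetch_line(self, demand_line):
        """Cache line to prefetch when the processor demands demand_line."""
        return demand_line + self.distance


controller = PrefetchDistanceController()
controller.observe(prefetch_done_cycle=110, demand_cycle=100)  # late: distance grows
controller.observe(prefetch_done_cycle=50, demand_cycle=100)   # early: distance shrinks
```

Bounding the distance between `min_distance` and `max_distance` keeps a burst of late or early prefetches from driving it to extremes; a real engine would likely also account for memory latency variation, which this sketch ignores.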
JP 2010-217992 describes a cache controller, a cache control method and a cache control program.
European Patent Application 11195663.7 describes an I/O controller that is coupled to a processing unit and to a memory. The I/O controller includes an I/O link interface, an address translation unit, an I/O packet processing unit, and a prefetcher. The I/O link interface is configured to receive data packets having virtual addresses. The address translation unit includes an address translator, which translates received virtual addresses into real addresses by means of translation control entries, and a cache allocated to the address translator for caching a number of the translation control entries. The I/O packet processing unit is configured to check the data packets received at the I/O link interface and to forward the checked data packets to the address translation unit. The prefetcher is configured to forward address translation prefetch information from a data packet received at the I/O link interface to the address translation unit. Further, the address translator is configured to use the address translation prefetch information to fetch the translation control entry for the data packet from the allocated cache or, if the translation control entry is not available in the allocated cache, from the memory. In this way, translations from virtual addresses to real addresses may be prefetched, which improves performance by reducing address translation miss stalls in the address translation unit despite the low spatial locality of the addresses in requests from I/O devices.
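The cache-then-memory lookup and the early prefetch described above can be sketched as a small Python model. It is a simplified illustration under stated assumptions, not the design of the cited application: the class and method names, the 4 KiB page size, the LRU replacement policy, and the use of a dictionary as a stand-in for the translation control entry table held in memory are all hypothetical. `prefetch()` represents the prefetcher forwarding translation information as soon as a packet arrives at the link interface. `translate()` represents the later, post-check translation, which then normally hits in the cache.

```python
from collections import OrderedDict

PAGE_SHIFT = 12                      # assumed 4 KiB I/O page size
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class AddressTranslationUnit:
    """Hypothetical model: small LRU cache of translation control entries (TCEs)."""

    def __init__(self, tce_table, cache_size=4):
        self.tce_table = tce_table   # virtual page -> real page; stands in for memory
        self.cache = OrderedDict()   # cached TCEs, least recently used first
        self.cache_size = cache_size
        self.memory_fetches = 0      # number of TCE reads that had to go to memory

    def _lookup(self, vpage):
        if vpage in self.cache:                 # hit in the allocated cache
            self.cache.move_to_end(vpage)       # mark as most recently used
            return self.cache[vpage]
        self.memory_fetches += 1                # miss: fetch the TCE from memory
        rpage = self.tce_table[vpage]           # KeyError models a translation fault
        self.cache[vpage] = rpage
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)      # evict the least recently used TCE
        return rpage

    def prefetch(self, vaddr):
        """Warm the cache from prefetch information taken at the link interface."""
        self._lookup(vaddr >> PAGE_SHIFT)

    def translate(self, vaddr):
        """Translate a virtual address of a checked packet into a real address."""
        rpage = self._lookup(vaddr >> PAGE_SHIFT)
        return (rpage << PAGE_SHIFT) | (vaddr & PAGE_MASK)


atu = AddressTranslationUnit({0x10: 0x800, 0x11: 0x3A0})
atu.prefetch(0x10123)                # packet arrives: translation fetched early
real_address = atu.translate(0x10123)  # after packet checking: served from the cache
```

In this model the single memory access for the TCE is overlapped with packet checking instead of stalling the translation step, which mirrors the performance argument in the text.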