Processing circuitry within a data processing apparatus will typically have access to memory in which the data required to run any particular application on the processing circuitry is stored. It will be appreciated that the data may consist of the instructions contained within the application and/or the actual data values used during execution of those instructions on the processing circuitry.
When the processing circuitry needs to access data in the memory, the processing circuitry issues an access request specifying an address for that data. Often, the access request will specify a virtual address, and address translation circuitry will be used to perform an address translation process in order to produce a physical address in the memory corresponding to the specified virtual address.
In certain data processing systems, it is known for the address translation process to be performed as a multi-stage process. In one example, a two stage address translation process can be performed, a first stage of the address translation process converting the virtual address to an intermediate address, and a second stage of the process then causing the intermediate address to be translated to a corresponding physical address. One such system is described in commonly owned U.S. Pat. No. 7,171,539, the entire contents of which are hereby incorporated by reference. The data processing apparatus described therein aims to provide hardware enforced security, the data processing apparatus being operable in either a secure domain or a non-secure domain, and different address translations from virtual to physical address being required dependent on the domain in which the data processing apparatus is operating. As described in the patent, in such a system a two stage address translation can be performed, with the second stage of the address translation being managed from the secure domain, and in particular allowing secure memory regions to be completely hidden from the non-secure operating system's view of its physical address space.
However, the use of a multi-stage address translation process is useful not only in data processing systems employing such hardware enforced security techniques, but also in a variety of other situations. One such example is a data processing system employing virtualisation techniques.
In a typical virtualisation environment, a processing device such as a processor core is arranged to execute hypervisor software which supports the execution of multiple virtual machines on that processing device. Each virtual machine will have one or more applications running on a particular operating system, with the hypervisor software acting as an interface layer between the virtual machine and the underlying hardware to enable the provision of appropriate hardware support to the virtual machine. Via the hypervisor software layer, each virtual machine gets a particular view of the system in which it resides, and thus gets a particular view of the available hardware resources of the system. Each virtual machine operates independently of other virtual machines on the system, and indeed is not necessarily aware of the presence of the other virtual machines.
Accordingly, in an example system, one virtual machine may be executed which runs a particular operating system, for example Microsoft Windows, whilst another virtual machine may be executed running a different operating system, for example Linux.
In such a virtualised system, multiple stages of address translation can be used. In particular, in one example, a first stage of address translation may be controlled by the particular operating system running inside a virtual machine in order to map a specified virtual address to an intermediate address, and then a second stage of address translation can be controlled by the hypervisor software in order to map the intermediate address to a physical address in memory. Because the hypervisor software manages the second stage of the address translation, it can ensure the separation of the various virtual machines executing on the processing circuitry.
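By way of illustration only, the two stage translation described above can be sketched as follows. The sketch assumes a single 4 KB page size for both stages and models each page table as a flat dictionary; the names `stage1_guest`, `stage2_hypervisor` and `two_stage_translate` are hypothetical and do not correspond to any particular implementation.

```python
PAGE_SIZE = 4096  # assumed 4 KB page size for both stages


def translate(table, address):
    """Map an address through one stage (dict of input page -> output page)."""
    page, offset = divmod(address, PAGE_SIZE)
    if page not in table:
        raise LookupError(f"no mapping for page {page:#x}")
    return table[page] * PAGE_SIZE + offset


# Stage 1: controlled by the operating system inside the virtual machine
# (virtual address -> intermediate address).
stage1_guest = {0x10: 0x80}

# Stage 2: controlled by the hypervisor software
# (intermediate address -> physical address).
stage2_hypervisor = {0x80: 0x2345}


def two_stage_translate(virtual_address):
    """Return (intermediate address, physical address) for a virtual address."""
    intermediate = translate(stage1_guest, virtual_address)
    physical = translate(stage2_hypervisor, intermediate)
    return intermediate, physical


intermediate, physical = two_stage_translate(0x10ABC)
```

In this sketch the guest operating system can only alter `stage1_guest`, so it can never reach a physical page that the hypervisor has not placed in `stage2_hypervisor`.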
When performing a multi-stage address translation process, the circuitry performing that address translation will typically need to access a number of page tables provided within memory in order to determine the required translation, and to resolve access permission rights and determine region attributes. In particular, a separate page table will typically be accessed for each stage of the address translation, and hence in the above particular example a first page table managed by the virtual machine's operating system will be accessed during the first stage of the address translation to determine the required virtual to intermediate address translation, and then a second stage page table managed by the hypervisor software will be accessed during the second stage of the address translation in order to determine the required intermediate to physical address translation.
Each page table typically contains a plurality of descriptors, each descriptor providing, for a particular region of memory, address mapping information, access permission rights, region attributes, the size of the memory region to which the descriptor relates, and any other required information. Indeed, page tables are often nested to form a multi-level structure, so that certain entries in the page table will actually point to a lower level page table providing descriptors for various regions of memory, rather than the descriptors being provided directly in the top level page table.
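A minimal sketch of such a descriptor and of a two level nested page table is given below. The field names, the 2 MB and 4 KB region sizes, and the `walk` function are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Union

SECTION_SIZE = 2 * 1024 * 1024  # assumed size covered by a top level entry
PAGE_SIZE = 4 * 1024            # assumed size covered by a lower level entry


@dataclass
class Descriptor:
    output_base: int   # address mapping information
    size: int          # size of the memory region described
    permissions: str   # access permission rights, e.g. "r", "w", "rw" or ""
    attributes: str    # region attributes (hypothetical labels)


@dataclass
class TablePointer:
    table: Dict[int, Descriptor]  # lower level page table


Entry = Union[Descriptor, TablePointer]


def walk(top_level: Dict[int, Entry], address: int):
    """Return (descriptor, output address), descending one level if needed."""
    entry = top_level.get(address // SECTION_SIZE)
    if isinstance(entry, TablePointer):
        entry = entry.table.get((address % SECTION_SIZE) // PAGE_SIZE)
    if entry is None:
        raise LookupError("translation fault")
    return entry, entry.output_base + address % entry.size


top_level = {
    # Entry 0 holds a descriptor directly in the top level table.
    0: Descriptor(0x4000_0000, SECTION_SIZE, "rw", "normal"),
    # Entry 1 points to a lower level table of 4 KB descriptors.
    1: TablePointer({3: Descriptor(0x5000_0000, PAGE_SIZE, "r", "device")}),
}

section_hit = walk(top_level, 0x123456)
page_hit = walk(top_level, SECTION_SIZE + 3 * PAGE_SIZE + 0x10)
```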
Since significant time can be expended in accessing the page tables, it is known to provide one or more translation lookaside buffers (TLBs) locally in association with the circuitry that performs the address translation (for example a memory management unit (MMU)), in which information retrieved from the page tables can be stored locally to improve performance when processing subsequent access requests. However, when adopting a multi-stage address translation process, it has previously been necessary to provide separate TLB structures for each stage of the address translation. Hence, by way of example, the virtual address specified by the access request can be used to perform a lookup in a first TLB structure, and if a hit is detected this will enable generation of an intermediate address from the relevant information stored in that TLB structure. Then, using the intermediate address, a lookup can be performed in a second TLB structure having entries specifying intermediate to physical address translations, and again if a hit is detected the physical address can be determined from the information stored in that second TLB structure.
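The separate TLB arrangement described above can be modelled roughly as follows. This is a simplified sketch assuming 4 KB pages and dictionary-backed page tables; the `Tlb` class and its counters are hypothetical.

```python
PAGE_SIZE = 4096  # assumed 4 KB page size


class Tlb:
    """One TLB structure caching the translations of a single stage."""

    def __init__(self, page_table):
        self.page_table = page_table  # dict of input page -> output page
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, address):
        page, offset = divmod(address, PAGE_SIZE)
        if page in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            # Stands in for the (slow) page table access on a miss.
            self.entries[page] = self.page_table[page]
        return self.entries[page] * PAGE_SIZE + offset


stage1_tlb = Tlb({0x10: 0x80})    # virtual -> intermediate
stage2_tlb = Tlb({0x80: 0x2345})  # intermediate -> physical


def access(virtual_address):
    """Every access passes through both TLB structures, even on hits."""
    intermediate = stage1_tlb.lookup(virtual_address)
    return stage2_tlb.lookup(intermediate)


first = access(0x10ABC)   # misses in both TLB structures
second = access(0x10ABC)  # hits in both, but still needs two lookups
```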
Nevertheless, it will be appreciated that even if hits are detected in the various TLB structures, the need to perform lookups in multiple TLB structures can significantly impact performance when handling access requests. For example, considering the earlier virtualisation system, it is not efficient when the virtual machine is running to pass the address of every memory access request through at least two TLB structures in order to resolve the physical address, and instead it would be useful to provide a single TLB containing “consolidated” entries which enable a direct translation from virtual address to physical address using information derived from both sets of page tables. If such a consolidated TLB structure is used, this would mean that the overhead of having the two sets of page tables would only be exposed on a TLB miss, thereby increasing performance in the common cases where a hit is detected in the TLB.
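As a rough sketch of the consolidated approach, assuming 4 KB pages and dictionary-backed page tables (all names below are hypothetical), a single structure can cache the virtual to physical mapping so that both page tables are consulted only on a miss:

```python
PAGE_SIZE = 4096  # assumed 4 KB page size

stage1_table = {0x10: 0x80}    # virtual page -> intermediate page
stage2_table = {0x80: 0x2345}  # intermediate page -> physical page

consolidated_tlb = {}  # virtual page -> physical page
table_walks = []       # records each miss that consulted both page tables


def access(virtual_address):
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page not in consolidated_tlb:
        # Miss: both sets of page tables are consulted exactly once.
        intermediate_page = stage1_table[page]
        consolidated_tlb[page] = stage2_table[intermediate_page]
        table_walks.append(page)
    # Hit: a single lookup yields the physical address directly.
    return consolidated_tlb[page] * PAGE_SIZE + offset


first = access(0x10ABC)
second = access(0x10DEF)  # same page, served from the consolidated entry
```

Note that the consolidated entry in this sketch retains no trace of the intermediate address, which is the root of the problems discussed in the text.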
A known virtualisation technique uses “shadow page tables”, in which consolidated entries are made in the TLB. Considering the earlier mentioned two stage address translation, then when employing a shadow page table technique, a third set of tables is provided, containing consolidated virtual address to physical address translations, and when the virtual machine is running the MMU is pointed at these tables. Initially these tables are blank (i.e. every address causes a fault). When a fault occurs, the hypervisor reads the virtual address to intermediate address tables and the intermediate address to physical address tables, computes the virtual address to physical address translation, and adds an entry to the shadow page table. The hypervisor must also intercept all TLB maintenance operations issued from the virtual machine to keep the shadow page tables accurate. One disadvantage arising from the use of such shadow page tables is the increase in overhead resulting from maintaining the shadow page tables.
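The shadow page table technique can be illustrated with the following simplified model. It assumes 4 KB pages and dictionary-backed tables; the `Hypervisor` class and its method names are hypothetical and are intended only to show where the maintenance overhead arises.

```python
PAGE_SIZE = 4096  # assumed 4 KB page size


class Hypervisor:
    def __init__(self, guest_table, hypervisor_table):
        self.guest_table = guest_table            # virtual -> intermediate
        self.hypervisor_table = hypervisor_table  # intermediate -> physical
        self.shadow = {}   # third set of tables; initially blank
        self.faults = 0    # each fault is a trap into the hypervisor

    def handle_fault(self, virtual_page):
        """Read both sets of tables and add a consolidated shadow entry."""
        self.faults += 1
        intermediate_page = self.guest_table[virtual_page]
        self.shadow[virtual_page] = self.hypervisor_table[intermediate_page]

    def access(self, virtual_address):
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page not in self.shadow:
            self.handle_fault(page)
        return self.shadow[page] * PAGE_SIZE + offset

    def guest_tlb_invalidate(self, virtual_page):
        """Intercepted guest maintenance operation: drop the stale entry."""
        self.shadow.pop(virtual_page, None)


hv = Hypervisor({0x10: 0x80}, {0x80: 0x2345, 0x81: 0x0999})
before = hv.access(0x10ABC)   # faults once, then fills the shadow table
hv.access(0x10ABC)            # no further fault

hv.guest_table[0x10] = 0x81   # guest operating system remaps the page
hv.guest_tlb_invalidate(0x10)
after = hv.access(0x10ABC)    # faults again to rebuild the shadow entry
```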
Further, certain problems can arise when using a consolidated TLB. Firstly, as mentioned earlier, each descriptor in a page table typically includes a field identifying the size of the memory region to which that descriptor relates. Considering the earlier-mentioned two stage address translation process, both stages of translation may involve referencing descriptors associated with a variety of different sizes of memory region, for example 4 KB pages and 2 MB sections. Considering the earlier virtualisation example, if the virtual machine's operating system has chosen to use a 2 MB section in an area of intermediate address space which the hypervisor software has mapped into 4 KB pages, then any consolidated TLB entry must be 4 KB in size, i.e. to match the size specified by the second stage page table. This can cause significant problems if the address translation for certain regions of memory later needs to be invalidated. For example, if the virtual machine's operating system later attempts to invalidate the section entry in the TLB (for example because it has changed, or is removed), it is very difficult for a consolidated TLB to handle this correctly. In particular, the TLB invalidate operation does not necessarily specify a size, so in order to guarantee correct operation, the TLB would have to search for all 512 possible 4 KB entries within the 2 MB section required to be invalidated, which would be very inefficient. It is also very difficult to do this search conditionally since there is no guarantee that any particular entry among the 512 possible will actually be present within the TLB to act as a marker. Furthermore, since there could be many different valid page sizes in a particular working system, there is potentially a very large amount of searching needed.
This problem will be referred to herein as the “larger page on top of small page” problem, since it occurs whenever a memory region size associated with the relevant descriptor in the page table for an earlier stage of the multi-stage address translation process is larger than a memory region size associated with the relevant descriptor in the page table for a later stage of the multi-stage address translation process.
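The cost of this problem can be made concrete with a short sketch. It assumes a 2 MB first stage region mapped onto 4 KB second stage regions, and models the consolidated TLB as a dictionary keyed by 4 KB aligned virtual address; the function names are hypothetical.

```python
SECTION_SIZE = 2 * 1024 * 1024  # region size chosen by the first stage
PAGE_SIZE = 4 * 1024            # region size chosen by the second stage


def consolidated_entry_size(stage1_size, stage2_size):
    """A consolidated entry can cover no more than the smaller region."""
    return min(stage1_size, stage2_size)


def invalidate_by_address(tlb, virtual_address, largest_region=SECTION_SIZE):
    """Invalidate without a size hint; returns the number of probes made.

    Because the operation carries no size, every 4 KB slot of the largest
    region that might contain the address has to be probed.
    """
    base = virtual_address - virtual_address % largest_region
    probes = 0
    for candidate in range(base, base + largest_region, PAGE_SIZE):
        probes += 1
        tlb.pop(candidate, None)
    return probes


# Only two of the 512 possible entries of the section happen to be cached.
tlb = {
    0x200000 + 5 * PAGE_SIZE: 0x9000_5000,
    0x200000 + 300 * PAGE_SIZE: 0x7012_C000,
    0x400000: 0x1000_0000,  # belongs to a different section
}
probes = invalidate_by_address(tlb, 0x200000)
```

Here 512 probes are needed to remove two entries, and the absence of any one entry says nothing about the others.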
As another example of a problem that can occur when using a consolidated TLB, both stages of address translation allow for access control information to be specified, for example access permission rights. In particular, within the descriptors of each page table, access permission rights may specify whether an access to the corresponding region of memory is only allowed for a read operation, is only allowed for a write operation, is allowed for both read and write operations, or is not allowed at all. If the relevant descriptor accessed for the first stage of the address translation, namely the virtual to intermediate address translation, indicates that the access is allowed, but the descriptor used for the second stage of the address translation, namely the intermediate to physical address translation, indicates that the access is not allowed, the fault must be reported to the entity in charge of the second stage of the address translation, for example the hypervisor software in the earlier mentioned virtualisation example. When reporting the fault, it will also be necessary to provide the intermediate address, since the hypervisor will not know, or indeed even care about, virtual addresses. In a consolidated TLB storing only physical addresses, it would be difficult to produce the intermediate address in such circumstances. Further, if an intermediate address were added to every entry merely to provide for such situations, this would be very inefficient since that intermediate address will not be needed most of the time. This problem will be referred to herein as the “later stage permission” problem, since it occurs whenever the access permission information associated with the relevant descriptor in the page table for an earlier stage of the multi-stage address translation process is more permissive than the access permission information associated with the relevant descriptor in the page table for a later stage of the multi-stage address translation process.
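The fault reporting requirement can be sketched as follows, assuming 4 KB pages and permissions encoded as strings such as "r" and "rw". The exception class and table layout are hypothetical; the point is only that the second stage fault must carry the intermediate address.

```python
PAGE_SIZE = 4096  # assumed 4 KB page size


class Stage2PermissionFault(Exception):
    """Fault reported to the entity managing the second stage."""

    def __init__(self, intermediate_address):
        super().__init__(f"stage 2 fault at {intermediate_address:#x}")
        self.intermediate_address = intermediate_address


# Each table maps an input page to (output page, permissions).
stage1_table = {0x10: (0x80, "rw")}   # guest allows reads and writes
stage2_table = {0x80: (0x2345, "r")}  # hypervisor allows reads only


def access(virtual_address, kind):
    """kind is "r" for a read or "w" for a write."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    intermediate_page, stage1_permissions = stage1_table[page]
    if kind not in stage1_permissions:
        # First stage faults go to the guest operating system.
        raise PermissionError("stage 1 permission fault")
    physical_page, stage2_permissions = stage2_table[intermediate_page]
    if kind not in stage2_permissions:
        # The hypervisor needs the intermediate address, not the virtual one.
        raise Stage2PermissionFault(intermediate_page * PAGE_SIZE + offset)
    return physical_page * PAGE_SIZE + offset


read_result = access(0x10ABC, "r")
try:
    access(0x10ABC, "w")
    reported = None
except Stage2PermissionFault as fault:
    reported = fault.intermediate_address
```

A consolidated entry holding only the pair (virtual page 0x10, physical page 0x2345) would not contain the value 0x80 needed to form the reported address.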
It is expected that the majority of entries within a consolidated TLB would be “well behaved” entries, which in the context of the above discussion of problem cases can be interpreted as an entry where the memory region size associated with a later stage of the address translation is the same size or larger than the memory region size associated with an earlier stage of the address translation (thereby ensuring that any TLB invalidate operation, as for example may be performed by the virtual machine's operating system in the above virtualisation example, will work as expected without further searching being required), and also the access permission rights of a later address translation stage are at least as “permissive” as the access permission rights of an earlier address translation stage (thereby ensuring that a later stage permission fault cannot occur).
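The two conditions for a “well behaved” entry can be expressed as a simple predicate. The sketch below assumes permissions are encoded as strings of "r" and "w" characters, which is an illustrative choice rather than anything mandated by the discussion above.

```python
SECTION_SIZE = 2 * 1024 * 1024
PAGE_SIZE = 4 * 1024


def is_well_behaved(stage1_size, stage1_permissions,
                    stage2_size, stage2_permissions):
    """True if neither problem case can arise for this pair of descriptors."""
    # The later stage region is the same size as, or larger than, the earlier.
    size_ok = stage2_size >= stage1_size
    # The later stage grants at least every right the earlier stage grants.
    permissions_ok = set(stage1_permissions) <= set(stage2_permissions)
    return size_ok and permissions_ok


typical = is_well_behaved(PAGE_SIZE, "r", SECTION_SIZE, "rw")
large_on_small = is_well_behaved(SECTION_SIZE, "r", PAGE_SIZE, "rw")
later_stage_stricter = is_well_behaved(PAGE_SIZE, "rw", PAGE_SIZE, "r")
```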
Nevertheless, whilst the above discussed problem cases are expected to be comparatively rare, they are still likely to occur occasionally during the operation of the processing circuitry, particularly where legacy software is used. For example, in a virtualisation environment, it will typically be required to support unmodified legacy operating systems, and it is hence not an option to change the software to avoid such problem cases occurring.
Accordingly, it would be desirable to provide an efficient address translation mechanism in systems employing a multi-stage address translation process, whilst also ensuring correct handling of the problem cases discussed earlier.