A memory management unit, MMU, is a module in a virtual memory system that provides address translation services, as well as memory fragmentation and memory protection capabilities, to a group of devices sharing the MMU in the system. Information that enables the MMU to map virtual addresses to physical addresses is stored in a page table. Typically, the page table is stored in physical memory that is part of the main memory of the system in which the MMU is operating. One part of the MMU is a cache of recently used mappings from the page table. This cache is called a translation lookaside buffer, TLB.
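The TLB described above may be sketched as a small fixed-depth cache of page table mappings. The following is an illustrative model only; the 4 KB page size, the entry format, and the eviction of an arbitrary entry when the buffer is full are all assumptions for the sake of the sketch, not features of any particular MMU implementation.

```python
PAGE_SHIFT = 12  # assumed 4 KB pages

class TLB:
    """Minimal sketch of a TLB: a depth-limited cache of VPN -> PPN mappings."""

    def __init__(self, depth):
        self.depth = depth   # number of page table entries the buffer can hold
        self.entries = {}    # virtual page number -> physical page number

    def lookup(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        ppn = self.entries.get(vpn)
        if ppn is None:
            return None      # TLB miss: a page table walk would be needed
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        return (ppn << PAGE_SHIFT) | offset  # translated physical address

    def refill(self, vpn, ppn):
        if len(self.entries) >= self.depth:
            # Evict an arbitrary entry; real replacement policies differ.
            self.entries.pop(next(iter(self.entries)))
        self.entries[vpn] = ppn
```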
One procedure that the MMU performs is a page table walk, PTW, which is a mechanism of controlling (sometimes referred to as "walking") the memory device to read page table descriptors containing address translation information for a given MMU device.
Another procedure that is performed by the MMU is TLB pre-fetching. TLB pre-fetching is a mechanism of address prediction aimed at refilling the MMU TLB with the appropriate translated addresses before they are requested by an initiator. Here, an initiator is a device that is coupled to the MMU data request interface and that is capable of issuing transactions (i.e. data requests) through the MMU.
In the existing MMU devices, TLB buffering capabilities and TLB pre-fetching mechanisms may be a bottleneck for the overall performance of the system due to TLB misses and the latency involved when fetching page table entries from the main memory system.
In other words, there is a desire for enhancing TLB pre-fetch capabilities to mask page table walk latency caused by TLB misses, as will be illustrated in the following. With reference to FIG. 1A, an MMU 102 contains a TLB 104 with a depth n, and a pre-fetch buffer 106 with a depth m, the depth being the number of page table entries each buffer can hold. The MMU 102 has a data request interface 112 where data requests are received. As will be understood by those of skill in the art, the MMU 102 comprises control circuitry (CTRL) 114 that is configured to control the MMU 102 as described herein, illustrated in FIG. 1A as a functional block for the sake of clarity. The page table entries are virtual-to-physical address translation information located inside the MMU page table descriptor 108 in the main memory system 110.
When a transaction, i.e. a data request, occurs on the MMU data request interface 112, the TLB 104 is looked up to identify the TLB entry that corresponds to the incoming data request. A data request has attributes in terms of parameters that are associated with the address and data of the request in a given bus protocol (e.g., for the advanced extensible interface, AXI, bus protocol, the transaction attributes include address, burst size, burst length, type and cacheability). The look-up is performed by the control circuitry 114. If the corresponding entry is found, the address is translated and the data transaction is issued to the main physical memory.
If the entry is not found, i.e. a TLB miss occurs, a PTW is performed to fetch the corresponding entry from the page table descriptor. In parallel, a pre-fetch request is also issued to fetch the next page table entry in the page table descriptor, i.e. the page table entry that is subsequent to the entry corresponding to the address causing the miss in the TLB.
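The hit, miss and pre-fetch behavior described above may be sketched as follows. This is an illustrative model under simplifying assumptions: the page table is represented as a hypothetical flat dict standing in for the descriptor in main memory, the pre-fetch is modelled sequentially rather than in parallel with the PTW, and a 4 KB page size is assumed.

```python
PAGE_SHIFT = 12  # assumed 4 KB pages

def translate(tlb, page_table, vaddr):
    """Translate vaddr, refilling the TLB and pre-fetching on a miss."""
    vpn = vaddr >> PAGE_SHIFT
    if vpn in tlb:                       # TLB hit
        ppn = tlb[vpn]
    else:                                # TLB miss -> page table walk
        ppn = page_table[vpn]
        tlb[vpn] = ppn
        next_vpn = vpn + 1               # pre-fetch the subsequent entry
        if next_vpn in page_table:
            tlb[next_vpn] = page_table[next_vpn]
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    return (ppn << PAGE_SHIFT) | offset
```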
FIG. 1B illustrates the content of a TLB and MMU page table descriptor (e.g. corresponding to TLB 104 and page table descriptor 108 in FIG. 1A) and how a PTW request re-fills the TLB. The "next" page table entry is the virtual-to-physical translation of the current virtual address that caused the miss plus an address stride. The address stride corresponds to the description of a memory page that has a given size. The size of a page in the page table descriptor is software driven and varies from one implementation to another.
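The stride computation above amounts to rounding the missing virtual address up to the start of the following page. A brief sketch, assuming a software-defined page size with 4 KB as the illustrative default:

```python
def next_prefetch_addr(miss_vaddr, page_size=4096):
    """Return the virtual address targeted by the pre-fetch: the start of
    the page one address stride beyond the address that missed."""
    return (miss_vaddr // page_size + 1) * page_size
```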
In the implementations that exist today, 32-bit address space page table descriptors are described on three levels. Level 1 may contain the fragmentation of the memory space into 1 GB regions, and level 2 may contain the fragmentation of a 1 GB region into 2 MB regions, such that the level 2 descriptor may contain 512 entries. Level 3 may contain the memory mapping of each 2 MB region into 4 KB pages.
To perform an address translation, an MMU performs three page table walk accesses to level 1, level 2, and then level 3 descriptors (assuming that the memory is mapped in 4 KB pages).
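Under the region sizes given above, the three walk levels correspond to fixed bit fields of the 32-bit virtual address (2 + 9 + 9 + 12 = 32 bits for the 1 GB, 2 MB and 4 KB granularities plus the page offset). The following sketch shows this split; the field names are illustrative, not taken from any particular descriptor format.

```python
def split_va(va):
    """Split a 32-bit virtual address into the three table indices
    used by the level 1, 2 and 3 page table walk accesses."""
    return {
        "l1_index": (va >> 30) & 0x3,    # which 1 GB region
        "l2_index": (va >> 21) & 0x1FF,  # which 2 MB region (512 entries)
        "l3_index": (va >> 12) & 0x1FF,  # which 4 KB page (512 entries)
        "offset":   va & 0xFFF,          # byte within the 4 KB page
    }
```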
As noted above, in existing MMU devices, TLB buffering capabilities and TLB pre-fetching mechanisms may be a bottleneck for overall system performance due to TLB misses and the latency involved in fetching page table entries from the main memory system. This latency may moreover be shared between a large number of hardware devices. Furthermore, MMU pre-fetch logic usually contains buffering capabilities for the last translation level (i.e. level 3).
In addition, a single MMU may be shared between several initiators, an initiator being a device coupled directly to the MMU data request interface and capable of issuing transactions through the MMU. In such cases, the TLB, pre-fetch buffers and other MMU device resources are used by several sources, and the miss rate in the TLB is higher. This is because these initiators are independent and access several distinct buffers in the main memory simultaneously. In addition, the access pattern within the same buffer may not be contiguous in the virtual address space, which makes reuse of TLB entries very low and pre-fetching of the next page table descriptor less than optimal.
Moreover, the TLB replacement policy can also be a limitation, depending on the traffic access pattern. A replacement policy is the implementation of an algorithm responsible for replacing TLB entries when the TLB is full, i.e. for choosing the TLB entry that has to be evicted to make room for a newly requested one.
In some cases, the TLB replacement policy is implemented via round-robin. Round-robin is a scheduling algorithm where the eviction order of the TLB entries is cyclic and no specific priority is applied to the TLB entries. The eviction from the TLB is then based on the location of the entries and could lead to the eviction of entries that are currently in use, especially if the TLB depth is limited (far smaller than the number of buffers accessed simultaneously).
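The weakness described above can be seen in a minimal sketch of a round-robin TLB: a cyclic pointer selects the next victim slot regardless of whether the entry it holds is still in active use. The class and slot layout are illustrative assumptions.

```python
class RoundRobinTLB:
    """Sketch of a TLB with round-robin (cyclic) replacement."""

    def __init__(self, depth):
        self.slots = [None] * depth  # each slot holds a (vpn, ppn) pair
        self.next_victim = 0         # cyclic eviction pointer

    def lookup(self, vpn):
        for entry in self.slots:
            if entry is not None and entry[0] == vpn:
                return entry[1]
        return None                  # miss

    def refill(self, vpn, ppn):
        # Evict whatever the pointer designates, even a hot entry.
        self.slots[self.next_victim] = (vpn, ppn)
        self.next_victim = (self.next_victim + 1) % len(self.slots)
```

With a depth of 2 and three buffers accessed in turn, the entry for the first buffer is evicted even if it is about to be reused, illustrating how a small TLB under round-robin replacement can thrash.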
In view of the current state of memory management systems, there is a desire for enhancing TLB pre-fetch capabilities to mask page table walk latency caused by a TLB miss.