Memory management units (MMUs) are used in microcontrollers, network processors and other types of microprocessors, and are components through which memory access transactions are passed in order to provide, for example, translation from virtual memory addresses to physical memory addresses. In addition, MMUs are often implemented to provide memory protection, preventing certain (protected) regions of memory from being accessed by unauthorised processes and/or components within the microprocessor or the computer processing system in which the microprocessor is implemented.
Paging is a memory management scheme by which data may be stored to and retrieved from secondary memory for use in main memory. In a paging memory management scheme, an operating system retrieves data from secondary memory in same-size blocks called pages. Paging allows the physical address space of a process to be non-contiguous. Before paging came into use, systems had to fit whole programs into storage contiguously, which caused various storage and fragmentation problems. Paging is an important part of virtual memory implementations in contemporary general-purpose operating systems, allowing them to use secondary memory for data that does not fit into physical random-access memory (RAM). Furthermore, paging enables the benefit of page-level protection whereby user-level processes can be limited to seeing and modifying data which is paged in to their own address space, providing hardware isolation. System pages can also be protected from user processes.
Modern MMUs typically use a page table to store the mapping between virtual addresses and physical addresses. The page table comprises one page table entry (PTE) per page, and is typically stored within memory. In order to improve virtual address translation speed, it is known for MMUs to use a translation lookaside buffer (TLB). A TLB typically comprises an associative cache of PTEs, and typically contains a subset of the PTEs within the page table. The TLB may comprise recently accessed and/or regularly accessed PTEs, or contain PTEs according to any other PTE caching strategy. In this manner, the time required to translate virtual addresses corresponding to PTEs within the TLB may be significantly reduced, since those PTEs are cached and readily available. If a translation is required for a virtual address corresponding to a PTE that is not within the TLB, then the full page table stored within memory must be referenced, which is a significantly slower process than simply referencing the cached TLB.
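The lookup order described above (TLB first, page table in memory only on a miss) can be illustrated with a minimal software sketch. Everything in it is an assumption made for illustration and is not taken from any particular MMU: the class name TinyMMU, the 64 KB page size, the dictionary used as a page table, and the least-recently-used replacement policy, which is only one of the possible PTE caching strategies.

```python
from collections import OrderedDict

PAGE_SIZE = 64 * 1024  # assumed page size (64 KB), for illustration only


class TinyMMU:
    """Hypothetical model of an MMU with a small TLB in front of a page table."""

    def __init__(self, page_table, tlb_entries=128):
        # page_table maps virtual page number -> physical frame number
        # and stands in for the full page table held in memory.
        self.page_table = page_table
        self.tlb = OrderedDict()  # cached subset of the page table entries
        self.tlb_entries = tlb_entries
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        """Translate a virtual address to a physical address."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:
            # Fast path: the PTE is cached in the TLB.
            self.hits += 1
            self.tlb.move_to_end(vpn)  # mark as most recently used
            pfn = self.tlb[vpn]
        else:
            # Slow path: reference the full page table, then cache the PTE.
            self.misses += 1
            pfn = self.page_table[vpn]  # KeyError here models an unmapped page
            self.tlb[vpn] = pfn
            if len(self.tlb) > self.tlb_entries:
                self.tlb.popitem(last=False)  # evict the least recently used PTE
        return pfn * PAGE_SIZE + offset


if __name__ == "__main__":
    mmu = TinyMMU({0: 5, 1: 9}, tlb_entries=1)
    mmu.translate(0x10)           # miss: PTE for page 0 fetched from the page table
    mmu.translate(0x20)           # hit: page 0 is now cached
    mmu.translate(PAGE_SIZE + 4)  # miss: page 1 replaces page 0 in the one-entry TLB
    print(mmu.hits, mmu.misses)
```

In real hardware the slow path is a page table walk performed against memory, so the cost difference between the two branches is far larger than this model suggests.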
A typical state of the art MMU implementation, such as the ARM™ System Memory Management Unit (SMMU), has the following characteristics:
A low page granularity (e.g. 4 KB, 64 KB, 512 MB, 1 TB);
A bypass mechanism based on, for example, a stream ID;
General MMU features such as address translation, memory protection, etc.
A problem with such state of the art MMU implementations occurs when, for example, a master device (e.g. a processing core) has a large private area of memory, for example 256 MB. In a typical implementation comprising page sizes of, say, 64 KB, 4096 PTEs are required for the large private area of memory for the master device (256 MB / 64 KB = 4096). Typical MMU implementations only support 128 PTEs within their TLB. Because of this, when the master device attempts to access its private area, the likelihood of the address being present in the TLB is small (at most 128/4096, or approximately 3%, assuming accesses are spread evenly across the area). Thus, a high page miss rate (approximately 97%) will occur when the master device attempts to access its private area, which will have a significant impact on the performance of the system.
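The figures in the preceding paragraph can be reproduced with a few lines of arithmetic. The sketch below assumes accesses are spread uniformly across the private area, so that the hit rate equals the fraction of the area's pages that fit in the TLB; the function name tlb_coverage is hypothetical.

```python
KB = 1024
MB = 1024 * KB


def tlb_coverage(region_bytes, page_bytes, tlb_entries):
    """Return (pages needed, best-case hit fraction, miss fraction).

    Assumes uniformly distributed accesses over the region, so the hit
    fraction is simply the share of the region's PTEs the TLB can hold.
    """
    pages = region_bytes // page_bytes
    hit_fraction = min(1.0, tlb_entries / pages)
    return pages, hit_fraction, 1.0 - hit_fraction


if __name__ == "__main__":
    pages, hit, miss = tlb_coverage(256 * MB, 64 * KB, 128)
    print(pages)          # 4096
    print(f"{hit:.1%}")   # 3.1%
    print(f"{miss:.1%}")  # 96.9%
```

Real access patterns usually show some locality, so the measured miss rate may be lower than this uniform-access estimate, but the TLB still covers only 8 MB of the 256 MB area at any one time.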
Although the bypass mechanism may be used to allow the page checking feature to be bypassed (and thus avoid the need to reference the page table stored in memory), doing so would leave the master device's private area of memory unprotected.