A page table is the data structure used by a virtual memory system in a computer operating system to store the mapping between virtual addresses and physical addresses. Virtual addresses are used by the accessing process, while physical addresses are used by the hardware, or more specifically by the RAM subsystem.
In operating systems that use virtual memory, every process is given the impression that it is working with large, contiguous sections of memory. In reality, each process's memory may be dispersed across different areas of physical memory, or may have been moved (paged out) to secondary storage, typically a hard disk.
When a process requests access to data in its memory, it is the responsibility of the operating system to map the virtual address provided by the process to the physical address of the actual memory where that data is stored. The page table is where the operating system stores its mappings of virtual addresses to physical addresses, with each mapping also known as a page table entry (PTE).
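The mapping described above can be sketched as a lookup keyed by virtual page number. This is a minimal illustration, not how any particular operating system lays out its tables: the 4 KiB page size, the single-level dictionary `page_table`, and the function name `translate` are all assumptions made for the example.

```python
# Assumption: 4 KiB pages, so the low 12 bits of an address are the page offset.
OFFSET_BITS = 12
PAGE_SIZE = 1 << OFFSET_BITS

# Hypothetical single-level page table: virtual page number -> physical frame number.
# Each key/value pair plays the role of one page table entry (PTE).
page_table = {0: 5, 1: 9, 2: 3}


def translate(vaddr):
    """Translate a virtual address to a physical address via the page table."""
    vpn = vaddr >> OFFSET_BITS          # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)    # offset within the page is unchanged
    frame = page_table[vpn]             # raises KeyError if no mapping exists
    return (frame << OFFSET_BITS) | offset


if __name__ == "__main__":
    print(hex(translate(0x1234)))
```

Only the page number is translated; the offset within the page carries over unchanged, which is why contiguous virtual pages can map to scattered physical frames.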
Certain actions may be taken when a virtual address is translated to a physical address. If a TLB miss occurs, the translation is restarted once the mapping has been loaded, so that the lookup can complete correctly in hardware.
The CPU's memory management unit (MMU) stores a cache of recently used mappings from the operating system's page table. This is called the translation lookaside buffer (TLB), which is an associative cache.
When a virtual address needs to be translated into a physical address, the TLB is searched first. If a match is found (a TLB hit), the physical address is returned and memory access can continue. However, if there is no match (a TLB miss), the MMU or the operating system's TLB miss handler will typically look up the address mapping in the page table to see whether a mapping exists (a page walk). If one exists, it is written back to the TLB (this must be done, as the hardware accesses memory through the TLB in a virtual memory system), and the faulting instruction is restarted (this may happen in parallel as well). The subsequent translation will find a TLB hit, and the memory access will continue.
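The hit/miss flow above can be sketched with a small software model. This is an illustration under stated assumptions, not a description of real MMU hardware: the class name `TinyTLB`, the two-entry capacity, the least-recently-used eviction, and the 4 KiB page size are all hypothetical choices. Instead of restarting an instruction, the sketch simply fills the TLB on a miss and then reads the entry back from it.

```python
from collections import OrderedDict

OFFSET_BITS = 12                       # assumption: 4 KiB pages
PAGE_MASK = (1 << OFFSET_BITS) - 1


class TinyTLB:
    """Toy TLB in front of a dict-based page table (hypothetical model)."""

    def __init__(self, page_table, capacity=2):
        self.page_table = page_table   # virtual page number -> frame number
        self.capacity = capacity
        self.entries = OrderedDict()   # cached mappings, oldest use first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn = vaddr >> OFFSET_BITS
        offset = vaddr & PAGE_MASK
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)          # mark as recently used
        else:
            self.misses += 1
            frame = self.page_table[vpn]           # page walk; KeyError if unmapped
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)   # evict least recently used entry
            self.entries[vpn] = frame              # write the mapping into the TLB
        # The access always completes through the TLB entry.
        return (self.entries[vpn] << OFFSET_BITS) | offset


if __name__ == "__main__":
    tlb = TinyTLB({0: 5, 1: 9, 2: 3})
    tlb.translate(0x1234)   # miss, then filled
    tlb.translate(0x1238)   # hit on the same page
    print(tlb.hits, tlb.misses)
```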
The page table lookup may fail for two reasons. The first is that no translation is available for the virtual address, meaning that the virtual address is invalid. This typically occurs because of a programming error, and the operating system must take some action to deal with the problem. On modern operating systems, this typically results in a segmentation fault being delivered to the offending program.
The page table lookup may also fail if the page is not resident in physical memory. This will occur if the requested page has been moved out of physical memory to make room for another page. In this case the page is paged out to a secondary store located on a medium such as a hard disk drive (this secondary store, or “backing store”, is often called a “swap partition” if it is a disk partition, or a swap file, “swapfile” or “page file” if it is a file). When this happens the page needs to be taken from disk and put back into physical memory. A similar mechanism is used for memory-mapped files, which are mapped to virtual memory and loaded to physical memory on demand.
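The two failure cases can be told apart by whether an entry exists at all and, if it does, whether it is marked as resident. The sketch below assumes a hypothetical `PTE` record with a `present` flag and a `swap_slot` field; the exception names are illustrative and do not correspond to any real kernel interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PTE:
    """Hypothetical page table entry."""
    present: bool                    # True if the page is in physical memory
    frame: Optional[int] = None      # physical frame number when present
    swap_slot: Optional[int] = None  # location in the backing store when paged out


class SegmentationFault(Exception):
    """No translation exists: the virtual address is invalid."""


class PageNotResident(Exception):
    """A translation exists, but the page is currently in the backing store."""


def walk(page_table, vpn):
    """Return the frame for vpn, or raise to signal which kind of failure occurred."""
    pte = page_table.get(vpn)
    if pte is None:
        raise SegmentationFault(f"no mapping for virtual page {vpn}")
    if not pte.present:
        raise PageNotResident(f"virtual page {vpn} is in swap slot {pte.swap_slot}")
    return pte.frame


if __name__ == "__main__":
    table = {0: PTE(present=True, frame=5), 1: PTE(present=False, swap_slot=42)}
    print(walk(table, 0))
```

In this model the first exception corresponds to an error reported to the program, while the second is recoverable: the operating system would bring the page back in and retry.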
When physical memory is not full, this is a simple operation: the page is written back into physical memory, the page table and TLB are updated, and the instruction is restarted. However, when physical memory is full, one or more pages in physical memory will need to be paged out to make room for the requested page. The page table needs to be updated to mark that the pages that were previously in physical memory are no longer there, and to mark that the page that was on disk is now in physical memory. The TLB also needs to be updated, including removal of the paged-out page from it, and the instruction restarted. Which page to page out is the subject of page replacement algorithms. However, these algorithms on their own do not provide the capabilities that hardware-assisted management may offer.
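As one concrete example of a page replacement policy, the sketch below counts page faults under first-in, first-out (FIFO) replacement. FIFO is chosen only because it is simple to show; the text does not say which policy is used, and the function name `fifo_page_faults` and the reference string are assumptions made for illustration.

```python
from collections import deque


def fifo_page_faults(references, num_frames):
    """Count page faults for a sequence of page references under FIFO replacement."""
    resident = set()    # pages currently in physical memory
    queue = deque()     # load order of resident pages, oldest first
    faults = 0
    for page in references:
        if page in resident:
            continue                     # page is resident: no fault
        faults += 1
        if len(resident) >= num_frames:
            victim = queue.popleft()     # page out the oldest resident page
            resident.remove(victim)
        resident.add(page)               # page in the requested page
        queue.append(page)
    return faults


if __name__ == "__main__":
    refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
    print(fifo_page_faults(refs, 3), fifo_page_faults(refs, 4))
```

A real system would also update the page table and invalidate the victim's TLB entry at the point where this sketch removes the victim from `resident`.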
ARM processors use the ACE protocol to interact with the memory subsystem. The AMD x86 memory subsystem (Northbridge) uses the CCI protocol. The two protocols are not compatible, and a large number of differences between them need to be bridged. Examples include: the Request, Response, and Probe/Snoop encodings differ; in ACE, writes push data into the memory subsystem, whereas in CCI the memory subsystem requests write data when it is ready; the probe/victim deadlock; ACE allows a processor to pass responsibility for copying back modified data to the memory subsystem on a probe response, whereas CCI does not; the ACE and CCI power management control signals and handshakes are substantially different; and although both protocols allow the processor and the memory subsystem to operate at different clock frequencies, CCI uses a clock enable scheme to handle clock ratio throttling in the faster clock domain, whereas ACE uses a Ready/Valid handshake. A system that can effectively bridge these two protocols is needed.
The ARM-based system handles probe/snoop race conditions with CPU writebacks differently than a cHT-based system. In a cHT-based system, the CPU is required to always provide a probe response without any dependencies. If the probe hits a CPU victim, the CPU indicates whether the victim has already been sent (and will later be canceled) and supplies the writeback data in the probe response. If the victim has not yet been sent, its state is downgraded according to the probe type. In the ARM system, the CPU will block and not deliver the probe response until the writeback has been completed. This may occur even for probes that are queued within the CPU/cluster and have not been issued yet. Therefore, a problem exists where a unified northbridge may become deadlocked because of the ARM victim/probe collision handling.