Processors in a modern computing system use absolute (virtual) addresses, rather than physical addresses, to address instructions and data, so that the addressing capacity is not limited by the physical memory capacity. However, the processors of the computing system must translate the virtual addresses back to physical addresses at some point while processing instructions and/or accessing data. To avoid spending extra memory resources on accessing additional memory structures each time a physical address corresponding to an absolute address is needed, processors typically keep a number of recent translations in a memory structure called a Translation Look-aside Buffer (TLB). The TLB is part of a processor cache. Each processor in a multi-processor system may have its own cache and/or TLB.
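As an illustration of how a TLB holds recent translations, the following is a minimal sketch and not a description of any particular processor. The 4 KB page size, the least-recently-used replacement policy, the dictionary used as a page table, and the names `TLB` and `translate` are all assumptions made for the example.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical)


class TLB:
    """Small cache of recent virtual-page -> physical-frame translations."""

    def __init__(self, capacity=4):
        self.capacity = capacity          # assumed number of TLB entries
        self.entries = OrderedDict()      # virtual page number -> frame number
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr, page_table):
        """Translate a virtual address, consulting the page table on a miss."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)            # mark as most recently used
        else:
            self.misses += 1
            self.entries[vpn] = page_table[vpn]      # slow path: extra memory access
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)     # evict least recently used entry
        return self.entries[vpn] * PAGE_SIZE + offset


page_table = {0: 7, 1: 3}                  # hypothetical mapping
tlb = TLB()
first = tlb.translate(4100, page_table)    # miss: page table is consulted
second = tlb.translate(4100, page_table)   # hit: served from the TLB
```

In this sketch the second translation of the same address is served from the TLB without consulting the page table.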
The term “paging” refers to the mapping of an absolute address to a physical address. When the mapping of an absolute address to a physical address changes in a multi-processor system, the TLB of every processor needs to be updated with the new mapping. The process of removing the old mapping from a TLB and updating the TLB with the new mapping is called “page invalidation” or “TLB shoot down.”
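The page invalidation described above can be sketched as follows. This is a simplified model under stated assumptions: the names `Processor` and `remap_page` are hypothetical, each TLB is modeled as a plain dictionary, and the new mapping is reloaded into a TLB on the next access rather than written eagerly.

```python
class Processor:
    """One processor with its own private TLB (hypothetical model)."""

    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.tlb = {}  # virtual page number -> physical frame number

    def load(self, vpn, page_table):
        """Return the frame for vpn, filling the TLB from the page table on a miss."""
        if vpn not in self.tlb:
            self.tlb[vpn] = page_table[vpn]
        return self.tlb[vpn]

    def invalidate(self, vpn):
        """Drop a stale translation so the next access reloads the new mapping."""
        self.tlb.pop(vpn, None)


def remap_page(page_table, processors, vpn, new_frame):
    """Change a mapping and shoot down the old entry in every processor's TLB."""
    page_table[vpn] = new_frame
    for cpu in processors:
        cpu.invalidate(vpn)


processors = [Processor(i) for i in range(4)]
page_table = {5: 10}
for cpu in processors:
    cpu.load(5, page_table)              # every TLB now caches 5 -> 10
remap_page(page_table, processors, vpn=5, new_frame=42)
frames = [cpu.load(5, page_table) for cpu in processors]  # all observe the new frame
```

Without the loop in `remap_page`, each processor would keep returning the stale frame from its own TLB after the page table changed.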
Page invalidation in a multi-processor computing system produces significant computational overhead that slows down the overall system processing speed. Latency differences normally exist among the TLB shoot downs of different processors. In many situations, the multi-processor computing system cannot proceed until all TLBs are updated. Therefore, a significant amount of processor resources is wasted waiting for all the page invalidations to complete. In modern multi-processor computing systems, page invalidation is a bottleneck to overall computational efficiency.
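The cost of waiting for every TLB to be updated can be illustrated with a small arithmetic sketch. The function name `shootdown_stall` and the latency values are hypothetical and chosen only to show the calculation; they are not measurements.

```python
def shootdown_stall(ack_latencies):
    """Model one shoot down round in which no processor proceeds until all finish.

    Returns (stall, idle): stall is the time until the slowest processor
    completes, and idle is the total time the other processors spend waiting.
    """
    stall = max(ack_latencies)
    idle = sum(stall - t for t in ack_latencies)
    return stall, idle


# Hypothetical per-processor completion times, in arbitrary time units.
latencies = [3, 5, 4, 12]
stall, idle = shootdown_stall(latencies)  # stall == 12, idle == 24
```

In this model a single slow processor determines the stall for the whole system, and the waiting time accumulates across all of the faster processors.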
The page invalidation process poses even more serious slowdowns when the TLBs that need to be updated are relatively large. For example, a computing system with 32 instruction processors (IPs) and 32 GB of physical cache memory may spend more than 90% of its processor cycles handling a page invalidation in which 5 GB of physical memory needs to be invalidated.
Embodiments of apparatuses, systems, and methods disclosed herein implement page invalidation methods that increase the efficiency of multiprocessor computing systems.