Many processing systems use virtual memory for handling data accesses by executing programs (e.g., applications, operating systems, device drivers, etc.). In such a processing system, programs access memory using “virtual addresses” in “virtual address spaces,” which are local address spaces that are specific to corresponding programs, instead of accessing memory using addresses based on the physical locations (or “physical addresses”) of blocks of memory (or “pages”). Thus, to support memory accesses, the processing system typically employs address translation circuitry to translate the virtual addresses to corresponding physical addresses. The address translation circuitry employs one or more translation lookaside buffers (TLBs) to cache virtual-to-physical address translations for efficient lookup by processor cores. To maintain coherency, whenever a virtual address is remapped to a new physical address, or the permission bits of a translation are changed, the operating system must perform a TLB shootdown to purge the outdated or invalid translations from the TLBs of the processor cores that may have cached them. Because a shootdown typically requires the participating processor cores to coordinate with one another, TLB shootdown latency can significantly affect application performance in large multicore systems.
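By way of illustration only, the translation and shootdown behavior described above can be sketched as a small software model. The sketch is a simplification under stated assumptions and is not a description of any particular hardware or operating system: the 4 KiB page size, the single shared page table, and the names `TLB`, `System`, `map_page`, `translate`, and `remap` are all hypothetical and chosen for this example.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (illustrative only)


class TLB:
    """Toy per-core cache of virtual-to-physical translations."""

    def __init__(self):
        # virtual page number -> (physical frame number, writable flag)
        self.entries = {}

    def lookup(self, vpn):
        return self.entries.get(vpn)

    def fill(self, vpn, pfn, writable):
        self.entries[vpn] = (pfn, writable)

    def invalidate(self, vpn):
        # Purge one cached translation; a no-op if it is not cached.
        self.entries.pop(vpn, None)


class System:
    """Toy multicore system: one shared page table, one TLB per core."""

    def __init__(self, num_cores):
        self.page_table = {}
        self.tlbs = [TLB() for _ in range(num_cores)]

    def map_page(self, vpn, pfn, writable=True):
        self.page_table[vpn] = (pfn, writable)

    def translate(self, core, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        entry = self.tlbs[core].lookup(vpn)
        if entry is None:
            # TLB miss: consult the page table, then cache the result.
            # A missing mapping raises KeyError, standing in for a page fault.
            entry = self.page_table[vpn]
            self.tlbs[core].fill(vpn, *entry)
        pfn, _writable = entry
        return pfn * PAGE_SIZE + offset

    def remap(self, vpn, new_pfn, writable=True, shootdown=True):
        # Update the page table, then (optionally) purge the stale
        # translation from every core's TLB. The loop stands in for the
        # cross-core coordination that makes a real shootdown costly.
        self.page_table[vpn] = (new_pfn, writable)
        if shootdown:
            for tlb in self.tlbs:
                tlb.invalidate(vpn)
```

In this model, calling `remap` with `shootdown=False` leaves any previously cached translation in place, so a later `translate` on a core that cached the old mapping still returns an address in the old physical frame. Calling `remap` with the default `shootdown=True` invalidates the entry on every core, so the next access misses in the TLB and picks up the new mapping from the page table.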