The present invention relates to computers and, more particularly, to multiprocessor systems employing TLB shootdown as part of a memory-management scheme. A major objective of the invention is to provide an approach to TLB shootdown that scales well with large numbers of processors in a multiprocessor system.
Many modern computer systems use virtual-memory schemes to match the memory requirements of the computer programs run on these systems to available memory resources. An operating system typically assigns virtual memory address “pages” to each program, and assigns these virtual-memory pages to physical memory pages, preferably in solid-state random access memory (RAM), with excess virtual memory pages being assigned to hard-disk locations on some priority basis when RAM capacity is exceeded. The virtual-memory assignments are stored in a page table, typically in RAM. So that a processor does not have to perform a time-consuming access of main memory every time a virtual memory assignment needs to be read, copies of recently used page-table assignments can be cached in a translation look-aside buffer (TLB).
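By way of illustration only, the caching relationship between a page table and a TLB can be sketched as follows. The class name `TLB`, its `capacity` parameter, the least-recently-used replacement policy, and the use of a Python dictionary as the page table are all assumptions made for the example; none of them is taken from any particular processor architecture.

```python
from collections import OrderedDict


class TLB:
    """Toy fully associative TLB with LRU replacement (illustrative assumption)."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table   # dict: virtual page -> physical page
        self.capacity = capacity       # hypothetical number of TLB entries
        self.entries = OrderedDict()   # cached copies of page-table assignments
        self.hits = 0
        self.misses = 0

    def translate(self, vpage):
        """Return the physical page for vpage, consulting the TLB first."""
        if vpage in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpage)   # mark as most recently used
            return self.entries[vpage]
        # Miss: fall back to the (slow) page table held in main memory.
        self.misses += 1
        ppage = self.page_table[vpage]
        self.entries[vpage] = ppage
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return ppage

    def purge(self, vpage):
        """Drop any cached assignment for vpage."""
        self.entries.pop(vpage, None)
```

In this sketch, a first call to `translate` for a given virtual page counts as a miss and reads the page table, while a repeated call for the same page is served from the cached entry.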
Typically, when a program terminates, some of the virtual memory assigned to it can be made available to other programs. The operating system can instruct the processor running the program to de-assign the no-longer-needed virtual memory pages in the page table. Then any corresponding TLB entries for that processor and for any other processor in a multiprocessor system must be purged so that all TLBs are coherent with the page table. To this end, a processor can write its TLB-shootdown data to a dedicated location in main memory and send an interrupt to the other processors, which then read the TLB-shootdown data, purge their TLBs accordingly, and report when their purges are complete. The de-assigned virtual memory can then be released for reassignment.
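The sequence just described can be sketched, under simplifying assumptions, as a single-threaded simulation. The names `Processor`, `handle_shootdown_interrupt`, `shootdown`, and `mailbox` are hypothetical; the interrupt is modeled as a direct function call, and the dedicated main-memory location is modeled as a dictionary. A real system would use inter-processor interrupts and hardware purge instructions instead.

```python
class Processor:
    """Toy processor holding only a TLB (virtual page -> physical page)."""

    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.tlb = {}

    def handle_shootdown_interrupt(self, mailbox):
        # Read the shootdown data, purge matching TLB entries, then report.
        for vpage in mailbox["pages"]:
            self.tlb.pop(vpage, None)
        mailbox["acks"].add(self.cpu_id)


def shootdown(initiator, processors, page_table, pages):
    """De-assign pages, purge every TLB, and return the pages now reusable."""
    # 1. De-assign the pages in the page table.
    for vpage in pages:
        page_table.pop(vpage, None)
    # 2. Purge the initiating processor's own TLB.
    for vpage in pages:
        initiator.tlb.pop(vpage, None)
    # 3. Write the shootdown data to the shared location (modeled as a dict).
    mailbox = {"pages": tuple(pages), "acks": set()}
    # 4. "Interrupt" every other processor (modeled as a direct call).
    others = [p for p in processors if p is not initiator]
    for p in others:
        p.handle_shootdown_interrupt(mailbox)
    # 5. Release the pages only once every other processor has reported.
    if mailbox["acks"] != {p.cpu_id for p in others}:
        raise RuntimeError("not all processors completed their purge")
    return list(pages)
```

For example, with three processors whose TLBs all cache virtual pages 10 and 11, calling `shootdown(cpus[0], cpus, page_table, [10])` removes page 10 from the page table and from every TLB while leaving the entries for page 11 in place.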
Various lockout mechanisms can be employed to prevent a processor from writing TLB-shootdown data to the TLB-shootdown memory location when it is in use by another processor. The processor that is locked out waits until the first TLB purge is complete before it can begin its own TLB purge. The “waiting” can in practice involve repeated rechecking of whether the location has become available, which consumes system bandwidth. As the number of processors increases, the frequency of contentions, the waiting periods, and the bandwidth consumption all increase, limiting scalability. What is needed is an approach to TLB shootdown that scales better with the number of processors in a multiprocessor system.
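The scaling concern can be made concrete with a deliberately simple model. The sketch below assumes a worst case in which every processor requests a shootdown at the same instant, a single shared shootdown location serializes the requests, each purge occupies the location for a fixed number of ticks, and a locked-out processor rechecks the location once per tick. The function name `simulate_contention` and the parameter `purge_ticks` are hypothetical, and the figures it produces describe only this model, not any measured system.

```python
def simulate_contention(num_cpus, purge_ticks=5):
    """Return (total_wait_ticks, total_rechecks) for the single-location model."""
    total_wait = 0
    total_rechecks = 0
    slot_free_at = 0  # tick at which the shared location next becomes available
    for _ in range(num_cpus):
        wait = slot_free_at        # ticks this processor spends locked out
        total_wait += wait
        total_rechecks += wait     # assumed: one recheck per waiting tick
        slot_free_at += purge_ticks
    return total_wait, total_rechecks
```

Under these assumptions the total waiting time is `purge_ticks * n * (n - 1) / 2` for `n` processors, so doubling the processor count roughly quadruples the aggregate waiting and rechecking: the model yields 5 ticks for 2 processors, 30 for 4, and 140 for 8.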