The present invention relates to multiprocessor computing systems and, more particularly, to memory management unit (MMU) trap synchronization in multiprocessor computing systems. Multiprocessor computing systems are coming into increasingly common usage due to the many advantages inherent in such systems. In such systems, a single operating system controls the operation of all the microprocessors (CPU's) of the system. Common multiprocessor systems include scalable shared memory (SSM) systems and symmetric multiprocessor (SMP) systems. Symmetric multiprocessing can make use of multiprocessor computing architectures configured so that all CPU's can access all random access memory locations. Many architectures can implement such systems. Examples include, without limitation, X86 systems as well as SSM and SMP system architectures designed by Motorola, IBM, and Microsoft (e.g., NT systems). Linux based systems can also take advantage of the principles of the invention. Another architecture suitable for implementing the principles of the invention is the so-called SPARC architecture. SPARC is short for Scalable Processor Architecture, a RISC (reduced instruction set computer) technology developed by Sun Microsystems. The term SPARC® itself is a trademark of SPARC International, an independent organization that licenses the term to Sun for its use. The details of the SPARC specification are well known to persons having ordinary skill in the art and can be found in many standard references. One example of such a reference is entitled “The SPARC Architecture Manual, Version 9”, by SPARC International, edited by David Weaver and Tom Germond, which is hereby incorporated by reference.
Scalable shared-memory multiprocessors distribute memory among the many processors of a system and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory and the fast processors. Unless carefully controlled, such architectural optimizations can cause memory accesses to be excessively concentrated in the slow shared memory rather than taking advantage of the high speed memory contained in the processors.
As is known to persons having ordinary skill, memory management units (MMU's) maintain listings that include mappings from virtual addresses to associated physical addresses (which exist in RAM). The advantage of such mappings is that the more commonly used physical addresses can be stored in a microprocessor cache for high-speed access. The disadvantage is that the cache memory used to store these mappings is very small. Such mappings include a physical address and virtual address (also referred to as a translation table entry or TTE), a listing of attributes (such as memory access protection attributes), and a context identifier (which SPARC refers to as an MMU context). Such information is commonly stored in a translation lookaside buffer (TLB) (also known as TB(2), translation buffer, ATC, or address translation cache). The TLB is a small piece of associative memory within a processor which caches part of the translation from virtual addresses to physical addresses. Thus, whenever physical address information is required, the TLB is consulted. One significant advantage of the cache memory is that it performs operations extremely quickly. Thus, it is desirable to take advantage of such cache memory as much as possible.
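By way of illustration only, the TLB lookup described above might be sketched in C as follows. The structure layout, the names `tte_t`, `tlb_t`, and `tlb_lookup`, the four-entry TLB size, and the 8 KB page size are all assumptions made for this sketch and are not taken from any particular architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 4   /* hypothetical size; chosen small for illustration */
#define PAGE_SHIFT  13  /* assumed 8 KB pages */

/* Hypothetical translation table entry (TTE). */
typedef struct {
    bool     valid;
    uint64_t vpn;      /* virtual page number */
    uint64_t ppn;      /* physical page number */
    uint32_t context;  /* MMU context identifier */
    uint32_t attrs;    /* attributes, e.g. access protection bits */
} tte_t;

typedef struct {
    tte_t entry[TLB_ENTRIES];
} tlb_t;

/*
 * Search every entry, as an associative memory would.
 * On a hit, write the physical address to *pa and return true.
 * On a miss, return false and leave *pa untouched.
 */
bool tlb_lookup(const tlb_t *tlb, uint32_t context, uint64_t va, uint64_t *pa)
{
    uint64_t vpn    = va >> PAGE_SHIFT;
    uint64_t offset = va & ((1ULL << PAGE_SHIFT) - 1);

    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        const tte_t *e = &tlb->entry[i];
        if (e->valid && e->context == context && e->vpn == vpn) {
            *pa = (e->ppn << PAGE_SHIFT) | offset;
            return true;
        }
    }
    return false;
}
```

In this sketch a lookup succeeds only when both the virtual page number and the context identifier match, which is why the same virtual address can translate differently for different processes.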
However, whenever a required translation for a particular virtual address is not present in the TLB, the required information must be acquired from another memory asset, which is commonly much slower. This event is referred to as a “TLB miss trap” or alternatively as a type of “MMU trap”. In such cases (“miss traps”) the address translation must be resolved using other mechanisms. For example, these translations from virtual to physical addresses (as well as other associated information) are also stored in secondary memory resources (also referred to herein as secondary memory assets). Commonly, the secondary memory assets include translation storage buffers (TSB's) and page tables (which store the entire virtual address space description of each process). These secondary memory assets are commonly located in random access memory (RAM) where they are less easily accessed. Obtaining translations from such secondary resources is a much slower process than TLB access and commonly slows down the system operation. This is especially problematic when remappings of the virtual address space require frequent changes to the TLB.
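The miss path described above can be sketched, under simplifying assumptions, as a three-level lookup that tries the TLB first, then a TSB, then the page table. The names `mmu_t`, `translate`, and `tlb_fill`, the table sizes, the direct-mapped TSB, the round-robin TLB replacement, and the single-context flat page table are all hypothetical choices made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 2   /* hypothetical sizes, kept small for illustration */
#define TSB_ENTRIES 8
#define PT_ENTRIES  64

typedef struct { bool valid; uint64_t vpn; uint64_t ppn; } tte_t;

typedef struct {
    tte_t  tlb[TLB_ENTRIES];        /* fast, on-processor */
    size_t tlb_next;                /* round-robin replacement index */
    tte_t  tsb[TSB_ENTRIES];        /* in RAM, direct-mapped by vpn */
    tte_t  page_table[PT_ENTRIES];  /* in RAM, indexed by vpn */
} mmu_t;

typedef enum { FROM_TLB, FROM_TSB, FROM_PAGE_TABLE, PAGE_FAULT } source_t;

/* Install a translation in the TLB, evicting the oldest slot. */
static void tlb_fill(mmu_t *m, tte_t e)
{
    m->tlb[m->tlb_next] = e;
    m->tlb_next = (m->tlb_next + 1) % TLB_ENTRIES;
}

/* Resolve vpn and report which level supplied the translation. */
source_t translate(mmu_t *m, uint64_t vpn, uint64_t *ppn)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        if (m->tlb[i].valid && m->tlb[i].vpn == vpn) {
            *ppn = m->tlb[i].ppn;
            return FROM_TLB;
        }
    }

    /* TLB miss trap: consult the TSB next. */
    tte_t *slot = &m->tsb[vpn % TSB_ENTRIES];
    if (slot->valid && slot->vpn == vpn) {
        tlb_fill(m, *slot);
        *ppn = slot->ppn;
        return FROM_TSB;
    }

    /* TSB miss: fall back to the page table and refill both caches. */
    if (vpn < PT_ENTRIES && m->page_table[vpn].valid) {
        *slot = m->page_table[vpn];
        tlb_fill(m, *slot);
        *ppn = slot->ppn;
        return FROM_PAGE_TABLE;
    }
    return PAGE_FAULT;
}
```

Each fallback level in the sketch refills the faster levels above it, so a repeated access to the same page is served from the TLB.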
An additional difficulty encountered when virtual addresses or virtual address spaces are being remapped is that other processes or threads seeking access to the affected virtual addresses cannot access the information without causing inconsistent or fatal results. Therefore, in a multiprocessor computer system, maintaining a consistent view of a virtual address space in a multi-threaded process is critical. Obtaining this consistent view requires synchronization by the operating system kernel across all of the processors in the system whenever a process virtual address space changes. Thus, when virtual addresses are changed they must be changed for all processes and all processors across the entire system (this process is referred to as synchronization). Such an operation requires that the old translation table entry for a given virtual address be removed (called “demapping” or “unmapping”) and a new translation table entry (having a different physical address) be entered. This is referred to as a “TLB shootdown”. Additionally, all TLB shootdown events must be synchronized across the entire virtual address space for all CPU's in the system. Such synchronization prevents inconsistent mappings so that no single virtual address maps to more than one physical address. Thus, the TLB's of all CPU's for any given virtual address all map to the same translation table entry.
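The consistency requirement described above can be illustrated with a single-threaded sketch in which each CPU holds its own small TLB. The names `cpu_t`, `tlb_shootdown`, and `mapping_consistent`, as well as the CPU count and TLB size, are assumptions made for this example; a real kernel would perform the demap on each processor concurrently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NCPUS       4   /* hypothetical system size */
#define TLB_ENTRIES 4

typedef struct { bool valid; uint64_t vpn; uint64_t ppn; } tte_t;
typedef struct { tte_t tlb[TLB_ENTRIES]; } cpu_t;

/*
 * Demap vpn from the TLB of every CPU.
 * Returns the number of entries that were invalidated.
 */
size_t tlb_shootdown(cpu_t cpus[], size_t ncpus, uint64_t vpn)
{
    size_t removed = 0;
    for (size_t c = 0; c < ncpus; c++) {
        for (size_t i = 0; i < TLB_ENTRIES; i++) {
            tte_t *e = &cpus[c].tlb[i];
            if (e->valid && e->vpn == vpn) {
                e->valid = false;
                removed++;
            }
        }
    }
    return removed;
}

/*
 * Returns true when every valid TLB entry for vpn, on any CPU,
 * names the same physical page (or when no CPU caches vpn at all).
 */
bool mapping_consistent(const cpu_t cpus[], size_t ncpus, uint64_t vpn)
{
    bool     seen = false;
    uint64_t first_ppn = 0;

    for (size_t c = 0; c < ncpus; c++) {
        for (size_t i = 0; i < TLB_ENTRIES; i++) {
            const tte_t *e = &cpus[c].tlb[i];
            if (!e->valid || e->vpn != vpn)
                continue;
            if (!seen) {
                seen = true;
                first_ppn = e->ppn;
            } else if (e->ppn != first_ppn) {
                return false;
            }
        }
    }
    return true;
}
```

In this sketch, updating the mapping on one CPU without first calling `tlb_shootdown` leaves a stale entry on another CPU, which `mapping_consistent` reports as an inconsistency.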
Further, a difficulty arises when a virtual address is being unmapped and another process (possibly running on another processor) seeks to access the same shared memory resource (e.g., a translation table entry (TTE) shared by another process or thread). Such activity can lead to an inconsistent view of the virtual address space. In such a situation the multiple threads will get inconsistent views of the process virtual address space while the virtual address space is changing.
One conventional solution to this problem is using “cross-calls” to halt system operation during TLB shootdown events. Each time a memory unmapping event (e.g. a TLB shootdown) occurs, the unmapping process issues a “cross-call” to all CPU's in the system instructing the CPU's to halt operation while the TLB shootdown operation is performed. The CPU's resume operation when the TLB shootdown operation is complete. This prevents any CPU from accessing a virtual address undergoing a change.
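The cross-call approach described above can be modeled, purely as an illustration, by a state machine in which the initiating CPU halts every other CPU, performs the shootdown, and then resumes them. The names `machine_t`, `running_cpus`, and `shootdown_with_cross_call` and the CPU count are hypothetical; the model is sequential and does not reflect any actual interprocessor interrupt mechanism.

```c
#include <assert.h>
#include <stddef.h>

#define NCPUS 4   /* hypothetical system size */

typedef enum { CPU_RUNNING = 0, CPU_HALTED } cpu_state_t;

typedef struct {
    cpu_state_t   state[NCPUS];
    unsigned long halt_count[NCPUS];  /* times each CPU was halted */
} machine_t;

/* Count the CPUs currently doing useful work. */
size_t running_cpus(const machine_t *m)
{
    size_t n = 0;
    for (size_t c = 0; c < NCPUS; c++) {
        if (m->state[c] == CPU_RUNNING)
            n++;
    }
    return n;
}

/*
 * Model one shootdown guarded by a cross-call.
 * Returns how many CPUs were still running while the demap took place.
 */
size_t shootdown_with_cross_call(machine_t *m, size_t initiator)
{
    /* Cross-call: every CPU except the initiator stops. */
    for (size_t c = 0; c < NCPUS; c++) {
        if (c != initiator) {
            m->state[c] = CPU_HALTED;
            m->halt_count[c]++;
        }
    }

    size_t running_during = running_cpus(m);
    /* The demap and remap of the TLB entries would happen here. */

    /* Shootdown complete: all CPUs resume. */
    for (size_t c = 0; c < NCPUS; c++)
        m->state[c] = CPU_RUNNING;

    return running_during;
}
```

In this model only the initiating CPU is running during the shootdown, regardless of how many CPUs the system contains.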
Another conventional approach employed in some architectures is to have all CPU's that are running threads unmap all TLB entries at a given virtual address and place dispatcher locks on the thread while the TLB shootdown occurs.
Each of these conventional approaches has serious drawbacks. For example, in the first approach all CPU's in the system are required to cease operation during the TLB shootdown operation. Moreover, in SPARC compliant systems (e.g., Solaris® systems), the TSB's are more frequently resized to accommodate increases in content and address information. Such resize operations take whole portions of the virtual address space offline for periods of time during a resize operation and must be synchronized. The conventional synchronization approaches require all CPU's in the system to shut down during resize operations. Additionally, as more and more CPU's are added to the system and more and more processes run on the system these resize operations become much more frequent (as do TLB shootdown events). Since each resize operation requires remapping of many virtual address mappings, synchronization requires more and more CPU's to be taken offline for longer and longer periods of time. In fact, as the systems become larger (incorporating more CPU's, more processes, more threads, and consequently more demapping and resizing operations) such operations can bring a system to a standstill. Quite simply, all the CPU's are paused while the TLB, TSB, or page tables are being demapped or updated. Consequently, during such operations the CPU's are idle doing no meaningful work during the time in which the virtual address space is being reconfigured. This can impose drastic bottlenecks in system efficiency, especially as the systems grow larger. Solutions to this problem are needed.
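The scaling concern described above can be made concrete with a rough cost model. The function `idle_cpu_time_us`, its parameters, and the assumption that every non-initiating CPU sits idle for the full duration of each event are simplifications introduced here for illustration; the figures in the usage note are invented inputs, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rough model: each shootdown or resize event idles every CPU except
 * the initiator for duration_us microseconds.
 * Returns the total idle CPU time in CPU-microseconds.
 */
uint64_t idle_cpu_time_us(uint64_t ncpus, uint64_t events, uint64_t duration_us)
{
    if (ncpus == 0)
        return 0;
    return events * duration_us * (ncpus - 1);
}
```

Under this model, 10 events of 50 microseconds each cost 1,500 idle CPU-microseconds on a 4-CPU system and 31,500 on a 64-CPU system, before accounting for the increase in event frequency that accompanies larger systems.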
Accordingly, there is a need for improved methods of accomplishing MMU trap synchronization that do not require all CPU's in a system to shut down each time there is a memory access instruction that demaps a virtual address or resizes a TSB. In view of the foregoing, there is a need for improved methods and systems for accomplishing MMU synchronization in a multi-processor computing system.