1. Technical Field
The present invention relates in general to a method and system for data processing and, in particular, to a data processing system and method for maintaining translation lookaside buffer (TLB) coherency in a data processing system. Still more particularly, the present invention relates to a data processing system and method that maintain TLB coherency without enforcing instruction serialization.
2. Description of the Related Art
The data storage system of a computer system typically includes one or more nonvolatile mass storage devices, such as magnetic or optical disks, and a volatile random access memory (RAM), which can include both low latency cache memory and higher latency system memory. In order to provide enough addresses for memory-mapped input/output (I/O) and the data and instructions utilized by operating system and application software, computer systems also typically reference an effective address space that includes a much larger number of addresses than physically exist in memory-mapped I/O and RAM. Therefore, to perform memory-mapped I/O or to access RAM, a processor within a computer system that utilizes effective addressing is required to translate an effective address into a physical address assigned to a particular I/O device or a physical location within RAM.
In the PowerPC™ RISC architecture, which is described, for example, in PowerPC™ 603 RISC Microprocessor User's Manual, which is available from International Business Machines (IBM) Corporation of Armonk, N.Y. as Order No. MPR603UMU-01 and incorporated herein by reference, the effective address space is partitioned into a number of uniformly-sized memory pages, where each page has an address descriptor called a Page Table Entry (PTE). The PTE corresponding to a particular memory page contains the base effective address of the memory page as well as the associated base physical address of the page frame, thereby enabling a processor to translate any effective address within the memory page into a physical address in memory. The PTEs, which are created in RAM by the operating system, reside in Page Table Entry Groups (PTEGs), which can each contain, for example, up to eight PTEs.
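The page-based translation described above can be illustrated with a minimal sketch. The 4 KB page size, the `PageTableEntry` class, the `translate` function, and the use of a dictionary in place of a PTEG search are all assumptions made for illustration; they are not taken from the PowerPC™ architecture or from the referenced manual.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size, for illustration only


@dataclass(frozen=True)
class PageTableEntry:
    effective_base: int  # base effective address of the memory page
    physical_base: int   # base physical address of the page frame


def translate(page_table, effective_address):
    """Translate an effective address using a dict keyed by page base address.

    The dict is a hypothetical stand-in for searching a PTEG in RAM.
    """
    offset = effective_address & (PAGE_SIZE - 1)
    page_base = effective_address - offset
    pte = page_table.get(page_base)
    if pte is None:
        raise LookupError(f"no PTE for page {page_base:#x}")
    # The offset within the page is preserved; only the base is replaced.
    return pte.physical_base + offset


page_table = {0x4000: PageTableEntry(0x4000, 0x9000)}
print(hex(translate(page_table, 0x4ABC)))  # prints 0x9abc
```

Any effective address within the page at 0x4000 maps to the same offset within the page frame at 0x9000, which is why a single PTE suffices for the whole page.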
In order to expedite the translation of effective addresses to physical addresses during the processing of memory-mapped I/O and memory access instructions (hereinafter referred to simply as memory referent instructions), conventional processors often employ one or more translation lookaside buffers (TLBs) to cache recently accessed PTEs within the processor's core. Of course, as data are moved into and out of physical locations in memory (e.g., in response to the invocation of a new process or a context switch), the entries in the TLB must be updated to reflect the presence of the new data, and the TLB entries associated with data removed from memory must be invalidated. In many conventional processors such as the PowerPC™ line of processors available from IBM Corporation, the invalidation of TLB entries is the responsibility of software and is accomplished through the use of an explicit TLB invalidate entry instruction (e.g., TLBIE in the PowerPC™ instruction set architecture).
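The need for explicit invalidation can be seen in a small software model of a TLB. This is a sketch under stated assumptions: the `TLB` class, its `invalidate_entry` method (a loose analogue of a TLB invalidate entry instruction), the 4 KB page size, and the dictionary used as a page table are hypothetical and do not model any particular processor.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class TLB:
    """Toy TLB that caches page_base -> physical_base translations."""

    def __init__(self, page_table):
        self.page_table = page_table  # dict standing in for PTEs in RAM
        self.entries = {}             # cached translations
        self.misses = 0

    def translate(self, effective_address):
        offset = effective_address & (PAGE_SIZE - 1)
        page_base = effective_address - offset
        if page_base not in self.entries:
            # TLB miss: fetch the translation from the page table and cache it.
            self.misses += 1
            self.entries[page_base] = self.page_table[page_base]
        return self.entries[page_base] + offset

    def invalidate_entry(self, effective_address):
        """Drop the cached translation for the page containing the address."""
        offset = effective_address & (PAGE_SIZE - 1)
        self.entries.pop(effective_address - offset, None)


page_table = {0x4000: 0x9000}
tlb = TLB(page_table)
first = tlb.translate(0x4010)      # miss, then cached
page_table[0x4000] = 0xA000        # software remaps the page
stale = tlb.translate(0x4010)      # still uses the old cached entry
tlb.invalidate_entry(0x4000)       # software invalidates the stale entry
fresh = tlb.translate(0x4010)      # miss again, picks up the new mapping
print(hex(first), hex(stale), hex(fresh))  # prints 0x9010 0x9010 0xa010
```

Until the entry is invalidated, the model keeps returning the old physical address, which is the incoherency that software-managed invalidation is meant to prevent.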
In multiprocessor data processing systems in which multiple processors have access to system memory (e.g., a symmetric multiprocessor (SMP)), the invalidation of a PTE cached in an entry of a processor's TLB is complicated by the fact that each processor has its own respective TLB. In order to maintain a coherent view of system memory across all the processors, the invalidation of a PTE in one processor requires the invalidation of the TLB entries, if any, within other processors that cache the same PTE. In many conventional multiprocessor computer systems, the invalidation of PTEs in all processors in the system is accomplished by the execution of a TLB invalidate entry instruction within an initiating processor and the broadcast of a TLB invalidate entry request from the initiating processor to each other processor in the system. The TLB invalidate entry instruction (or instructions, if multiple TLB entries are to be invalidated) may be followed in the instruction sequence of the initiating processor by one or more synchronization instructions that guarantee that the TLB entry invalidation has been performed by all processors. In conventional multiprocessor computer systems, the TLB invalidate entry instruction and associated synchronization instructions are strictly serialized, meaning that the initiating processor must complete processing each instruction (e.g., by broadcasting the TLB invalidate entry request to other processors) before beginning to process the next instruction. As a result, the processor initiating a TLB entry invalidation incurs a large performance penalty, particularly when processing instruction sequences including multiple TLB invalidate entry instructions.
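The conventional broadcast scheme described above can be sketched as follows. The `Processor` class, its method names, and the `make_system` helper are hypothetical; the direct method call stands in for a bus transaction, and each call returns only after every peer has handled the request, loosely mirroring strict serialization.

```python
class Processor:
    """Toy processor with a private TLB (page_base -> physical_base)."""

    def __init__(self, name):
        self.name = name
        self.tlb = {}
        self.peers = []
        self.requests_sent = 0

    def receive_invalidate(self, page_base):
        # A non-initiating processor drops its own copy of the entry, if any.
        self.tlb.pop(page_base, None)

    def tlb_invalidate_entry(self, page_base):
        # The initiating processor invalidates locally, then broadcasts.
        self.tlb.pop(page_base, None)
        for peer in self.peers:
            peer.receive_invalidate(page_base)
            self.requests_sent += 1
        # Returning here models "instruction complete"; the next
        # invalidation cannot start until this broadcast has finished.


def make_system(count):
    cpus = [Processor(f"cpu{i}") for i in range(count)]
    for cpu in cpus:
        cpu.peers = [other for other in cpus if other is not cpu]
    return cpus


cpus = make_system(3)
for cpu in cpus:
    cpu.tlb[0x4000] = 0x9000   # every processor caches the same PTE
cpus[1].tlb[0x5000] = 0xB000   # an unrelated entry that should survive

cpus[0].tlb_invalidate_entry(0x4000)
print([sorted(cpu.tlb) for cpu in cpus])  # prints [[], [20480], []]
```

In this model, a sequence of N invalidations issued by one processor in a P-processor system performs N × (P − 1) request deliveries one after another, which gives a rough sense of why serialized sequences of invalidations are costly for the initiating processor.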
The invalidation of TLB entries also adversely affects the performance of non-initiating processors. In particular, a conventional processor typically responds to a TLB synchronization request received from another processor by halting its instruction fetcher and permitting the remainder of the instructions within the processor to complete execution. After the processor's execution pipeline has completely drained of instructions, the TLB synchronization transaction is permitted to complete, and fetching of instructions is thereafter resumed. Thus, the process of invalidating TLB entries in non-initiating processors can entail several idle cycles at each stage in the processor's execution pipeline due to the suspension of instruction fetching.
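The cost of draining the pipeline can be approximated with a toy model. The function `drain_idle_slots`, the assumption that every instruction advances exactly one stage per cycle, and the assumption that the pipeline is full when fetching halts are all illustrative simplifications rather than a description of any actual processor.

```python
def drain_idle_slots(pipeline_depth):
    """Count empty stage-slots while a full pipeline drains with fetch halted.

    Each cycle, every in-flight instruction advances one stage and nothing
    new enters the first stage. The count includes the cycle in which the
    pipeline becomes completely empty.
    """
    stages = [True] * pipeline_depth  # True means the stage holds an instruction
    idle = 0
    while any(stages):
        stages = [False] + stages[:-1]  # no fetch; instructions shift forward
        idle += stages.count(False)
    return idle


print(drain_idle_slots(5))  # prints 15
```

Under these assumptions the number of idle stage-slots grows roughly with the square of the pipeline depth, so deeper pipelines lose proportionally more work each time a synchronization request forces a drain.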
In view of the performance penalty associated with TLB entry invalidations in conventional multiprocessor computer systems, the present invention recognizes that it would be useful and desirable to provide an improved method for maintaining TLB coherency in a multiprocessor computer system.