The present invention generally relates to shared-memory multiprocessor systems, such as IBM ESA/390 or RS/6000 systems, and deals more particularly with a method and system for sharing one translation lookaside buffer (TLB) between several CPUs.
The main memory is the physical memory in which many programs need to reside. However, due to the limited size of the physical memory in a computer system, not all programs can be loaded simultaneously. The virtual memory concept was introduced to alleviate this problem. The basic idea of said prior art approach is to extend the use of the physical memory among many programs with the help of an auxiliary (backup) memory such as disk arrays. All programs can be loaded into and out of the physical memory dynamically under the coordination of the operating system. To the user, virtual memory provides an almost unbounded memory space to work with. In a process called ‘address translation’, virtual addresses are transformed at run-time into physical addresses, which uniquely define physical locations in the main memory.
Both the virtual and the physical memory are partitioned into fixed-length pages of usually 4 kilobytes. When a translation for a virtual page is used, it will probably be needed again in the near future of the program run, because the references to the words on that page have both temporal and spatial locality. Accordingly, modern machines include a special cache that keeps track of recently used translations. This special address translation cache is further referred to as translation-lookaside buffer, or TLB.
State of the art microprocessors already have all basic functional units of the processor, such as the arithmetic logic unit, floating point unit, TLB, first-level cache, etc., integrated on a single chip, and it can be foreseen that the next processor generation will have two or more independently operating processors on a single chip. Not all functional units need to be dedicated to a particular CPU; some can be shared between different CPUs.
The sharing of functional units between CPUs is common practice for second level caches. The first level cache, with a one-cycle access time, is dedicated to a particular CPU and thus provides optimal performance, but the one-cycle access requirement limits the size of the array to less than 128 kilobytes for state of the art processors. The second level cache, with a capacity of several megabytes, is shared between CPUs, thereby offering a better utilization of the array. Moreover, if several CPUs access a so-called common memory space, e.g., in the case of the read-only source code of a compiler, one and the same data portions buffered in the second level cache are available to different CPUs.
The arguments that are valid for the implementation of a shared second level cache also apply to a shared second level TLB, further on called shared TLB2, because all data in the shared cache are accessed using absolute addresses, while the shared TLB2 buffers the mapping of virtual to absolute addresses.
Thus, there is a general need for sharing a TLB between several CPUs for improving the performance and reducing the chip area required to buffer the results of virtual to absolute address translations.
With reference to FIG. 1, a prior art implementation of a 4-way set-associative cache used as a TLB in address translation is described in more detail in order to introduce the TLB architecture details needed to understand the concepts of the present invention.
A 32-bit virtual address 10, abbreviated further as VA, is the object of the address translation. Bits 12 to 19 of the VA are used as an index in order to address a specific row in each of the 4 compartments of the TLB. Bits 0 to 11 are compared in comparators 12 with the tag field 14 of the associated row.
The comparators 12 determine which element of the selected compartment matches the tag. The output of the comparators is used to select the data 16 from one of the four indexed compartments, using a multiplexor 18.
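By way of illustration only, the lookup described with reference to FIG. 1 may be sketched in software as follows. The sketch assumes that bit 0 denotes the most significant bit of the VA (so that bits 20 to 31 form the byte offset within a 4 kilobyte page), that each entry carries a valid flag, and that the data field 16 holds an absolute page frame number. The names `Entry`, `SetAssocTLB`, `split_va`, `insert` and `lookup` are hypothetical and do not appear in the figure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

WAYS = 4          # four compartments, as in FIG. 1
ROWS = 256        # 8 index bits (VA bits 12 to 19) select one of 256 rows
PAGE_SHIFT = 12   # 4 kilobyte pages (assumed offset: VA bits 20 to 31)


@dataclass
class Entry:
    valid: bool = False   # assumption: a valid flag per entry
    tag: int = 0          # tag field 14
    frame: int = 0        # data 16, assumed to be an absolute page frame number


def split_va(va: int) -> Tuple[int, int, int]:
    """Split a 32-bit VA into (tag, index, offset), bit 0 being the MSB."""
    offset = va & 0xFFF               # bits 20 to 31
    index = (va >> PAGE_SHIFT) & 0xFF  # bits 12 to 19
    tag = (va >> 20) & 0xFFF           # bits 0 to 11
    return tag, index, offset


class SetAssocTLB:
    def __init__(self) -> None:
        self.ways = [[Entry() for _ in range(ROWS)] for _ in range(WAYS)]

    def insert(self, va: int, frame: int, way: int) -> None:
        tag, index, _ = split_va(va)
        self.ways[way][index] = Entry(True, tag, frame)

    def lookup(self, va: int) -> Optional[int]:
        """Return the absolute address on a hit, or None on a miss."""
        tag, index, offset = split_va(va)
        # Hardware compares all four compartments in parallel (comparators 12);
        # the loop below stands in for that and for the multiplexor 18.
        for way in self.ways:
            entry = way[index]
            if entry.valid and entry.tag == tag:
                return (entry.frame << PAGE_SHIFT) | offset
        return None
```

For example, after `tlb.insert(0x12345678, 0xABCDE, way=2)`, any VA on the same virtual page hits in compartment 2, while a VA with the same index but a different tag misses.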
The IBM ESA/390 and ESAME CPU architectures are taken as reference to explain the architectural requirements for sharing a TLB2 between different CPUs. Although these architectures do not explicitly prohibit the implementation of a shared TLB2, it is obvious that all rules valid for forming TLB entries for a dedicated TLB must also be obeyed for a shared TLB2, i.e., a shared TLB2 must be transparent as seen from the architecture point of view.
The formation of TLB entries is only permitted with the use of translation tables attached to a particular CPU.
This rule was established because a particular CPU, which has purged all entries from its dedicated TLB, has dynamic address translation disabled and is in the process of setting up new translation tables, should not get access to translations set up by another CPU by means of a shared TLB2. Instead, it should only get translations which are built with its own attached tables.
Special rules also apply if one particular CPU purges all entries in its dedicated TLB: the corresponding shared TLB2 entries must then be purged too, but entries shared by other CPUs should remain valid.
Another problem arises if a TLB entry is manipulated by a process called prefixing. Prefixing assigns a unique prefix address to a translation result of zero, because page address zero contains various architected data values dedicated to a particular CPU. In a multiprocessor system with shared memory, each CPU has a unique prefix register, because ‘page zero’ exists only once in main memory. Therefore, TLB entries prefixed by a CPU A are not to be used by a CPU B.
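By way of illustration only, the effect of prefixing on a translation result may be sketched as follows. The sketch models only the behavior described above, namely that a translation result in page zero is replaced by the prefix of the CPU performing the translation; the function name `apply_prefix` and the prefix values `PREFIX_A` and `PREFIX_B` are hypothetical.

```python
PAGE_SHIFT = 12  # 4 kilobyte pages


def apply_prefix(real_addr: int, prefix_page: int) -> int:
    """Replace page zero of a translation result by the CPU's prefix page.

    Simplified model: only the direction described in the text is covered.
    """
    page = real_addr >> PAGE_SHIFT
    offset = real_addr & 0xFFF
    if page == 0:
        page = prefix_page
    return (page << PAGE_SHIFT) | offset


# Hypothetical prefix register contents of two CPUs sharing main memory.
PREFIX_A = 0x00010
PREFIX_B = 0x00020

addr_a = apply_prefix(0x080, PREFIX_A)  # as seen by CPU A
addr_b = apply_prefix(0x080, PREFIX_B)  # as seen by CPU B
```

Because `addr_a` and `addr_b` differ for the same translation result, a TLB entry that already contains the prefixed address of CPU A would deliver a wrong absolute address if it were used by CPU B.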
As can be appreciated now by a person skilled in the art, because of the above implications, a shared TLB2 was never realized.
It is an object of the present invention to provide a method and system for sharing a TLB2 between CPUs which is transparent to the CPU architecture and thus in compliance with the architecture rules.
The inventive TLB2 organization comprises several small arrays dedicated to particular CPUs, providing an interface to a major array, which is shared between CPUs. The dedicated arrays are required to fulfill the architected constraints and link several CPUs to the commonly used shared array.
According to its primary aspect the present invention provides a method for operating a second level Translation Lookaside Buffer (TLB) in a Symmetric MultiProcessor (SMP) system which is characterized by the steps of:
a. using a respective plurality of processor memory areas further referred to herein as CRTs uniquely dedicated to each of said multiple processors for storing virtual address data and an origin pointer, e.g., the page table origin (PTO), in order to locate the absolute address associated with said virtual address,
b. using a common memory area, further referred to as PTE, shared between said processors for storing at least said absolute address associable with a virtual address stored in any of said plurality of processor memory areas,
c. defining a TLB hit on a virtual address applied by any of said processors by
d. checking if subaddress data, e.g., the segment index, of said applied virtual address matches with respective subaddress data stored in said common memory area, and
e. checking if the respective entries of the processor memory area and the common memory area are flagged ‘valid’.
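By way of illustration only, the hit definition of steps c to e may be sketched as follows. The sketch is a strongly simplified model under stated assumptions: one entry of a processor memory area (CRT) is paired with one entry of the common memory area (PTE), and the manner in which the two entries are linked is not modeled. The names `CrtEntry`, `PteEntry`, `tlb2_hit` and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrtEntry:
    """Entry of a processor memory area, dedicated to one CPU (step a)."""
    valid: bool
    va_data: int   # virtual address data (not evaluated in this sketch)
    pto: int       # origin pointer, e.g., the page table origin


@dataclass
class PteEntry:
    """Entry of the common memory area, shared between CPUs (step b)."""
    valid: bool
    subaddr: int    # stored subaddress data, e.g., a segment index
    abs_frame: int  # absolute address associable with the virtual address


def tlb2_hit(crt: CrtEntry, pte: PteEntry, subaddr: int) -> Optional[int]:
    """Return the absolute address on a TLB2 hit, or None on a miss."""
    subaddr_match = pte.subaddr == subaddr   # step d
    both_valid = crt.valid and pte.valid     # step e
    return pte.abs_frame if (subaddr_match and both_valid) else None


# Two CPUs refer to the same shared entry through their own CRT entries.
shared = PteEntry(valid=True, subaddr=0x2A, abs_frame=0x1234)
crt_a = CrtEntry(valid=True, va_data=0x7, pto=0x8000)
crt_b = CrtEntry(valid=True, va_data=0x7, pto=0x8000)
```

In this model, clearing `crt_a.valid` makes the shared entry unusable for CPU A while CPU B continues to hit on it, which is one conceivable way of purging on behalf of a single CPU without invalidating entries used by other CPUs.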
Further, when said subaddress data is the segment index of a virtual address, and a plurality of least significant bits of the virtual address is stored as tag data together with the absolute address in an entry of said common memory area, an efficient implementation of the inventive concept is provided.
Further, when a concurrent lookup is performed in both the processor memory area and the common memory area, the TLB2 is effectively sharable between CPUs.
Further, when a fixed number of processor memory areas is provided, associated with a respective plurality of n-set associative storage elements in the common memory area according to the sequence of said processor memory areas, an area-saving way to organize the inventive TLB2 is found.
A Symmetric MultiProcessor (SMP) hardware unit, e.g., a chip, can advantageously benefit from an implementation performing the inventive method according to one of the before-mentioned aspects.