1. Technical Field
The present invention relates generally to hardware and system software of computer systems, and deals more particularly with a method and system for improving the performance of translation lookaside buffers during address translation.
2. Prior Art
Virtual memory techniques, including the provision of virtual code addresses, are among the basic concepts that ease the work of application programmers: they need not worry about the physical locations at which code is placed in memory when the program is loaded in order to be run.
A nearly unlimited virtual address space is thereby provided for the programmer's activities. In a process called 'address translation', such virtual addresses are transformed into physical addresses which uniquely define physical locations in the main memory at run-time.
In virtual memory, the address is broken into a virtual page number and a page offset. When translated into physical memory quantities, the physical page number constitutes the upper portion of the page's physical address, while the page offset, which is not changed, constitutes the lower portion. The number of bits in the page offset field determines the page size. All the pages are managed in page tables.
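The split described above can be illustrated with a minimal sketch. The 12-bit page offset (4 KiB pages), the dictionary standing in for a page table, and the function names are assumptions chosen for illustration only; they are not taken from the specification.

```python
PAGE_OFFSET_BITS = 12  # assumed: 12 offset bits, i.e. 4 KiB pages


def split_virtual_address(va, offset_bits=PAGE_OFFSET_BITS):
    """Return (virtual page number, page offset) for a virtual address."""
    vpn = va >> offset_bits                  # upper portion
    offset = va & ((1 << offset_bits) - 1)   # lower portion, left unchanged
    return vpn, offset


def translate(va, page_table, offset_bits=PAGE_OFFSET_BITS):
    """Replace the virtual page number by the physical page number."""
    vpn, offset = split_virtual_address(va, offset_bits)
    ppn = page_table[vpn]                    # hypothetical dict: vpn -> ppn
    return (ppn << offset_bits) | offset
```

For example, with a page table that maps virtual page 0x12 to physical page 0x7, the virtual address 0x12345 translates to 0x7345: only the upper portion changes.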
Page tables are so large that they must be stored in main memory. This means that every memory access takes at least twice as long: one memory access to obtain the physical address and a second access to get the data. The key for improving access performance is to rely on locality of reference to the page table: When a translation for a virtual page number is used, it will probably be needed again in the near future of a program run, because the references to the words on that page have both temporal and spatial locality. Accordingly, modern machines include a special cache that keeps track of recently used translations. This special address translation cache is further referred to as a translation-lookaside buffer, or TLB.
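The effect of locality on the number of page table accesses can be sketched as follows. This is a simplified model under stated assumptions: the TLB is an unbounded dictionary, the page table is a dictionary in "main memory", and the class and attribute names are hypothetical.

```python
class SimpleTLB:
    """Toy translation cache in front of a page table (illustrative only)."""

    def __init__(self, page_table, offset_bits=12):
        self.page_table = page_table   # assumed dict: vpn -> ppn
        self.offset_bits = offset_bits
        self.entries = {}              # cached translations
        self.page_table_reads = 0      # counts the extra memory accesses

    def translate(self, va):
        vpn = va >> self.offset_bits
        offset = va & ((1 << self.offset_bits) - 1)
        if vpn not in self.entries:
            # Miss: one additional memory access to read the page table.
            self.page_table_reads += 1
            self.entries[vpn] = self.page_table[vpn]
        return (self.entries[vpn] << self.offset_bits) | offset
```

Two accesses to words on the same page cause only one page table read in this model; the second translation is served from the cached entry.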
High-end computer systems such as IBM ESA390 or ESAME systems are equipped with an increasing amount of main storage in order to reduce the number of accesses to external storage devices.
The increasing main storage, however, necessitates larger TLBs, which hold the virtual-to-absolute translated addresses. A larger TLB has a longer access time, and this time adds to the instruction cache or data cache access time, respectively.
As a result, while a performance gain is obtained from larger main storage and caches, performance may be degraded by the longer access time of the TLB.
For high-end systems this problem could be solved by using a second level TLB, in a similar way as a second level cache supports a first level cache. In such an arrangement the first level TLB, further referred to herein as TLB1, would be kept small and would have a short access time, while a second level TLB, further referred to herein as TLB2, would have approximately 10 times more entries and would be arranged to feed the TLB1 if a required translation is available in the TLB2 but not in the TLB1. If, however, the TLB2 were structured like the TLB1, and assuming the TLB2 is 10 times larger, an array access could not be done within one cycle, and any performance gain would thus be lost.
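The feeding relationship between TLB1 and TLB2 can be sketched as below. This is a behavioural illustration only: the capacities, the least-recently-used eviction, and the names `BoundedTLB`, `lookup_two_level` and `walk_tables` are assumptions, and access times are not modelled.

```python
from collections import OrderedDict


class BoundedTLB:
    """Fixed-capacity translation buffer with LRU eviction (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # vpn -> ppn, oldest first

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)   # mark as most recently used
            return self.entries[vpn]
        return None

    def insert(self, vpn, ppn):
        self.entries[vpn] = ppn
        self.entries.move_to_end(vpn)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used


def lookup_two_level(vpn, tlb1, tlb2, walk_tables):
    """Return (ppn, source), where source names the level that answered."""
    ppn = tlb1.lookup(vpn)
    if ppn is not None:
        return ppn, "TLB1"
    ppn = tlb2.lookup(vpn)
    if ppn is not None:
        tlb1.insert(vpn, ppn)          # TLB2 feeds TLB1
        return ppn, "TLB2"
    ppn = walk_tables(vpn)             # hypothetical full table walk
    tlb2.insert(vpn, ppn)
    tlb1.insert(vpn, ppn)
    return ppn, "walk"
```

With a one-entry TLB1 and a ten-entry TLB2, a translation that has been displaced from TLB1 is still found in TLB2 and copied back, so the full table walk is avoided.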
It is thus the object of the present invention to provide a TLB structure or TLB arrangement which is adapted to a large address space, i.e. addresses of 64 bits or more, while avoiding the performance loss caused by the longer access time of the larger TLB structures that the more complex translation of larger addresses necessarily implies.
It is a further object of the present invention to retain intermediate address translation values in addition to the final translation.
The basic idea of the present invention is to provide a translation lookaside buffer (TLB) arrangement which advantageously uses two buffers, a small first level TLB1 and a larger second level TLB2. The second level TLB feeds address information to the first level TLB when the desired virtual address is not contained in the first level TLB. According to the invention, the second level TLB advantageously comprises two n-way set-associative sub-units, of which one, a higher-level unit, covers some higher address translation levels and the other, a lower-level unit, covers some lower translation level. According to the present invention, some address information holds a number of middle level virtual address (MLVA) bits, for example 8 bits in the case of 64-bit addresses, which can serve as an index address covering the address range of the higher-level sub-unit. The same information is thus used as tag information in the lower-level sub-unit and serves as a quick reference in any look-up operation to find the absolute address for the virtual address concerned. Further, the commonly used status bits, e.g. valid bits, are used in both TLB structures as well.
As an advantage, the output of the higher-level sub-unit is a valid page table origin when a match is found for the higher address bits and a valid entry was built before. Thus the absolute physical address can be found very quickly. Some processor architectures require several translation table fetches; the IBM mainframe ESAME architecture, for example, requires five fetches to translate a 64-bit address. Because the address changes in the lowest and last-used table, i.e. the page table, the start address of that table, the page table origin (further referred to herein as PTO), is saved and can be used again whenever this page table is required. All accesses to the higher-level translation tables, e.g. segment tables, region tables etc., are thereby bypassed, which is a considerable performance gain. With this feature the start address of the page table can be found within one cycle and can be used for the last table access to get the absolute address.
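The saving in table fetches can be illustrated with a simplified model. It assumes five nested tables represented as Python dictionaries, a cache keyed by the four upper table indices, and a saved reference to the page table standing in for the PTO; the names `Walker`, `pto_cache` and `fetches` are hypothetical and the model does not reflect the actual hardware structure.

```python
class Walker:
    """Five-level table walk with a saved page table origin (toy model)."""

    def __init__(self, root):
        self.root = root       # nested dicts: four upper levels + page table
        self.fetches = 0       # number of translation table fetches
        self.pto_cache = {}    # upper indices -> page table (stand-in for PTO)

    def translate(self, indices):
        *upper, page_index = indices
        key = tuple(upper)
        page_table = self.pto_cache.get(key)
        if page_table is None:
            # No saved PTO: fetch from every higher-level table in turn.
            table = self.root
            for idx in upper:
                table = table[idx]
                self.fetches += 1
            page_table = table
            self.pto_cache[key] = page_table
        # Last table access: the page table entry itself.
        self.fetches += 1
        return page_table[page_index]
```

In this model the first translation costs five fetches, while a second translation that falls into the same page table costs a single fetch, since the higher-level tables are bypassed.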
As an additional advantageous feature of the present invention, an LRU mechanism can be provided in the higher-level sub-unit in order to fill the higher-level sub-unit compartments equally. This serves to increase the efficiency of the TLB arrangement.
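One possible way to picture an LRU mechanism for the compartments of a single set is sketched below. The number of ways, the list-based ordering and the name `LruSet` are assumptions for illustration; the specification does not prescribe this implementation.

```python
class LruSet:
    """One set of an n-way set-associative buffer with LRU replacement."""

    def __init__(self, ways=4):
        self.tags = [None] * ways        # None marks an empty compartment
        self.values = [None] * ways
        self.order = list(range(ways))   # way numbers, least recently used first

    def _touch(self, way):
        # Move the way to the most-recently-used end of the ordering.
        self.order.remove(way)
        self.order.append(way)

    def lookup(self, tag):
        for way, stored in enumerate(self.tags):
            if stored is not None and stored == tag:
                self._touch(way)
                return self.values[way]
        return None

    def insert(self, tag, value):
        # Always fill the least recently used compartment, so all
        # compartments are used in turn before any is overwritten.
        # (No check for an already-present tag is made in this sketch.)
        way = self.order[0]
        self.tags[way] = tag
        self.values[way] = value
        self._touch(way)
        return way
```

Because empty compartments start at the least-recently-used end of the ordering, new entries are spread over all compartments before an existing entry is replaced.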
A further advantage is the saving of chip area required to implement the aforementioned TLB arrangement: the PTE RAM contains only the absolute address and the valid bit, whereas the address tag data and the table root pointer are located in the CRSTE and are thus provided commonly for several PTE entries.