Dynamic address translation provides the ability to interrupt the execution of a program, record it and its data in auxiliary storage, such as a direct access storage device (DASD), and at a later time return the program and data to different main storage locations for resumption of execution. It can provide a user a system wherein storage appears to be larger than the main storage. This apparent storage (virtual) uses virtual addresses to designate locations therein and it is normally maintained in auxiliary storage. It occurs in blocks of addresses, called pages, and only the most recently referred-to pages are assigned to occupy blocks of physical main storage.
As the user refers to pages of virtual storage that do not appear in main storage, they are brought in to replace pages in main storage that are less likely to be needed in the near future.
Virtual addressing has become a key feature in the architecture of many large computers. Virtual addressing allows programs to appear logically contiguous to the user while not being physically contiguous in the storage system. Recently accessed portions of the virtual space are mapped into the main storage unit. The mapping information is often stored hierarchically in a directory comprising a segment table with entries corresponding to contiguous 1 MB (megabyte) segments and page tables with entries for 4 KB (kilobyte) pages within a segment.
Translation of the virtual address to a real address requires the mapping information that can be gained by searching the appropriate segment and page tables. As this searching process is time consuming, the number of full searches is reduced by retaining information for some of the recent translations in a Directory-Look-Aside-Table (DLAT). For virtual addresses covered by the DLAT, the translation process, which is required for almost every storage access, requires only a couple of machine cycles. For addresses not covered by the DLAT, the process of searching the directory ranges from about 15 to 60 cycles if the segment and page tables are in main storage. Each DLAT entry contains the information for mapping an entire page of storage, frequently 4 KB. The amount of storage covered by the DLAT depends on the number of entries in the DLAT and the size of the page.
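As a rough illustration of the translation path just described, the following Python sketch walks a segment table and a page table and retains recent translations in a look-aside table. It is a simplified model, not the hardware design: the function names, the use of dictionaries for the tables, and the unbounded look-aside dictionary are all assumptions made for illustration. Only the 1 MB segment and 4 KB page sizes come from the text.

```python
# Sketch of two-level translation (segment table -> page table) with a
# look-aside table. Field widths assume 1 MB segments and 4 KB pages.
PAGE_BITS = 12                              # byte offset within a 4 KB page
SEG_BITS = 20                               # byte offset within a 1 MB segment
PAGES_PER_SEG_BITS = SEG_BITS - PAGE_BITS   # 8 bits: 256 pages per segment


def split(vaddr):
    """Split a virtual address into (segment index, page index, byte offset)."""
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    page = (vaddr >> PAGE_BITS) & ((1 << PAGES_PER_SEG_BITS) - 1)
    segment = vaddr >> SEG_BITS
    return segment, page, offset


def translate(vaddr, segment_table, dlat):
    """Return (real address, hit flag); a KeyError models a page fault."""
    segment, page, offset = split(vaddr)
    key = (segment, page)
    if key in dlat:                          # fast path: translation retained
        return (dlat[key] << PAGE_BITS) | offset, True
    page_table = segment_table[segment]      # slow path: search both tables
    frame = page_table[page]
    dlat[key] = frame                        # retain for later references
    return (frame << PAGE_BITS) | offset, False
```

The first reference to a page takes the slow path through both tables; later references to the same page are resolved from the look-aside table alone.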
When determining the optimal size for a page, a compromise is struck between a page large enough to amortize the overhead of swapping pages and a page small enough to incur minimal degradation due to granularity, i.e., not waste storage by committing a large page for a small object. While small (4 KB) pages may be appropriate for code segments (instructions) and small data objects, high performance scientific and engineering machines often benefit from large pages. In recent years, both data objects and storage capacities have grown substantially; a large page would now often be more efficient. In comparison to data objects which are hundreds of megabytes, a 1 MB page offers sufficiently fine granularity.
Furthermore, with the introduction of fast, mass-storage devices in the order of one or more gigabytes (GB) such as the IBM Expanded Storage, the data transfer times for 4 KB and 1 MB pages have been reduced to tens of cycles and thousands of cycles, respectively. With the thousands of cycles of software overhead which are incurred while resolving a page fault, resolving a 1 MB page fault may require only three to five times as long as a 4 KB page fault. For transfers of large, contiguous blocks of data, a 4 KB page system will incur the page-fault overhead hundreds of times (256 faults per megabyte) where a 1 MB page system incurs it once.
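The comparison above can be made concrete with a small calculation. The cycle counts below are assumed values chosen only to fall within the ranges the text gives ("tens of cycles", "thousands of cycles"); they are not measurements, and the function names are illustrative.

```python
# Worked example of page-fault cost; all cycle counts are assumptions.
KB = 1024
MB = 1024 * KB

SOFTWARE_OVERHEAD = 2000   # cycles per fault (assumed: "thousands of cycles")
TRANSFER_4KB = 50          # cycles per 4 KB transfer (assumed: "tens")
TRANSFER_1MB = 6000        # cycles per 1 MB transfer (assumed: "thousands")


def fault_cost(transfer_cycles, overhead=SOFTWARE_OVERHEAD):
    """Cycles to resolve one page fault: software overhead plus transfer."""
    return overhead + transfer_cycles


def faults_needed(block_bytes, page_bytes):
    """Number of page faults to bring in a contiguous block (rounded up)."""
    return -(-block_bytes // page_bytes)


# Cost of one 1 MB fault relative to one 4 KB fault.
ratio = fault_cost(TRANSFER_1MB) / fault_cost(TRANSFER_4KB)

# Bringing in 1 MB of contiguous data under each page size.
small_faults = faults_needed(1 * MB, 4 * KB)
large_faults = faults_needed(1 * MB, 1 * MB)
total_small = small_faults * fault_cost(TRANSFER_4KB)
total_large = large_faults * fault_cost(TRANSFER_1MB)
```

Under these assumed figures a single 1 MB fault costs roughly four times a 4 KB fault, but moving one megabyte with 4 KB pages takes 256 faults, so the total cost is dominated by the repeated software overhead.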
Large pages provide many side benefits. For example, vector fetches and stores benefit from large pages simply by decreasing the number of possible page crossings. Even if the next required page is resident in physical storage, the vector pipe may be interrupted on a page crossing to verify that the page has been brought into the memory. This interruption can result in a noticeable performance degradation.
Large pages in the order of 1 MB may be useful as a possible solution for scientific applications which incur performance degradation as a result of the use of small pages. To optimize the page size for an application (or a portion of the application), multiple-size pages are desirable. To date, known DLATs are designed to handle a single page size; supporting anything other than a uniform page size would require modifications to the current DLAT(s).
The CDC (registered trademark of Control Data Corporation) 7600 and CYBER 205 have both small and large pages. The CYBER 205 offers small (4 KB, 16 KB, and 64 KB) pages in which the page size selection must be made "via an operating system software installation parameter." Therefore, a given job can allow only one size of small page. However, a large page (512 KB) is also allowed. The CYBER translation unit handles multiple-sized pages by creating a list of "associative words". Each word contains information similar to a DLAT entry. This list is stored in main storage and the upper (most recently used) 16 entries can be loaded into an internal set of registers by instruction. The translation process consists of first searching within the internal registers. If a match (a "hit") is found, the entry is moved to the top of the list and the address is resolved. If no match is found in the first 16 entries, the rest of the list, in memory, is searched two entries at a time. If a matching entry is found, it is moved to the top of the list. Otherwise, a page fault is generated. This can require a large number of machine cycles.
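The search just described can be sketched in Python as follows. This models only the behavior stated above (16 register-resident entries, the remainder searched two at a time, a hit moved to the top of the list). The tuple layout, the names, and the step counting are assumptions for illustration and are not taken from CDC documentation; in particular, the hardware presumably compares the register entries in parallel, whereas the loop below checks them one by one.

```python
REGISTER_ENTRIES = 16   # most recently used entries held in internal registers


class PageFault(Exception):
    """Raised when no associative word covers the address."""


def lookup(words, vaddr):
    """Search the list of associative words for vaddr.

    Each word is a hypothetical (virtual_base, size_bytes, real_base) tuple,
    so entries of different page sizes can share one list. Returns
    (real address, number of two-entry steps spent searching memory).
    A matching entry is moved to the top of the list.
    """
    def matches(word):
        base, size, _ = word
        return base <= vaddr < base + size

    # Entries held in the internal registers.
    for i, word in enumerate(words[:REGISTER_ENTRIES]):
        if matches(word):
            words.insert(0, words.pop(i))
            return word[2] + (vaddr - word[0]), 0

    # Remainder of the list in main storage, two entries per step.
    steps = 0
    for i in range(REGISTER_ENTRIES, len(words), 2):
        steps += 1
        for j in (i, i + 1):
            if j < len(words) and matches(words[j]):
                word = words.pop(j)
                words.insert(0, word)
                return word[2] + (vaddr - word[0]), steps

    raise PageFault(hex(vaddr))
```

Because the number of memory steps grows with the length of the list, a reference to a page far down the list, or a page fault, costs a full serial scan.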
Although the CYBER scheme allows a translation table to handle entries which are not of uniform size, it is not an attractive solution in the future high performance scientific processor market. The list of associative words defines all of the pages in physical storage. Although this may be practical for small storage requirements, future jobs will require substantially more storage than is available on the CYBER. Some manufacturers are providing main storage of 256 MB and even 2 GB (gigabytes). These large real storages imply even greater data object sizes and larger virtual spaces to contain them. Even with 1 MB pages, a 1 GB data object would require 1024 pages; searching two entries per cycle could degrade performance substantially. Increasing the number of entries contained in the associative registers may be prohibitive based on the amounts of logic required.
U.S. Pat. No. 4,285,040 shows the support of multiple (two) page sizes, 128 B and 4 KB. The patented solution has many shortcomings; it is not feasible with large address spaces such as those required in high-performance scientific and engineering processors. The implementation described requires that registers be available to concurrently retain the base address values for all of the segments in the virtual address space. Even for large pages, such as 1 MB, current address spaces (2 GB) would require 2048 such registers. Furthermore, it is expected that user requirements will force the designers to allow substantially larger address spaces in the near future, thereby dramatically increasing the number of segment registers required for the approach described in the patent.
Due to the "cache-like" structure used to retain this segment information (1 MB) in the present application, the number of equivalent registers can be substantially reduced to approximately forty or fifty and still provide acceptable "hit-ratios", thereby allowing nearly optimal performance.
For accesses to data in small pages, additional storage accesses are required for the approach described in the patent. Quoting from near the top of the column labeled "3", "The subsegment descriptors are contained in a table stored in the storage of the system. Therefore, when the mechanism is operating in the second or subsegment mode, it is necessary to make an extra cycle to select and fetch one of the subsegment descriptors." This "extra cycle" is a storage reference cycle and on many processors (with large amounts of storage) this will translate to many processor cycles.
Due to the multiple "cache-like" structures of the present invention, the most recently used "page information" (subsegment descriptors) is readily available resulting in significant performance improvement. Generally, only a single search cycle is required.
Although the design described in the patent has good performance for accesses to "large" pages, the number of registers required to retain all of the segment pointers is unattractive for large address spaces. Furthermore, due to the exponentially larger number of page table entries, these entries must be stored in system storage. Therefore, for accesses to small pages, each "user" storage access results in two "real" storage accesses, one for the subsegment descriptor and one for the user data. As storage delays and storage contention contribute heavily to system performance degradation, such an approach would severely handicap a medium to high performance processor.
U.S. Pat. No. 3,675,215 describes a sequential search process which "continues with the count being incremented by one for each mismatch until the ID of the fetched entry matches the requested virtual address, or until the count exceeds the number of addresses in the subset, in which case a missing address exception occurs". A set of "chains" of translation information is maintained. The virtual address translation process consists of searching a chain. For each "page fault", an entire chain is searched to detect the occurrence of the page fault.
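The serial nature of the chain search can be illustrated with a short Python sketch. It follows only the quoted description (a count incremented on each mismatch until a match is found or the chain is exhausted); the entry layout and names are hypothetical and are not drawn from the cited patent.

```python
class MissingAddress(Exception):
    """Models the missing address exception; carries the entries fetched."""


def search_chain(chain, requested_id):
    """Serially search one chain of (entry_id, frame) pairs.

    Returns (frame, number of entries fetched). If the count runs past the
    end of the chain, MissingAddress is raised with the number of entries
    that had to be fetched to detect the miss.
    """
    count = 0
    while count < len(chain):
        entry_id, frame = chain[count]
        if entry_id == requested_id:
            return frame, count + 1
        count += 1                      # mismatch: advance to the next entry
    raise MissingAddress(requested_id, count)
```

A hit on the last entry and a miss both require fetching every entry in the chain, which is the cost the associative search of a DLAT avoids.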
The DLAT access of the present application is always at least as fast as the chain approach (and sometimes tens of times faster) in providing the information required to translate the virtual address. This is primarily due to the associative (parallel) search inherent in the DLAT structure, whereas the "chain" structure requires a serial search. The minimum and maximum numbers of cycles for finding a translated page in the "cache-like" structure of the present application are both approximately two cycles. Furthermore, on the occurrence of a page fault, recognition of the fault is dramatically quicker with the DLAT/segment/page table scheme in comparison to the chain approach.
A preferred form of the present improvement includes a set associative arrangement which is used in systems such as the 308X and 3090 families marketed by International Business Machines Corporation. Set associative arrangements are well known and, for example, are used in the structures described in U.S. Pat. Nos. 4,638,426 and 4,695,950. Some of the virtual address bits are used to select one set of entries; the set of entries (usually two) is then associatively searched. The set associative arrangement allows fast access to a large number of entries (usually 256 to 512) but requires only a small associative search (usually two or four entries). The problem in applying this method to include multiple-sized pages comes in the selection of the bits which are used to select the set of entries. If DLAT entries can cover different page sizes, the bits must be selected from those which differentiate between storage segments that are at least as large as the largest page size. However, if these bits are used to select a set of DLAT entries, when the entries are for small pages, only a small contiguous block of memory can be covered. If a two-way set associative scheme is used with both 4 KB and 1 MB pages, the congruence class must cover a contiguous segment which is at least 1 MB. However, when entries correspond to 4 KB pages, only 8 KB (two entries) of the 1 MB contiguous space can be covered.
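The bit-selection problem can be demonstrated numerically. The Python sketch below assumes a 128-set, two-way table and compares two choices of congruence class selection bits for the 256 small pages of a single 1 MB segment. The helper name and the specific bit positions (12 and 20, counted from the low-order end) are assumptions for illustration.

```python
KB = 1024
MB = 1024 * KB
NUM_SETS = 128   # congruence classes (assumed for this example)
WAYS = 2         # two-way set associative


def congruence_class(vaddr, low_bit, num_sets=NUM_SETS):
    """Select a congruence class using the address bits starting at low_bit."""
    return (vaddr >> low_bit) % num_sets


# The 256 contiguous 4 KB pages that make up one 1 MB segment.
pages = [n * 4 * KB for n in range(256)]

# Selection bits above the 1 MB offset: every small page of the segment
# lands in the same class, so only WAYS of them can be held at once.
classes_high = {congruence_class(p, 20) for p in pages}
covered_high = len(classes_high) * WAYS * 4 * KB

# Selection bits just above the 4 KB offset: adjacent small pages spread
# across all classes, but a 1 MB page would no longer map to a single class.
classes_low = {congruence_class(p, 12) for p in pages}
```

With the high-order selection bits, `classes_high` contains a single class and `covered_high` is 8 KB, matching the figure given above; with the low-order bits the same pages spread over all 128 classes.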
The present improvement provides a DLAT structure which can handle multiple-size pages concurrently. For purposes of illustration, the two page sizes used in this description will be 4 KB and 1 MB. A 1 MB page is considered since it is the equivalent of a non-pageable segment and segments are currently part of the preferred translation process. Once the segment information has been obtained, rather than continuing the process (determining a page within a segment), an entire segment would be considered a single entity. Since segments are on 1 MB virtual boundaries, this forces the 1 MB virtual pages to be on 1 MB boundaries. This diminishes the fragmentation problem encountered in multiple-size page systems and allows simplified hardware, since the low-order page offset bits simply pass through unchanged.
Since the IBM 3090 is a well-known machine, its DLAT facility will be used to describe the present improvement. The DLAT is two-way set associative with 128 pairs of entries, each entry representing a 4 KB page. To allow the DLAT to cover a large contiguous portion of storage, the pair of DLAT entries is selected using the congruence class selection address bits immediately above the offset bits which define words within a page, i.e., adjacent pages are covered by different DLAT pairs. However, the 128 pairs of entries only provide coverage for a small portion (e.g., 1024 KB in one arrangement) of a presumed 2 GB space.
Without loss of generality, this improvement will be described by focusing on an implementation of such a DLAT with the exception that entries for both 4 KB and 1 MB will be allowed. In place of a single 128-pair DLAT in the 3090, the improvement will use two 64-pair DLAT structures, one for 4 KB pages and one for 1 MB pages. An inherent advantage of large pages is that a single 1 MB page provides coverage for a large contiguous portion of storage with a single entry, thereby increasing the probability of a DLAT hit and eliminating the costly search through the segment and page tables.
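A minimal software model of the two-table arrangement is sketched below, assuming each 64-pair table is indexed by the address bits immediately above its own page offset and that a miss in both tables falls back to the segment and page table search. The class and function names, the replacement policy, and the way the two lookups are combined are assumptions for illustration only; hardware would probe both tables with the same virtual address at the same time rather than one after the other.

```python
KB = 1024
MB = 1024 * KB


class Dlat:
    """Two-way set-associative table for one page size (illustrative model)."""

    def __init__(self, page_bits, num_sets=64, ways=2):
        self.page_bits = page_bits
        self.num_sets = num_sets
        self.ways = ways
        # Each set holds (virtual page number, real base), most recent first.
        self.sets = [[] for _ in range(num_sets)]

    def _locate(self, vaddr):
        vpn = vaddr >> self.page_bits
        return vpn, self.sets[vpn % self.num_sets]

    def lookup(self, vaddr):
        """Return the real address on a hit, or None on a miss."""
        vpn, entries = self._locate(vaddr)
        for tag, real_base in entries:
            if tag == vpn:
                # Offset bits pass through unchanged.
                return real_base | (vaddr & ((1 << self.page_bits) - 1))
        return None

    def insert(self, vaddr, real_base):
        """Record a translation; real_base must be page aligned."""
        vpn, entries = self._locate(vaddr)
        entries.insert(0, (vpn, real_base))
        del entries[self.ways:]          # drop the oldest entry if over capacity

    def coverage(self):
        """Bytes of storage covered when every entry is valid."""
        return self.num_sets * self.ways * (1 << self.page_bits)


def translate(vaddr, small, large):
    """Probe both tables with the same address; None means a full search."""
    for table in (small, large):
        real = table.lookup(vaddr)
        if real is not None:
            return real
    return None
```

In this model the 4 KB table covers at most 64 x 2 x 4 KB = 512 KB, while the 1 MB table covers up to 64 x 2 x 1 MB = 128 MB, which illustrates why a single large-page entry raises the probability of a hit.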
Accordingly it is a primary object of the present invention to provide in high performance data processing systems having very large fast main storage devices a dynamic address translation mechanism which is capable of far more efficient performance than those described in the prior art.
It is a more particular object of the present improvement to provide in a translation mechanism of the type described a pair of directory-lookaside-tables which are accessed simultaneously by a virtual address presented to the translation mechanism.