1. Field of the Invention
The invention relates to apparatus, and accompanying methods for use therein, for translating virtual memory page addresses to real page addresses and specifically for increasing the speed of such translations by translating multiple contiguous virtual page addresses upon the occurrence of a miss in a translation lookaside buffer (TLB).
2. Description of the Prior Art
Most modern computer systems, particularly mainframe computers, employ high speed random access memory (RAM) circuits as main memory and relatively slow mass memory devices, such as hard disk or magnetic tape drives, as auxiliary (mass) storage. The disparity in access times between random access memory and disk or tape drives is substantial: the former has access times on the order of tenths of microseconds or less, while the latter has access times often on the order of at least tens of milliseconds. Given this disparity, user programs are not executed from auxiliary storage but rather are transferred therefrom into main memory for execution therein.
In practice, considerations of cost, physical circuitry size and/or power requirements frequently limit the amount of RAM memory that is used in implementing a main memory to a finite real address space which is often substantially less than the maximum address space of a processor that is to access this memory. For example, a processor that operates with a 31-bit virtual address word, which inherently possesses the capability of separately addressing 2^31 (over 2 billion) bytes, may often operate with as little as a few Mbytes of actual RAM memory. To provide sufficiently rapid execution speeds, the available RAM memory must be shared among all current user programs that are executing on the processor as well as with a resident portion of the operating system used by the processor. Unfortunately, the RAM memory is rarely, if ever, sized sufficiently large to fully accommodate all the instructions and data that form each such user program and the resident portion of the operating system.
However, it was recognized quite early in the art that, through normal operation of instruction fetches and stack and data accesses and standard programming techniques, most program instructions possess a rather good spatial locality of reference. This means that a program executing at memory location x will exhibit a strong tendency to interact, within relatively small time delays, with different but nearby memory locations, such as locations x+1, x+2 and so on. This behavior, often involving preceding locations as well, e.g. locations x-1, x-2 and so on, is clearly evident in loops and other similar program structures. Although the organization of external data is often not as constrained by the architecture of the processor as are the stack and instruction accesses, such data, particularly arrays, are stored in contiguous memory locations and, as such, often exhibit considerable spatial locality. In this regard, certain programmed operations, such as illustratively clearing, transposing, adding or array multiplication, that at any instance utilize one element of an array will likely access other elements of that array within a short time. Similarly, the art has recognized that instructions and data often exhibit a good temporal locality of reference as well, i.e. where the same memory location is repeatedly accessed over time.
Given these recognitions regarding spatial and temporal localities, the art has turned to and now widely uses a number of memory techniques that attempt to share a relatively small amount of real memory among a number of currently executing user programs, each of which is capable of addressing a much larger memory space.
One such technique is paging. Here, in essence, different finite portions, i.e. "pages", of memory data (collectively including both instructions and data values) for each user program, rather than all the memory data for that program, are successively copied ("swapped") from auxiliary storage into main memory and then used for current execution. Owing to spatial and temporal localities, the main memory contains pages of memory data that not only possess memory locations that have just been recently accessed but also locations that are expected to be subsequently accessed within a very short delay time. With a well designed paging system, the vast majority of memory access time should be spent accessing memory data located within pages previously copied into main memory with relatively little access time being spent in copying new pages of memory data from auxiliary storage.
Specifically, whenever the processor attempts to access memory while executing a user program, the processor issues a so-called "virtual address" for a desired memory datum that is to be accessed. The size of the virtual address is generally only limited by the maximum address space of the processor that is allowed for program usage. By contrast, a so-called "real" or "physical" address is used to directly access memory in order to locate the desired memory datum stored therein. Since the virtual address of any given memory datum is not necessarily the same as its corresponding real address, a translation facility, provided by the operating system and generally transparent to any executing user program, translates each virtual address issued by the processor to a corresponding real address prior to accessing main memory in order to obtain this datum.
Both virtual and real memory space are divided into fixed sized areas or segments, each of which is, in turn, divided into a number of contiguous pages. Each page is formed of a predefined number of memory locations, typically ranging from 2K to 4K bytes. Though pages for any program are contiguous in virtual memory, the corresponding physical pages for that program, being swapped into and out of main memory as required by the operating system during on-going program execution, tend to be randomly scattered throughout main memory. A physical page in main memory is often referred to as a "page frame".
The random location of page frames in main memory necessitates that the operating system maintain address translation software tables, illustratively segment and page tables, together with an address translation process which utilizes these tables in translating virtual to real addresses. These tables and the translation process collectively form the address translation facility. For each virtual page copied from auxiliary storage as a page frame into main memory, the address translation tables store its virtual page address along with its corresponding page frame address. Inasmuch as memory locations within any page, whether virtual or real, are contiguous, then through these tables, a virtual address located within such a virtual page can be mapped into a physical address of a location residing in main memory.
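The segment and page table mapping described above can be sketched as follows. This is an illustrative model only, not part of any disclosed apparatus; the 4K-byte page size, the particular split of the virtual address into segment index, page index and byte offset, and all names are assumptions.

```python
# Illustrative sketch: translating a virtual address through software
# segment and page tables.  Assumed layout: a virtual address holding a
# segment index, an 8-bit page index, and a 12-bit offset (4K-byte pages).

PAGE_BITS = 12                       # 4K-byte pages (assumed)
PAGE_IDX_BITS = 8                    # pages per segment (assumed)
OFFSET_MASK = (1 << PAGE_BITS) - 1

def translate(vaddr, segment_table):
    """Map a virtual address to a real address via segment and page tables."""
    offset = vaddr & OFFSET_MASK
    page_idx = (vaddr >> PAGE_BITS) & ((1 << PAGE_IDX_BITS) - 1)
    seg_idx = vaddr >> (PAGE_BITS + PAGE_IDX_BITS)
    page_table = segment_table[seg_idx]      # first table access
    frame = page_table[page_idx]             # second table access
    return (frame << PAGE_BITS) | offset     # offset passes through unchanged

# Example: segment 0, virtual page 2 maps to page frame 7.
segs = {0: {2: 7}}
assert translate((2 << 12) | 0x034, segs) == (7 << 12) | 0x034
```

Note that the byte offset is never translated, reflecting the contiguity of locations within a page.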
Given this virtual addressing scheme, whenever the processor attempts a memory access for a given memory datum during execution of a user program, the processor issues a virtual address for that datum. The datum may currently reside in main memory or it may not. If the datum resides in the main memory, the virtual to real address correspondence for that datum exists in the page and segment tables. As such, the address translation process, upon accessing these tables, extracts the physical address of the datum and thereafter applies this address to the main memory. Once this datum has been accessed, user program execution proceeds accordingly.
If, however, the desired datum does not currently reside within the main memory because a page containing that datum has not yet been swapped into main memory, then no valid entry for its associated virtual page exists in the page and segment tables. As such, the datum must be retrieved from the auxiliary store. Accordingly, the address translation process, upon accessing these tables using that virtual address, produces a page fault. At this point, interpretation of the current instruction (which caused the page fault) halts, the current state of the processor is saved and the processor transfers execution to a software page fault handler. Rather than accessing and copying only the desired datum from auxiliary storage, the page fault handler translates the incoming virtual page address and then, through input/output controller(s) for the appropriate mass storage device(s), copies an entire page containing that desired datum from auxiliary storage as a page frame into main memory. Thereafter, the fault handler updates the segment and page tables accordingly with the new corresponding virtual and real addresses for this page. Execution then returns from the fault handler to the address translation process which, in turn, accesses the desired datum from the newly copied page. When appropriate, the fault handler, as well as other well known components of the operating system, will subsequently resume execution of the current program instruction that generated the page fault.
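The page-fault path described above can be sketched in miniature, with a dictionary standing in for auxiliary storage and a list of page frames standing in for main memory; all names and the 4K-byte page size are illustrative assumptions, not taken from any real operating system.

```python
# Hedged sketch of the page-fault path: on a missing page table entry,
# copy the whole page in from auxiliary storage, update the table, retry.

PAGE_BITS = 12                                   # 4K-byte pages (assumed)

def access(vaddr, page_table, aux_storage, memory):
    vpn = vaddr >> PAGE_BITS                     # virtual page number
    if vpn not in page_table:                    # no valid entry: page fault
        frame = len(memory)                      # pick a free page frame
        memory.append(aux_storage[vpn][:])       # copy the entire page in
        page_table[vpn] = frame                  # update translation tables
    frame = page_table[vpn]
    return memory[frame][vaddr & ((1 << PAGE_BITS) - 1)]

aux = {5: [0] * 4096}                            # virtual page 5 on "disk"
aux[5][7] = 42
pt, mem = {}, []
assert access((5 << 12) | 7, pt, aux, mem) == 42  # faults, swaps page in
assert pt == {5: 0}                               # table now maps page 5
```

A second access to any byte of virtual page 5 would now proceed without a fault, illustrating why an entire page, not just the single datum, is copied.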
If pages are correctly sized, then the spatial and temporal localities inherent in most programs within a given virtual page should be very high. Hence, one would then expect that the average access time to a paged memory system consisting of a main memory having a few Mbytes of RAM, an auxiliary store containing a few billion bytes of disk memory and with correctly sized pages should only be slightly longer than but nevertheless remain on the same order as that of the RAM memory. However, disadvantageously and under various circumstances, the access time to such a paged memory system can lengthen considerably.
In particular, translating a virtual address, even if accomplished through use of microcode rather than purely software, often requires several memory accesses into the software based segment and page tables as well as a number of other well known processing steps that collectively may consume typically 5 to 100 machine cycles to fully accomplish. For that reason, the address translation process tends to be relatively slow and adds significant overhead to memory access and instruction execution times. Hence, if the full address translation process were to be performed for every virtual address, then these times would lengthen considerably which, in turn, would substantially decrease the throughput of the computer. Therefore, in an attempt to substantially eliminate the need to perform the entire address translation process for every virtual address, the art has turned to the use of translation lookaside buffers (TLBs). These buffers, in hardware, store recently used virtual page addresses along with their corresponding page frame addresses. Once a virtual page address has been fully translated, this virtual address, along with its corresponding physical page frame address, is stored in a common entry in the TLB for subsequent use.
TLBs also exploit locality inasmuch as a significant likelihood often exists that after a first memory access has occurred to a virtual address associated with a location within a given page frame, subsequent accesses to virtual locations associated with and located within the same page will occur after a relatively short delay time. Since the relative position of a virtual location within a page of virtual memory is the same as the relative position of a corresponding physical location within a corresponding page frame in physical memory, the same TLB entry can be used in translating any virtual location within a given virtual page to a physical location within a corresponding page frame. As such, low order virtual address bits are merely appended as low order real address bits onto the accessed real page address generated from the TLB to generate a real memory address. For further insight into paged memory systems including the use of TLBs, see S. A. Ward et al, Computation Structures (© 1990, MIT Press/McGraw-Hill Book Co.) pages 486-497 and D. A. Patterson et al, Computer Architecture--A Quantitative Approach (© 1990, Morgan Kaufmann Publishers), pages 432-438.
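The offset-appending step can be made concrete with a short sketch, assuming a 4K-byte page so that the low 12 virtual address bits pass through untranslated and a single TLB entry covers every byte of its page:

```python
# Illustrative sketch (assumed 4K-byte pages): only the virtual page
# number is translated; the byte offset is appended unchanged.

PAGE_BITS = 12
OFFSET_MASK = (1 << PAGE_BITS) - 1

def real_address(vaddr, tlb):
    frame = tlb[vaddr >> PAGE_BITS]          # translate page number only
    return (frame << PAGE_BITS) | (vaddr & OFFSET_MASK)

tlb = {0x123: 0x45}                          # one entry serves the whole page
assert real_address(0x123ABC, tlb) == 0x45ABC
assert real_address(0x123000, tlb) == 0x45000   # same entry, any offset
```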
Consequently, when a TLB is in use, prior to performing full address translation, the address translation process determines whether an incoming virtual page address resides within the TLB. If so, the corresponding page frame address is accessed therefrom and then used to form a real memory address which, in turn, is used to access main memory. If not, full address translation occurs, and a new entry in the TLB is created for the latest virtual page address and its corresponding page frame address. Hence, only those virtual addresses that have virtual page addresses which are not stored in the TLB are fully translated. Advantageously, this, in turn, drastically reduces the number of full address translations that must occur during program execution.
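The hit/miss flow just described can be sketched as follows; the dictionary-based TLB and all names are assumptions for illustration, with a simple page table lookup standing in for the full multi-cycle translation process:

```python
# Sketch of the lookup order: try the TLB first, fall back to the full
# (slow) table walk only on a miss, then install the result for reuse.

PAGE_BITS = 12                                   # 4K-byte pages (assumed)

def lookup(vaddr, tlb, page_table):
    vpn = vaddr >> PAGE_BITS
    if vpn in tlb:                               # TLB hit: no table walk
        frame = tlb[vpn]
    else:                                        # TLB miss: full translation
        frame = page_table[vpn]                  # stands in for the slow walk
        tlb[vpn] = frame                         # create new TLB entry
    return (frame << PAGE_BITS) | (vaddr & ((1 << PAGE_BITS) - 1))

pt = {3: 9}
tlb = {}
assert lookup((3 << 12) | 1, tlb, pt) == (9 << 12) | 1   # miss, entry filled
assert 3 in tlb                                          # next access hits
```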
To facilitate rapid address translation, TLBs are typically implemented in hardware as a hashed set-associative table stored in a high speed memory--rather than in software as are page and segment tables. During operation, a TLB typically writes a new entry into a buffer location occupied by the least recently used entry in the TLB. Through the use of address hashing and parallel compares, particularly when performed in dedicated high speed TLB hardware, a single entry in a TLB can be accessed very quickly--often in less than a single machine cycle.
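A hashed set-associative table with least-recently-used replacement, as described above, can be modeled in miniature; the set count and associativity chosen here are arbitrary assumptions, and a real TLB compares all entries of a set in parallel hardware rather than sequentially.

```python
# Minimal model of a hashed set-associative TLB with LRU replacement.
from collections import OrderedDict

class SetAssociativeTLB:
    def __init__(self, num_sets=16, ways=4):     # sizes are assumptions
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def lookup(self, vpn):
        s = self.sets[vpn % self.num_sets]       # hash selects one set
        if vpn in s:
            s.move_to_end(vpn)                   # mark most recently used
            return s[vpn]
        return None                              # miss

    def insert(self, vpn, frame):
        s = self.sets[vpn % self.num_sets]
        if len(s) >= self.ways:
            s.popitem(last=False)                # evict least recently used
        s[vpn] = frame

tlb = SetAssociativeTLB(num_sets=1, ways=2)
tlb.insert(1, 10); tlb.insert(2, 20)
tlb.lookup(1)                                    # touch entry 1
tlb.insert(3, 30)                                # evicts entry 2 (the LRU)
assert tlb.lookup(2) is None and tlb.lookup(1) == 10
```

The `vpn % num_sets` hash and the parallel compare within a set are what let hardware TLBs resolve a lookup in under a machine cycle.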
Given the substantial improvement in translation and overall memory access speed gained through the use of a TLB, the art has turned to several techniques aimed at increasing the speed at which a TLB operates. These techniques are typified by that described in, for example, U.S. Pat. No. 4,695,950 (issued to H. R. Brandt et al on Sep. 22, 1987) which discloses the concept of storing intermediate translations using a TLB during a double level address translation and U.S. Pat. No. 4,638,426 (issued to A. Chang on Jan. 20, 1987) which discloses the concept of translating a virtual address into a real address using an intermediate virtual address. Unfortunately, the TLB based addressing schemes disclosed in these patents, as well as that described above and known in the art, possess serious drawbacks.
Being hardware based, TLBs are finite in size. Generally, a TLB contains typically between 64 and 1024 separate entries, with each entry containing approximately 64 bits. As such, a TLB can only store a certain number of the most recently translated virtual page addresses. Owing to the limited size of a TLB, various user programs that operate on large amounts of data, such as large matrices, routinely trigger successive TLB misses. The occurrence of a TLB miss causes the full address translation process, including accessing the segment and page tables and updating the TLB, to be performed along with its concomitant processing delays. If these misses occur frequently enough, then the efficiency gained through the use of a TLB seriously degrades. Large matrices and other similar data structures engender a large number of TLB misses, thereby producing a large TLB miss ratio and significant processing delays.
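A short computation illustrates why large matrices defeat a small TLB (the matrix dimensions and page size here are assumptions): walking down one column of a row-major matrix whose rows span at least a page touches a different page, and hence a potentially different TLB entry, on every element.

```python
# Illustrative computation: a column walk over a row-major matrix touches
# one new page per element when each row is at least a page long.

PAGE_SIZE = 4096                                 # 4K-byte pages (assumed)
ROW_BYTES = 8192                                 # e.g. 1024 doubles per row
ROWS = 1024

# Byte address of element [i][0] for each row, assuming row-major layout.
addresses = [i * ROW_BYTES for i in range(ROWS)]
pages_touched = {a // PAGE_SIZE for a in addresses}
assert len(pages_touched) == ROWS                # one distinct page per access

# With, say, a 64-entry TLB and 1024 distinct pages per column walk,
# essentially every access after the first 64 misses in the TLB.
```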
In an attempt to counter the loss of performance resulting from processing data that has a high TLB miss ratio, one of two well-known techniques is frequently used. The first technique calls for greatly increasing the size of the TLB to a size which is believed to significantly reduce the likelihood that a large number of TLB misses will occur. Inasmuch as a TLB is usually located in a critical path in a computer where accessing delays cannot be tolerated, a TLB is necessarily implemented with high speed memory circuits which are generally very expensive. Hence, if the size of a TLB were to be greatly increased, then its cost would rise appreciably. In addition, this technique may be ineffective with certain processing applications, such as processing very large matrices or other similar data structures, which even with realistically large TLBs will nevertheless produce a significant number of TLB misses and resulting processing delays caused by translation overhead. The second technique calls for significantly increasing the page size, for example from 4K bytes to 1M byte, in an effort to greatly reduce the number of translations and associated overhead that are expected to occur. However, as the page size increases, an increasing amount of memory data must be swapped into and out of main memory during paging, while only a small amount of the memory data in any one page is usually being accessed at any one time by the processor. As such, increasing the page size causes increasingly inefficient memory usage. Though supercomputers often employ a large page size, the resulting memory inefficiencies effectively preclude use of this technique in a general purpose computer.
Thus a need exists in the art for a technique, particularly suited for use in, though not exclusively limited to, a general purpose computer, for increasing the efficiency of a TLB used in such a computer particularly when that computer is processing programs and/or accessing data which would otherwise generate a high TLB miss ratio. Advantageously, such a technique should not require the use of TLBs of significantly increased size or use of a relatively large page size.