In the latter half of the twentieth century, there began a phenomenon known as the information revolution. While the information revolution is a historical development broader in scope than any one event or machine, no single device has come to represent the information revolution more than the digital electronic computer. The development of computer systems has surely been a revolution. Each year, computer systems grow faster, store more data, and provide more applications to their users.
A modern computer system typically comprises one or more central processing units (CPUs) and supporting hardware necessary to store, retrieve and transfer information, such as communications buses and memory. It also includes hardware necessary to communicate with the outside world, such as input/output controllers or storage controllers, and devices attached thereto such as keyboards, monitors, tape drives, disk drives, communication lines coupled to a network, etc. The CPU is the heart of the system. It executes the instructions which comprise a computer program and directs the operation of the other system components.
From the standpoint of the computer's hardware, most systems operate in fundamentally the same manner. Processors are capable of performing a limited set of very simple operations, such as arithmetic, logical comparisons, and movement of data from one location to another. But each operation is performed very quickly. Programs which direct a computer to perform massive numbers of these simple operations give the illusion that the computer is doing something sophisticated. What is perceived by the user as a new or improved capability of a computer system is made possible by performing essentially the same set of very simple operations, but doing it much faster. Therefore, continuing improvements to computer systems require that these systems be made ever faster.
The overall speed of a computer system (also called the “throughput”) may be crudely measured as the number of operations performed per unit of time. Conceptually, the simplest of all possible improvements to system speed is to increase the clock speeds of all of the various components simultaneously. For example, if everything runs twice as fast but otherwise works in exactly the same manner, the system will perform a given task in half the time. Early computer systems contained processors which were constructed from many discrete components. The clock speeds of these systems could be improved significantly by shrinking and combining components, eventually packaging the entire processor as an integrated circuit on a single chip.
However, simply improving the speed of a single component will not necessarily result in a corresponding increase in system throughput. The faster component may spend most of its time idle, waiting for some slower component.
A computer's CPU operates on data stored in the computer's addressable main memory. The memory stores both the instructions which execute in the processor and the data which is manipulated by those instructions. In operation, the processor is constantly accessing instructions and other data in memory, without which it is unable to perform useful work. In recent years, improvements to processor speed have generally outpaced improvements to the speed of accessing data in memory. The time required to access this data is therefore a significant factor affecting system throughput.
Nearly all modern computer systems use some form of virtual addressing, in which an address in a relatively large address space associated with one or more software processes is translated to an address in a relatively smaller address space associated with memory. The former are referred to herein as “virtual addresses”, although they are known in some architectures as “effective addresses” or by other names. The latter are referred to herein as “real addresses”, although they may also be known as “physical addresses”, “memory addresses” or by some other name. In some architectures, multiple levels of addresses of the former type may exist and/or multiple levels of the latter type may exist. However, the fundamental distinction between the former and the latter is that virtual addresses, however named, have no permanent or persistent correspondence to actual locations in the computer system's memory, while real addresses do. I.e., each real address corresponds to a respective location in the physical hardware memory of the computer system, and maintains this correspondence in a persistent manner as different software processes are initiated and terminated (although in some architectures, it may be possible to change the correspondence by re-configuring the system, adding or removing memory, or similar events). The correspondence between a virtual address and physical memory is ephemeral, and can change as new pages are brought into physical memory from storage and other pages are removed from memory. At any instant in time, most virtual addresses typically have no corresponding assignment in physical memory, i.e., the data at that address is either unallocated or held in storage, but not in main memory.
Modern systems use virtual addressing for the simple reason that modern software processes typically require larger address spaces than are practical to implement in physical memory. When an executing process requires access to a range of virtual addresses, that range is temporarily assigned a corresponding range of real addresses (i.e., locations in physical memory). The assignment is necessarily temporary because there are not enough real addresses to go around. The real addresses will eventually be assigned to some other range of virtual addresses.
Computer systems typically use a mechanism called a page table to record the temporary assignments of virtual addresses to real addresses, a “page” being the smallest unit of address assignment. Although referred to as a “page table”, this mechanism may have a more complex structure. When the processor generates a virtual address to which an executing process requires access, an address translation mechanism references the page table to determine the real address corresponding to that virtual address, i.e., to translate the virtual address to a real address.
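As a rough illustration of the translation step just described, the following sketch splits a virtual address into a virtual page number (high-order bits) and a byte offset (low-order bits), looks the page number up in a page table, and combines the resulting real page number with the unchanged offset. The 4 KiB page size, the dictionary standing in for the page table, and the names `translate` and `PageFault` are assumptions for illustration only; actual page tables may have the more complex structures discussed here.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT
OFFSET_MASK = PAGE_SIZE - 1


class PageFault(Exception):
    """Hypothetical: raised when a virtual page has no current real-page assignment."""


def translate(page_table: dict[int, int], vaddr: int) -> int:
    """Translate a virtual address to a real address using a (hypothetical) page table."""
    vpn = vaddr >> PAGE_SHIFT          # virtual page number (high-order bits)
    offset = vaddr & OFFSET_MASK       # byte offset within the page (low-order bits)
    try:
        rpn = page_table[vpn]          # real page number currently assigned, if any
    except KeyError:
        raise PageFault(hex(vaddr)) from None
    return (rpn << PAGE_SHIFT) | offset


# Virtual page 0x12345 is temporarily assigned real page 0x2A.
table = {0x12345: 0x2A}
print(hex(translate(table, 0x12345ABC)))  # 0x2aabc
```

Only the page number is translated; the offset within the page passes through unchanged, which is why a page is the smallest unit of address assignment.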
Translation of a virtual address to a real address is a critical component of memory access. In many systems, translation look-aside buffers (which are effectively caches of address translation data derived from the page table) or similar mechanisms are used to assist translation, but for at least some translations it is necessary to access the page table itself. The operational characteristics of the page table and its associated address translation mechanisms are significant contributors to overall system performance.
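The role of a translation look-aside buffer as a cache of page table data can be sketched as a small, fully associative cache with least-recently-used replacement that is consulted before the page table. The capacity of four entries, the LRU policy, and the class name `TLB` are illustrative assumptions; hardware TLBs vary widely in size, associativity, and replacement policy.

```python
from collections import OrderedDict

PAGE_SHIFT = 12                      # assumed 4 KiB pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class TLB:
    """Hypothetical tiny fully associative translation cache with LRU replacement."""

    def __init__(self, page_table: dict[int, int], capacity: int = 4):
        self.page_table = page_table
        self.capacity = capacity
        self.entries = OrderedDict()   # vpn -> rpn, ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr: int) -> int:
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & OFFSET_MASK
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)          # mark as most recently used
        else:
            self.misses += 1
            rpn = self.page_table[vpn]             # slow path: consult the page table itself
            self.entries[vpn] = rpn
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)   # evict the least recently used entry
        return (self.entries[vpn] << PAGE_SHIFT) | offset


tlb = TLB({0x10: 0x7, 0x11: 0x3})
for addr in (0x10004, 0x10008, 0x11000, 0x10FFF):
    tlb.translate(addr)
print(tlb.hits, tlb.misses)  # 2 2
```

Repeated accesses to the same page are satisfied from the cache, while the first access to each page still requires a page table reference, which is why the page table's own characteristics continue to matter.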
Conventionally, page table mechanisms have followed one of two design approaches. In a first approach, referred to herein as a direct mapped table, the page table contains one entry for each page of addresses in the virtual address space, this entry containing a high-order portion of the corresponding real address (the low-order portion being copied from the corresponding low-order bits of the virtual address). In a second approach, referred to herein as a hashed table, the page table contains substantially fewer than one entry for each page of addresses in the virtual address space, the entries being accessed by some less direct mechanism, such as hashing some part of the virtual address. In the second approach, each entry in the page table contains not only the high-order portion of the real address but also some portion of the virtual address. This portion of the virtual address must be compared with the original virtual address to verify that the entry from the hashed table in fact corresponds to the desired virtual address (and not some other virtual address).
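The two conventional organizations can be contrasted in a short sketch. The direct mapped table below is an array with one slot per virtual page, indexed directly by the virtual page number; the hashed table has far fewer buckets than there are virtual pages, so each entry also stores a virtual-page tag that must be compared with the requested page number. The table sizes, the Fibonacci-style multiplicative hash, and chaining within buckets are assumptions chosen for brevity, not features of any particular architecture.

```python
VPN_BITS = 16                        # assumed tiny virtual space: 2**16 virtual pages
BUCKET_BITS = 6                      # assumed hashed table: 64 buckets, far fewer than pages


class DirectMappedTable:
    """One entry per virtual page; each entry holds only the real page number."""

    def __init__(self):
        self.entries = [None] * (1 << VPN_BITS)   # mostly null when allocation is sparse

    def map(self, vpn: int, rpn: int) -> None:
        self.entries[vpn] = rpn

    def lookup(self, vpn: int):
        return self.entries[vpn]                  # direct index; no tag comparison needed


class HashedTable:
    """Few buckets; each entry stores (virtual page tag, real page number)."""

    def __init__(self):
        self.buckets = [[] for _ in range(1 << BUCKET_BITS)]

    @staticmethod
    def _hash(vpn: int) -> int:
        # Hypothetical hash: top BUCKET_BITS bits of a 32-bit multiplicative product.
        return ((vpn * 0x9E3779B1) & 0xFFFFFFFF) >> (32 - BUCKET_BITS)

    def map(self, vpn: int, rpn: int) -> None:
        self.buckets[self._hash(vpn)].append((vpn, rpn))

    def lookup(self, vpn: int):
        for tag, rpn in self.buckets[self._hash(vpn)]:
            if tag == vpn:                        # verify the entry belongs to this page
                return rpn
        return None


direct, hashed = DirectMappedTable(), HashedTable()
for table in (direct, hashed):
    table.map(0x1234, 0x2A)
    table.map(0x1235, 0x2B)
print(direct.lookup(0x1234), hashed.lookup(0x1235), hashed.lookup(0x9999))  # 42 43 None
```

The tag comparison in `HashedTable.lookup` is the verification step described above: without it, an unrelated virtual page that hashes to the same bucket could be given the wrong real address.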
The direct mapped page table is conceptually simpler, but requires a large amount of memory to hold the entire page table. Where the virtual address space is far larger than the real address space and is very sparsely allocated, as is commonly the case in modern systems, a lot of space may be consumed by null page table entries. This problem tends to become more acute as software processes use larger and larger virtual address spaces. Inefficient use of memory space can affect performance, because it becomes difficult to store sufficient portions of the page table in cache, or even in main memory, which increases the access time to the page table itself.
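A back-of-the-envelope calculation shows why the direct mapped approach becomes impractical for large, sparsely allocated virtual address spaces. The figures used here (a 48-bit virtual address space, 4 KiB pages, 8-byte entries, and a process with 1 GiB of allocated memory) are assumptions chosen only for illustration.

```python
def direct_table_bytes(va_bits: int, page_shift: int, entry_bytes: int) -> int:
    """Size of a direct mapped page table: one entry per virtual page."""
    return (1 << (va_bits - page_shift)) * entry_bytes


def fraction_used(allocated_bytes: int, va_bits: int, page_shift: int) -> float:
    """Fraction of direct mapped entries that are non-null."""
    used_pages = allocated_bytes >> page_shift
    return used_pages / (1 << (va_bits - page_shift))


size = direct_table_bytes(va_bits=48, page_shift=12, entry_bytes=8)
print(size // 2**30, "GiB")                                # 512 GiB
print(fraction_used(2**30, va_bits=48, page_shift=12))     # 2**-18, about 3.8e-06
```

Under these assumptions the table would need 2^36 entries, and fewer than four entries in a million would be non-null.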
The hashed table uses memory more efficiently, but the complexity of the translation mechanism often means that translations are slow, and the need to store a portion of the virtual address increases the size of each entry. For example, consecutive entries in a hashed table typically do not correspond to consecutive pages in the virtual address space. Many processes exhibit locality of memory reference, frequently accessing consecutive or nearby pages, yet conventional hashed page tables typically require an independent translation for each such page. At the same time, it is desirable to store the page table in cache, and the larger size of hashed table entries limits the number of entries that fit in each cache line.
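The cache-line and locality effects just described can be quantified with a small sketch. It assumes a 128-byte cache line, 8-byte direct mapped entries (real page number and flags only), 16-byte hashed entries (the same data plus a virtual-page tag), and a hypothetical 32-bit Fibonacci-style hash into 4096 buckets; none of these figures is taken from a particular system.

```python
CACHE_LINE = 128          # assumed cache line size in bytes
DIRECT_ENTRY = 8          # assumed: real page number + flags
HASHED_ENTRY = 16         # assumed: same data plus a virtual-page tag
BUCKET_BITS = 12          # assumed 4096-bucket hashed table


def bucket(vpn: int) -> int:
    """Hypothetical hash of a virtual page number into a bucket index."""
    return ((vpn * 0x9E3779B1) & 0xFFFFFFFF) >> (32 - BUCKET_BITS)


# Translations held by one cache line of page table entries.
print(CACHE_LINE // DIRECT_ENTRY, CACHE_LINE // HASHED_ENTRY)   # 16 8

# Eight consecutive virtual pages: adjacent entries in a direct mapped table,
# but scattered across the hashed table.
vpns = range(0x40000, 0x40008)
print([bucket(v) for v in vpns])

# Distinct cache lines touched to translate all eight pages.
direct_lines = {v * DIRECT_ENTRY // CACHE_LINE for v in vpns}
hashed_lines = {bucket(v) * HASHED_ENTRY // CACHE_LINE for v in vpns}
print(len(direct_lines), len(hashed_lines))                     # 1 8
```

Under these assumptions, translating eight consecutive pages touches a single cache line of the direct mapped table but eight separate lines of the hashed table, and each hashed line holds only half as many translations.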
A need exists for improved techniques for translating a virtual address to a real address in a computer system. In particular, a need exists for an improved page table access and translation mechanism, which avoids the excessive memory consumption of the direct mapped approach while obtaining at least some of its advantages.