The present invention relates to microprocessor and computer systems, and more particularly, to virtual memory systems with extended linear address generation and translation.
Most microprocessors make use of virtual or demand-paged memory schemes, where sections of a program's execution environment are mapped into physical memory as needed. Virtual memory schemes allow the use of physical memory much smaller in size than the linear address space of the microprocessor, and also provide a mechanism for memory protection so that multiple tasks (programs) sharing the same physical memory do not adversely interfere with each other.
Physical memory is part of a memory hierarchy system, which may be illustrated as part of a computer system shown in FIG. 1. Microprocessor 102 has a first level cache comprising instruction cache 104 and data cache 106. Microprocessor 102 communicates with unified second level cache 108 via backside bus 110. Second level cache 108 contains both instructions and data, and may physically reside on the same die as microprocessor 102. Caches 104 and 106 comprise the first level of the memory hierarchy, and cache 108 comprises the second level.
The third level of memory hierarchy for the exemplary computer system of FIG. 1 is indicated by memory 112. Microprocessor 102 communicates with memory 112 via host processor (front side) bus 114 and chipset 116. Chipset 116 may also provide graphics bus 118 for communication with graphics processor 120, and serves as a bridge to other busses, such as peripheral component bus 122. Secondary storage, such as disk unit 124, provides yet another level in the memory hierarchy.
FIG. 2 illustrates some of the functional units within microprocessor 102, including the instruction and data caches. In microprocessor 102, fetch unit 202 fetches instructions from instruction cache 104, and decode unit 206 decodes these instructions. For a CISC (Complex Instruction Set Computer) architecture, decode unit 206 decodes a complex instruction into one or more micro-instructions. Usually, these micro-instructions define a load-store type architecture, so that micro-instructions involving memory operations are simple load or store operations. However, the present invention may be practiced for other architectures, such as for example RISC (Reduced Instruction Set Computer) or VLIW (Very Long Instruction Word) architectures.
For a RISC architecture, instructions are not decoded into micro-instructions. Because the present invention may be practiced for RISC architectures as well as CISC architectures, we shall not make a distinction between instructions and micro-instructions unless otherwise stated, and will simply refer to these as instructions.
Most instructions operate on several source operands and generate results. They name, either explicitly or through an indirection, the source and destination locations where values are read from or written to. A name may be either a logical (architectural) register or a location in memory. Renaming logical registers as physical registers may allow instructions to be executed out of order. In FIG. 2, register renaming is performed by renamer unit 208, where RAT (Register Allocation Table) 210 stores current mappings between logical registers and physical registers. The physical registers are indicated by register file 212.
Every logical register has a mapping to a physical register in physical register file 212, where the mapping is stored in RAT 210 as an entry. An entry in RAT 210 is indexed by a logical register and contains a pointer to a physical register in physical register file 212. Some registers in physical register file 212 may be dedicated for integers whereas others may be dedicated for floating point numbers, but for simplicity these distinctions are not indicated in FIG. 2.
During renaming of an instruction, the current RAT provides the required mapping for renaming the source logical register(s) of the instruction, and a new mapping is created for the destination logical register of the instruction. This new mapping evicts the old mapping in the RAT.
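The renaming step can be illustrated with a minimal C sketch. The register counts, the identity mapping at reset, and the allocator that never recycles physical registers are assumptions made for brevity; the names `Rat`, `rat_init`, and `rename_insn` are hypothetical. A practical renamer would return physical registers to a free list once they can no longer be referenced.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_LOGICAL  8   /* hypothetical number of logical registers */
#define NUM_PHYSICAL 32  /* hypothetical size of physical register file 212 */

/* RAT 210: one entry per logical register, each holding a physical register number. */
typedef struct {
    uint8_t map[NUM_LOGICAL];
    uint8_t next_free;  /* simplified allocator: physical registers are never recycled */
} Rat;

typedef struct {
    uint8_t src1, src2, dst;  /* physical registers after renaming */
} RenamedInsn;

static void rat_init(Rat *rat) {
    for (int i = 0; i < NUM_LOGICAL; i++)
        rat->map[i] = (uint8_t)i;  /* identity mapping at reset (assumption) */
    rat->next_free = NUM_LOGICAL;
}

/* Rename "ldst = op(lsrc1, lsrc2)", all operands given as logical registers. */
static RenamedInsn rename_insn(Rat *rat, int lsrc1, int lsrc2, int ldst) {
    RenamedInsn r;
    /* Sources use the current mappings, read before the destination is remapped. */
    r.src1 = rat->map[lsrc1];
    r.src2 = rat->map[lsrc2];
    /* The destination gets a fresh physical register; the new mapping evicts the old one. */
    assert(rat->next_free < NUM_PHYSICAL);
    r.dst = rat->next_free++;
    rat->map[ldst] = r.dst;
    return r;
}
```

For example, renaming `r1 = r1 + r2` right after `rat_init` reads physical registers 1 and 2 and maps r1 to physical register 8, so a later instruction that reads r1 is steered to physical register 8.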
Renamed instructions are placed in instruction window buffer 216. All instructions "in-flight" have an entry in instruction window buffer 216, which operates as a circular buffer. Instruction window buffer 216 allows for memory disambiguation so that memory references are made correctly, and allows for instruction retirement in original program order. (For CISC architectures, a complex instruction is retired when all micro-instructions making up the complex instruction are retired together.)
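The circular-buffer behavior and in-order retirement can be sketched in C as follows. The capacity, the single `done` flag per entry, and the names `Window`, `window_alloc`, and `window_retire` are hypothetical; memory disambiguation and micro-instruction grouping are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define WINDOW_SIZE 4  /* hypothetical capacity */

typedef struct {
    bool done[WINDOW_SIZE];  /* set when the instruction in that entry has executed */
    int head, tail, count;   /* head = oldest in-flight entry, tail = next free slot */
} Window;

/* Allocate an entry in program order; returns its index, or -1 if the window is full. */
static int window_alloc(Window *w) {
    if (w->count == WINDOW_SIZE) return -1;
    int idx = w->tail;
    w->done[idx] = false;
    w->tail = (w->tail + 1) % WINDOW_SIZE;  /* wrap around: circular buffer */
    w->count++;
    return idx;
}

/* Retire completed entries from the head only, so retirement follows program order
   even when younger instructions finished first. Returns the number retired. */
static int window_retire(Window *w) {
    int retired = 0;
    while (w->count > 0 && w->done[w->head]) {
        w->head = (w->head + 1) % WINDOW_SIZE;
        w->count--;
        retired++;
    }
    return retired;
}
```

If the younger of two in-flight instructions completes first, `window_retire` retires nothing until the older one also completes, after which both retire together.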
For an instruction that writes its result to a memory location, data cache 106 (part of the memory hierarchy) is updated upon instruction retirement. For an instruction that writes its result to a logical register, no write need be done upon retirement because there are no registers dedicated as logical registers. (Physical register file 212 has the result of the retiring instruction in that physical register which the destination logical register was mapped to when the instruction was renamed.)
Scheduler 218 schedules instructions to execution units 220 for execution. For simplicity, only memory execution unit 224 is explicitly indicated in execution units 220. A load or store instruction is dispatched by scheduler 218 to AGU (Address Generation Unit) 222 for computation of a linear address, and memory execution unit 224 translates the linear address into a physical address and executes the load or store instruction. Memory execution unit 224 may send data to or receive data from a forwarding buffer (not shown) rather than data cache 106, where a forwarding buffer stores objects that may eventually be written to data cache 106 upon instruction retirement. The scheduling function performed by scheduler 218 may, for example, be realized by reservation stations (not shown) implementing Tomasulo's algorithm (or variations thereof) or by a scoreboard. Execution units 220 may retrieve data from or send data to register file 212, depending upon the instruction to be executed.
In other embodiments of the present invention, the information content contained in the data structures of physical register file 212 and instruction window buffer 216 may be realized by different functional units. For example, a re-order buffer may replace instruction window buffer 216 and physical register file 212, so that results are stored in the re-order buffer, and in addition, registers in a register file are dedicated as logical registers. For this type of embodiment, the result of an instruction that writes to a logical register is written to a logical register upon instruction retirement.
With most modern computer systems, a microprocessor refers to a memory location by generating a linear address, but an object is retrieved from a specific memory location by providing its physical address on an address bus, such as bus 114 in FIG. 1. Linear addresses may be the same as physical addresses, in which case address translation is not required. However, usually a virtual memory scheme is employed in which linear addresses are translated into physical addresses. In this case, a linear address may also be referred to as a virtual address. The linear address space is the set of all linear addresses generated by a microprocessor, whereas the physical address space is the set of all physical addresses.
For some microprocessor architectures, such as Intel® Architecture 32 bit (IA-32) microprocessors (Intel® is a registered trademark of Intel Corporation, Santa Clara, Calif.), there is also another type of address translation in which a logical address is translated into a linear address. For these types of architectures, the instructions provide logical address offsets, which are then translated to linear addresses by AGU 222 in FIG. 2. This extra stage of address translation may provide additional security, e.g., where application code cannot modify supervisory (operating system) code.
The mapping of a logical address to a linear address is illustrated in FIG. 3. A logical address comprises segment selector 302a and offset 304. Segment selector 302a is stored in segment register 302, which also contains descriptor cache 302b. Segment selector 302a points to segment descriptor 308 in descriptor table 306. Descriptor table 306 provides a table of segment descriptors stored in memory. A segment descriptor provides a segment base address, so that a linear address is obtained by adding an offset to the base address provided by a segment descriptor, as indicated by summation 312. In addition to providing a base address, a segment descriptor contains various other types of information, such as access rights and segment size. The base address, access rights, segment size, and other information are cached in descriptor cache 302b.
A virtual or demand-paged memory system may be illustrated as a mapping between a linear (virtual) address space and a physical address space, as shown in FIG. 4. In a virtual memory system, the linear and physical address spaces are divided into blocks of contiguous addresses, customarily referred to as pages if they are of constant size or are any of several fixed sizes. A typical page size may be 4 KBytes, for example.
The mapping shown in FIG. 4 illustrates a generic two-level hierarchical mapping comprising directory tables and page tables. Page directory tables and page tables are stored in physical memory, and are usually themselves equal in size to a page. A page directory table entry (PDE) points to a page table in physical memory, and a page table entry (PTE) points to a page in physical memory. For the two-level hierarchical mapping of FIG. 4, a linear address comprises directory field 402, table field 404, and offset field 406. A directory field is an offset to a PDE, a table field is an offset to a PTE, and an offset field is an offset to a memory location in a page.
In FIG. 4, page directory base register (PDBR) 408 points to the base address of page directory 410, and the value stored in directory field 402 is added to the value stored in PDBR 408 to provide the physical address of PDE 412 in page directory 410. PDE 412 in turn points to the base address of page table 414, which is added to the value stored in table field 404 to point to PTE 416 in page table 414. PTE 416 points to the base address of page 418, and this page base address is added to the value stored in offset 406 to provide physical address 420. Linear address 422 is thereby mapped to physical address 420.
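The two-level walk of FIG. 4 can be sketched in C over a toy physical memory. Several assumptions go beyond the generic description: the 10/10/12 field split, 4-byte entries, directory and table fields scaled by the entry size before being added to a base, and entries that hold nothing but the base address of the next structure. The names `phys_mem`, `read_phys32`, and `walk_two_level` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DIR_SHIFT   22     /* directory field 402: bits 31..22 (assumed split) */
#define TABLE_SHIFT 12     /* table field 404: bits 21..12 */
#define FIELD_MASK  0x3FFu
#define OFFSET_MASK 0xFFFu /* offset field 406: bits 11..0 */
#define ENTRY_SIZE  4u     /* bytes per PDE/PTE (assumption) */

/* Toy physical memory, word addressed; the size is arbitrary. */
#define MEM_WORDS (1u << 16)
static uint32_t phys_mem[MEM_WORDS];

static uint32_t read_phys32(uint32_t paddr) {
    assert(paddr % 4 == 0 && paddr / 4 < MEM_WORDS);
    return phys_mem[paddr / 4];
}

/* Each PDE/PTE here is simply the base address of the next structure (no flag bits). */
static uint32_t walk_two_level(uint32_t pdbr, uint32_t linear) {
    uint32_t dir    = (linear >> DIR_SHIFT) & FIELD_MASK;
    uint32_t table  = (linear >> TABLE_SHIFT) & FIELD_MASK;
    uint32_t offset = linear & OFFSET_MASK;

    uint32_t pde = read_phys32(pdbr + dir * ENTRY_SIZE);  /* PDE 412: page table base */
    uint32_t pte = read_phys32(pde + table * ENTRY_SIZE); /* PTE 416: page base */
    return pte + offset;                                  /* physical address 420 */
}
```

Each level costs one memory read, which is why a two-level walk needs two memory accesses before the object itself can be fetched.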
Accessing entries stored in page directories and page tables requires memory bus transactions, which can be costly in terms of processor cycle time. However, because of the principle of locality, the number of memory bus transactions may be reduced by storing recent mappings between linear and physical addresses in a cache, called a translation look-aside buffer (TLB). There may be separate TLBs for instruction addresses and data addresses. Entries in a TLB are indexed by linear addresses. A hit in a TLB provides the physical address associated with a linear address. If there is a miss, the memory hierarchy is accessed as indicated in FIG. 4 to obtain the translation of the linear address into a physical address; this process is sometimes referred to as a page walk.
Some IA-32 microprocessors employ several modes for translating linear addresses into physical addresses, and we shall consider three such modes herein referred to as modes A, B, and C. Mode A supports a 32 bit physical address space with 4 KB page sizes. Mode B supports a 32 bit physical address space with either 4 KB or 4 MB page sizes. For modes A and B, the page and directory table entries are each 4 bytes. Mode C supports a 36 bit physical address space, i.e., up to 64 GB of physical memory (physical address extension), with either 4 KB or 2 MB page sizes. For mode C, the page and directory table entries are each 8 bytes. For each mode, the page and directory tables are equal in size to a page. All modes are for translating 32 bit linear addresses.
Mode A is illustrated in FIG. 5, where the lowest 12 bits of a linear address are used as an offset to a physical address within a page frame, the next 10 bits of the linear address are used as an offset into a page table, and the highest 10 bits of the linear address are used as an offset into a page directory. For example, in FIG. 5, PTE 502 in page table 504, pointed to by table field 506 of the linear address, provides the address of the desired page frame in physical memory; concatenated with offset 508 of the linear address, it provides the physical address of the desired object. The PDBR register, page directory entries, and page table entries each provide the upper 20 bits of a 32 bit address, so that page directories, page tables, and pages are each forced to be aligned on 4 KB boundaries.
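The mode A field split and the final concatenation can be sketched in C. The names `ModeAFields`, `mode_a_split`, and `mode_a_physical` are hypothetical. In IA-32 the low 12 bits of a page table entry hold flag bits (such as the present bit) rather than address bits, so the sketch masks them off; the flags themselves are not checked.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t dir;     /* bits 31..22: index into the page directory */
    uint32_t table;   /* bits 21..12: index into the page table */
    uint32_t offset;  /* bits 11..0: byte offset within the 4 KB page frame */
} ModeAFields;

static ModeAFields mode_a_split(uint32_t linear) {
    ModeAFields f = {
        .dir    = linear >> 22,
        .table  = (linear >> 12) & 0x3FFu,
        .offset = linear & 0xFFFu,
    };
    return f;
}

/* The PTE supplies the upper 20 bits of the page frame address (4 KB aligned);
   concatenating the 12-bit offset gives the physical address. */
static uint32_t mode_a_physical(uint32_t pte, uint32_t linear) {
    return (pte & 0xFFFFF000u) | (linear & 0xFFFu);
}
```

For linear address 0x00403123 the directory index is 1, the table index is 3, and the offset is 0x123; a PTE of 0x0ABCD067 then yields physical address 0x0ABCD123.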
Mode B for 4 MB page sizes is illustrated in FIG. 6. (For 4 KB page sizes, mode B is similar to mode A.) The lowest 22 bits of the linear address provide the offset into a physical 4 MB page frame, and the highest 10 bits of the linear address provide the offset into the page directory. Note that mode B with 4 MB page sizes requires only one level of address translation. A PDE in the page directory of FIG. 6 provides the upper 10 bits of a 32 bit address, forcing pages to be aligned on 4 MB boundaries.
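The single-level mode B translation for 4 MB pages reduces to a mask and an OR, sketched below in C. The function names are hypothetical, and the PDE flag that selects a 4 MB page rather than a page table is not modeled; only the address arithmetic is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the PDE selected by the linear address (bits 31..22). */
static uint32_t mode_b_dir_index(uint32_t linear) {
    return linear >> 22;
}

/* 4 MB page under mode B: one level of translation. */
static uint32_t mode_b_4mb_physical(uint32_t pde, uint32_t linear) {
    /* The PDE supplies the upper 10 bits (31..22) of the page frame address. */
    uint32_t frame_base = pde & 0xFFC00000u;
    /* The low 22 bits of the linear address are the offset within the 4 MB page. */
    return frame_base | (linear & 0x003FFFFFu);
}
```

For linear address 0x12345678 the PDE index is 0x48 and the in-page offset is 0x345678; with a PDE of 0x0AC00083 the physical address is 0x0AF45678.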
Mode C for 4 KB page sizes is illustrated in FIG. 7. This involves a third level of address translation provided by page directory pointer table (PDPT) 702. Each entry in PDPT 702 is 8 bytes, and there are 4 entries in a PDPT. PDBR 704 provides the upper 27 bits of a 32 bit address pointing to the base of a PDPT so that PDPTs are forced to be aligned on 32 byte boundaries. Each entry in the PDPT, page directory, and page table provides the upper 24 bits of a 36 bit address so that page directories, page tables, and pages are forced to be aligned on 4 KB boundaries.
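Because a PDPT has 4 entries and each 4 KB page directory or page table holds 512 eight-byte entries, the 32 bit linear address in mode C splits into 2, 9, 9, and 12 bit fields. The C sketch below shows that split, the 32-byte-aligned PDPT entry address, and the final 36 bit physical address; all names are hypothetical and entry flag bits are simply masked off rather than checked.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t pdpt;    /* bits 31..30: one of 4 PDPT entries */
    uint32_t dir;     /* bits 29..21: one of 512 page directory entries */
    uint32_t table;   /* bits 20..12: one of 512 page table entries */
    uint32_t offset;  /* bits 11..0: byte offset in the 4 KB page */
} ModeCFields;

static ModeCFields mode_c_split_4k(uint32_t linear) {
    ModeCFields f = {
        .pdpt   = linear >> 30,
        .dir    = (linear >> 21) & 0x1FFu,
        .table  = (linear >> 12) & 0x1FFu,
        .offset = linear & 0xFFFu,
    };
    return f;
}

/* PDBR keeps bits 31..5 of the PDPT base (32-byte aligned); entries are 8 bytes. */
static uint32_t mode_c_pdpte_addr(uint32_t pdbr, uint32_t linear) {
    return (pdbr & 0xFFFFFFE0u) + (linear >> 30) * 8u;
}

/* A PDPT, page directory, or page table entry supplies bits 35..12 of a 36-bit address. */
#define MODE_C_BASE_MASK 0xFFFFFF000ull

static uint64_t mode_c_physical_4k(uint64_t pte, uint32_t linear) {
    return (pte & MODE_C_BASE_MASK) | (linear & 0xFFFu);
}
```

For linear address 0xC0403123 the fields are PDPT index 3, directory index 2, table index 3, and offset 0x123; a PTE of 0xABCDEF063 then gives the 36 bit physical address 0xABCDEF123, which lies above the 4 GB boundary.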
Mode C for 2 MB page sizes is illustrated in FIG. 8. Only two levels of address translation are required, where again a four entry PDPT is used to point to a page directory. Entries in the page directory provide the upper 15 bits of a 36 bit address so that pages are forced to be aligned on 2 MB boundaries.
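With 2 MB pages in mode C, the PDE supplies bits 35..21 of the physical address and the low 21 bits of the linear address are the in-page offset. The C sketch below shows only that final step; the PDPT and page directory lookups that fetch the PDE are assumed to have happened already, the page-size flag is not modeled, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 35..21 of a 36-bit address: the 15 bits supplied by the PDE (2 MB aligned). */
#define MODE_C_2M_BASE_MASK 0xFFFE00000ull

static uint64_t mode_c_physical_2m(uint64_t pde, uint32_t linear) {
    /* Linear bits 31..30 select the PDPT entry and bits 29..21 the PDE (walk not
       modeled here); bits 20..0 are the byte offset within the 2 MB page. */
    return (pde & MODE_C_2M_BASE_MASK) | (linear & 0x1FFFFFu);
}
```

With a PDE of 0xABCE000E3, linear address 0x00012345 maps to 0xABCE12345 and linear address 0x00112345 maps to 0xABCF12345, since bit 20 belongs to the offset rather than to the page frame address.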
The page structure described in FIGS. 7 and 8 for Mode C allows up to 4 GB of the 64 GB extended address space to be addressed at one time. To address other 4 GB sections of the extended address space, a different entry may be placed in the PDBR register so as to point to a different PDPT, or entries in the PDPT may be changed. Further details of address translation for the IA-32 architecture may be found in the Intel Architecture Developer's Manual for the Pentium® Pro, Vol. 3, available from Intel Corporation. (Pentium® Pro is a registered trademark of Intel Corporation.)
Increasing the linear address space of a microprocessor provides larger user and system space and reduces the burden associated with linear address exhaustion for a larger physical address space. Increasing the word size of a microprocessor, e.g., from 32 bits to 64 bits, to provide a larger linear address space is a major engineering design task. It may therefore be of economic utility to increase the linear address space of an existing microprocessor design without increasing its word size. Furthermore, it may be advantageous for a microprocessor with increased linear address space to be backward compatible with code designed for the original sized linear address space and supported paging structures.