The present invention relates generally to computer systems having virtual memory addressing, and in particular to such computer systems having a translation lookaside buffer (TLB) or similar cache for use with virtual memory addressing.
Virtual memory addressing is a common strategy used to permit computer systems to have more addressable memory than the actual physical memory installed within a given computer system. Data is stored on a storage device such as a hard disk drive and is loaded into physical memory as needed typically on a memory page-by-memory page basis, where a memory page is a predetermined amount of contiguous memory. Computer systems having virtual memory addressing must translate a given virtual memory address to a physical memory address that temporarily corresponds to the virtual address.
In many such computer systems, translation is accomplished via a translation lookaside buffer (TLB), also known by those skilled in the art as a translation cache (TC). The TLB is a cache, preferably located near the processor of the computer system in order to improve access speed, that holds the virtual page-to-physical page mappings most recently used by the processor. The TLB entries may be cached entries from a page table or translations created and/or inserted by the operating system. The translation of virtual to physical addresses is commonly a critical path in computer performance. Conventional TLB organizations well known to those skilled in the art include direct mapping, in which an entry can appear in the TLB in only one position; fully associative mapping, in which an entry can be placed anywhere in the TLB; and set-associative mapping, in which an entry can be placed in a restricted set of places in the TLB, where a set is a group of entries in the cache and an entry can be placed anywhere within the set.
Fully associative TLBs conventionally include a Content Addressable Memory (CAM) array and a Random Access Memory (RAM) array. CAM, also known as "associative memory," is a kind of storage device which includes comparison logic with each bit of storage. A data value is broadcast to all words of storage and compared with the values there. Words which match are flagged in some way. Subsequent operations can then work on flagged words and/or data linked to those flagged words, e.g. read them out one at a time or write to certain bit positions in all of them.
Set-associative TLBs conventionally include decoders, RAM arrays, and comparators. Part of the virtual address is used by the decoder to determine which entries in the RAM array may contain a corresponding physical address translation. The remainder of the virtual address is typically used along with a tag stored in the RAM array (each RAM array entry has a corresponding tag) by the comparator to determine a specific entry to be used for translation. Set-associative TLBs tend to be faster to access than fully associative TLBs due to the use of decoders rather than CAM arrays.
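The decoder-and-comparator lookup described above can be illustrated with a minimal software sketch. It is not taken from the specification: the fixed 4K page size, the number of sets and ways, and the names `Entry`, `make_tlb`, `fill`, and `lookup` are all assumptions chosen for illustration. The index selection plays the role of the decoder and the tag comparison plays the role of the comparator.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PAGE_SHIFT = 12   # assumption: fixed 4K pages
NUM_SETS = 4      # assumption: number of sets
NUM_WAYS = 2      # assumption: entries per set


@dataclass
class Entry:
    valid: bool = False
    tag: int = 0
    ppn: int = 0  # physical page number


def make_tlb() -> List[List[Entry]]:
    """Create an empty TLB as NUM_SETS sets of NUM_WAYS entries."""
    return [[Entry() for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)]


def split_vpn(vaddr: int) -> Tuple[int, int]:
    """Split the virtual page number into (set index, tag)."""
    vpn = vaddr >> PAGE_SHIFT
    return vpn % NUM_SETS, vpn // NUM_SETS


def fill(tlb: List[List[Entry]], vaddr: int, ppn: int, way: int) -> None:
    """Install a translation for the page containing vaddr in the given way."""
    index, tag = split_vpn(vaddr)
    tlb[index][way] = Entry(valid=True, tag=tag, ppn=ppn)


def lookup(tlb: List[List[Entry]], vaddr: int) -> Optional[int]:
    """Return the physical address for vaddr, or None on a miss."""
    index, tag = split_vpn(vaddr)          # "decoder": pick one set
    for entry in tlb[index]:               # "comparators": check each way
        if entry.valid and entry.tag == tag:
            offset = vaddr & ((1 << PAGE_SHIFT) - 1)
            return (entry.ppn << PAGE_SHIFT) | offset
    return None
```

In this sketch only the entries of one set are examined per lookup, which loosely mirrors why a decoder-based organization can avoid comparing against every entry as a CAM would.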
Conventional TLBs are designed to work with a fixed page size, such as a 4K (1K=1024 bytes) page size, a 16K page size, or a 256K page size. This is less than optimal because memory space on conventional personal computers (PCs) is designed in a manner wherein different address ranges have differing page granularity requirements. For example, on a PC, physical memory space between addresses 640K and 1M (1M=2^20 bytes) needs 4K-8K granularity to support partitions for read-only memories (ROMs), hard disk interfaces, graphics interfaces, etc., but physical memory space below 640K and above 1M is random-access memory (RAM), which would be more efficiently mapped with larger page sizes.
A conventional solution is to use multiple TLBs in which at least one TLB is implemented for each page size of addressable memory space. For example, one TLB is implemented for memory space that is addressed via 4K page sizes and another TLB is implemented for memory space that is addressed via 16K page sizes. This is problematic because all TLBs must be referenced for each virtual address (slower than referencing a single TLB), the method allows creation of multiple (overlapping) entries representing the same virtual address, and the Operating System (OS) is limited to a small set of possible page sizes.
Another conventional solution is to implement one TLB using a page size of the smallest page size needed, such as 4K in the above example of a conventional microprocessor. However, this is problematic in that many more entries in the TLB will be needed to describe the portions of memory that are addressed in larger page sizes. For example, eight entries would be needed in a TLB to describe every 32K page of memory if the TLB uses a page size of 4K. If the number of entries in the TLB is increased to accommodate the requirement of more entries, this results in slower performance because searching a larger TLB is slower than searching a smaller TLB. If the number of entries in the TLB is not increased, then the number of "misses" (the case in which a given virtual address has no corresponding entry in the TLB) will increase, thus causing hardware or the OS to spend a significant number of cycles retrieving the missing translation before program execution can resume. Because the translation of virtual to physical addresses is a bottleneck in the speed of computers, it is critical that the translation be accomplished quickly.
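The entry-count arithmetic in the preceding paragraph can be checked with a short sketch. The helper name `entries_needed` is hypothetical and is introduced only for illustration.

```python
def entries_needed(region_bytes: int, tlb_page_bytes: int) -> int:
    """Number of fixed-size TLB entries required to cover a region (rounded up)."""
    return -(-region_bytes // tlb_page_bytes)


K = 1024
print(entries_needed(32 * K, 4 * K))   # one 32K page described with 4K entries
```

Under the same assumption of 4K entries, a 256K page would consume 64 entries, which illustrates how quickly a small-page TLB fills when large regions are mapped.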
Therefore, a need exists for a single, fast TLB that can accommodate multiple page sizes.
The system identifies virtual addresses as including three portions: a virtual fixed page address in the upper bits of the address word that is always used for identification of the page; an offset address in the lower bits of the address word that is always used for identification of the page offset; and a variable page address, between the virtual fixed page address and the offset, that identifies either page address or offset address, depending on the size of the page corresponding to the virtual address word.
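The three-portion view of a virtual address can be sketched as follows. The field widths are assumptions made only for this illustration and are not given in the text: a 32-bit address, a smallest page size of 4K (so the low 12 bits are always offset), and a largest page size of 256K (so bits 12 through 17 form the variable page address). The function names are likewise hypothetical.

```python
MIN_PAGE_SHIFT = 12   # assumption: smallest page is 4K, bits 0-11 are always offset
MAX_PAGE_SHIFT = 18   # assumption: largest page is 256K, bits 18 and up are always page
VAR_BITS = MAX_PAGE_SHIFT - MIN_PAGE_SHIFT


def split_address(vaddr: int):
    """Return (fixed page address, variable page address, offset address)."""
    offset = vaddr & ((1 << MIN_PAGE_SHIFT) - 1)
    variable = (vaddr >> MIN_PAGE_SHIFT) & ((1 << VAR_BITS) - 1)
    fixed = vaddr >> MAX_PAGE_SHIFT
    return fixed, variable, offset


def variable_bits_used_as_offset(page_shift: int) -> int:
    """How many low bits of the variable field act as offset for a given page size."""
    return page_shift - MIN_PAGE_SHIFT
```

With these assumed widths, a 16K page (shift 14) treats the two lowest variable bits as offset and the remaining four as page address, while a 4K page treats all six variable bits as page address.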
In one embodiment of a method of the present invention, the system receives a virtual address and page size bias for the virtual address and outputs a corresponding physical address. The page size bias is used in the look-up of the physical address. During intermediate stages of the virtual to physical address translation, according to the look-up of the virtual address and page size bias, a page size mask and physical page address are generated. The page size mask indicates what portion of the virtual address describes the address of the virtual page in memory space, and what portion of the address represents an offset within the virtual page. Since the physical page size and virtual page size are the same, the page size mask similarly indicates what portion of the physical page address generated describes the translated virtual page address and is to be used as physical address output and what portion of the physical page address should be masked (because it is not part of the page address) and replaced with the virtual address offset within the page. The final physical address consists of the unmasked portion of the physical page address concatenated with the virtual address offset within the page (the offset within the page is not translated).
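The final combining step of this method can be illustrated with a brief sketch. The mask polarity (1 bits mark page-address bits, 0 bits mark offset bits), the 32-bit address width, and the function names are assumptions for illustration; the text does not fix an encoding for the page size mask.

```python
ADDR_BITS = 32                      # assumption: 32-bit addresses
ALL_ONES = (1 << ADDR_BITS) - 1


def page_size_mask(page_bytes: int) -> int:
    """Mask with 1s over the page-address bits and 0s over the offset bits (assumed polarity)."""
    return ~(page_bytes - 1) & ALL_ONES


def form_physical_address(vaddr: int, phys_page_addr: int, mask: int) -> int:
    """Keep the unmasked physical page bits and take the offset bits from the virtual address."""
    page_part = phys_page_addr & mask
    offset_part = vaddr & ~mask & ALL_ONES   # the offset is not translated
    return page_part | offset_part
```

For example, with a 16K page the low 14 bits of the result come from the virtual address, and with a 4K page only the low 12 bits do; the same looked-up physical page address therefore yields different final addresses depending on the mask.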
In one embodiment of an apparatus, the present invention generates a set of entry selects according to a virtual address and page size bias supplied, generates a physical page address from an entry selected by the entry selects in a first array, generates a virtual address tag from an entry selected by the entry selects in a first array, generates a page size mask from an entry selected by the entry selects in a first array, and generates a match signal from a comparison of the variable page address supplied with a corresponding entry selected by the entry selects in a second array (the match signal is also qualified with a valid bit contained within the second array which indicates whether or not the translation buffer entry selected is valid). A masked physical page address is created by masking-off the lower bits of the generated physical page address with the page size mask so that the address bits which correspond to the portion of the address which represents the offset within the page (as opposed to the portion of the address which represents the address of the page within memory space) are masked off. Then the offset address within the page is created by masking the virtual address with the inverse of the page size mask so that the address bits which correspond to the portion of the address which represents the address of the page within memory space (as opposed to the portion of the address which represents the offset within the page) are masked off. The physical address is then formed by combining the masked physical page address with the offset address within the page.
In another embodiment of an apparatus, a computer system includes one or more processors, one or more physical memories operating within the processor(s) in which the memories have more than one page size identified to describe the corresponding physical memory, and a translation buffer coupled to the physical memory through an address bus, in which the translation buffer receives a virtual address and a page size bias and outputs a physical memory address. The translation buffer includes a decoder that receives the page size bias and a subset of the virtual address input and outputs a set of entry selects. It also includes an array that receives the entry selects from the decoder and contains entries, corresponding to those entry selects, describing a virtual fixed address tag, a page size mask, and a physical memory page address; the array outputs the physical address corresponding to the virtual address supplied by combining complementary portions of the physical page address and the virtual page offset address. The array also outputs a virtual fixed address tag which is compared to the virtual fixed address portion of the virtual address supplied to generate a partial match signal. Finally, the translation buffer includes a second array, which contains a variable virtual address tag and a page size mask. The second array inputs the variable page address portion of the virtual address supplied and the entry selects.
It then uses the entry selects to select an entry and masks the variable page address supplied with the page size mask of the entry selected, such that the portion of the variable page address which corresponds to the offset address within the page is masked. It compares this result for equality with the variable virtual address tag of the entry selected, similarly masked with the page size mask of the entry selected, to generate a match signal (the match signal is also qualified with a valid bit contained within the second array which indicates whether or not the translation buffer entry selected is valid). A translation match is indicated when both the partial match signal from the first array and the match signal from the second array are true. The translation can be performed in parallel by one or more translation buffers to form a set-associative TLB in which each of the translation buffers is one way of the TLB.
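The match logic described above can be sketched in software as follows. This is an illustrative model only: the six-bit variable field, the mask polarity (1 bits mark page-address bits that take part in the comparison), and the function names are assumptions that do not come from the text.

```python
def variable_match(var_addr: int, var_tag: int, var_mask: int, valid: bool) -> bool:
    """Masked equality of the variable page address and tag, qualified by the valid bit."""
    return bool(valid) and (var_addr & var_mask) == (var_tag & var_mask)


def translation_match(fixed_addr: int, fixed_tag: int,
                      var_addr: int, var_tag: int,
                      var_mask: int, valid: bool) -> bool:
    """A hit requires both the partial match (first array) and the match signal (second array)."""
    partial_match = fixed_addr == fixed_tag
    return partial_match and variable_match(var_addr, var_tag, var_mask, valid)


# Assumed 6-bit variable field covering address bits 12-17.
MASK_4K = 0b111111    # every variable bit is page address
MASK_16K = 0b111100   # the two lowest variable bits are offset
```

With the assumed 16K mask, variable addresses that differ only in their two lowest bits hit the same entry, whereas with the 4K mask they do not. In a set-associative arrangement, the same check would be evaluated once per way.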