The present invention pertains generally to cache memory access in digital computers, and more particularly to a method and circuit for improving cache access time in systems which support variable page sizes.
Modern digital computers typically employ a multilevel hierarchy of memory systems, with relatively fast, expensive, limited-capacity memory at the highest level of the hierarchy and proceeding to relatively slower, lower cost, higher-capacity memory at the lowest level of the hierarchy. Typically, the hierarchy includes a small fast memory called a cache, either physically integrated within a processor integrated circuit or mounted physically close to the processor for speed.
The processor includes a memory management system. The purpose of the memory management system is to dynamically select which data is to be stored in each level of the memory hierarchy and to control the transfer of data therebetween. Data is accessed from the highest level memory cache. The memory management system therefore attempts to ensure, if possible, that the data is present in the highest level cache at the time the processor requests it. Typically this is accomplished by maintaining the most recently accessed data in the highest level cache. If a processor requests an item from a cache and the item is present in the cache, the event is called a cache hit, and the requesting address (referred to herein as the "virtual" address) is translated to a corresponding "physical" address in the TLB RAM. If a processor requests an item from a cache and the item is not present in the cache, the event is called a cache miss, which causes an exception to be generated known as a "page fault". When the operating system services the page fault, typically some form of "least recently used" (LRU) technique is used to expunge that page from RAM, load the required page from a lower level of the memory hierarchy, and restart the address request. In many processor designs, the time required to access an item for a cache hit is one of the primary limiters for the clock rate of the processor. Therefore, optimization of access time for cache hits is critical for performance. There is an ongoing need for improvement in cache access time for computer processors.
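By way of illustration only, the "least recently used" replacement mentioned above may be sketched in software as follows. The class name LRUPageSet, the load callback, and the two-page capacity are hypothetical and are not taken from any particular operating system; the sketch merely shows one way in which the least recently accessed page could be chosen for eviction.

```python
from collections import OrderedDict


class LRUPageSet:
    """Hypothetical model of a fixed number of resident pages with LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page number -> contents, least recent first

    def access(self, page, load):
        """Return (contents, evicted_page). load(page) is called only on a miss."""
        if page in self.pages:
            # Hit: mark the page as most recently used.
            self.pages.move_to_end(page)
            return self.pages[page], None
        evicted = None
        if len(self.pages) >= self.capacity:
            # Miss with no free slot: expunge the least recently used page.
            evicted, _ = self.pages.popitem(last=False)
        self.pages[page] = load(page)
        return self.pages[page], evicted


resident = LRUPageSet(capacity=2)
# Record which page, if any, is evicted by each access in the sequence 1, 2, 1, 3.
trace = [resident.access(p, load=lambda n: f"page-{n}")[1] for p in (1, 2, 1, 3)]
```

In this sequence only the final access causes an eviction, and the page expunged is page 2, since page 1 was touched more recently.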
Typically, a memory is organized into words (for example, 32 bits or 64 bits per word). Typically, the minimum amount of memory that can be transferred between a cache and a next lower level of the memory hierarchy is called a line, or sometimes a block. A line is typically multiple words (for example, 16 words per line). Memory may also be divided into pages (also called segments), with many lines per page. In some systems, page size may be variable.
Virtual memory (paging or segmentation) requires a data structure, sometimes called a page table, that translates the virtual address to the physical address. To reduce address translation time, computers commonly use a specialized associative cache dedicated to address translation, commonly called a Translation Look-aside Buffer (TLB).
The TLB stores a table of virtual addresses and corresponding physical addresses of data pages contained in the cache memory. When data is to be accessed, a search using the virtual address of the data is performed on the TLB to determine whether the page containing the data is in the faster cache memory. If not, the data page must be fetched from a lower-level memory and loaded into the cache.
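The lookup described above may be modeled, purely as a sketch, by a table keyed on the virtual page number. The fixed four Kilobyte page size, the class name SimpleTLB, and the example page numbers are assumptions made for illustration; a hardware TLB performs the search associatively rather than through a software dictionary.

```python
PAGE_SHIFT = 12  # assumption: fixed 4 KB pages (2**12 bytes)


class SimpleTLB:
    """Hypothetical software model of a TLB with a single fixed page size."""

    def __init__(self):
        self.entries = {}  # virtual page number -> physical page number

    def insert(self, vpn, ppn):
        self.entries[vpn] = ppn

    def translate(self, vaddr):
        """Return the physical address, or None when the TLB has no matching entry."""
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        ppn = self.entries.get(vpn)
        if ppn is None:
            return None  # miss: the translation must be obtained elsewhere
        # The page offset passes through unchanged; only the page number is replaced.
        return (ppn << PAGE_SHIFT) | offset


tlb = SimpleTLB()
tlb.insert(0x12345, 0x00ABC)
hit_result = tlb.translate(0x12345678)
miss_result = tlb.translate(0x99999000)
```

Here the virtual address 0x12345678 translates to 0x00ABC678, while the second address has no entry and yields None.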
An important memory performance issue is the page size. In this regard, since a program typically does not take up an integral number of pages, using a large fixed page size typically wastes memory. This concept is known as page fragmentation. By way of example, if the page size is four Gigabytes and the program needs only 26 Kilobytes, then only one page will be necessary for the program. In a processor that does not allow process sharing of pages, nearly all of the four Gigabyte page will be wasted in fragmentation.
On the other hand, if the page size is made too small, the directory and page tables must be larger in order to cover the same amount of memory space. By way of example, in a system employing four Kilobyte pages, a one Megabyte memory space would have to be divided into two hundred fifty-six pages, a quarter of an entire page table of one thousand twenty-four (1K) entries. Moreover, for the same amount of address space with smaller pages, the transfer time to fill the RAM from disk becomes longer because of the associated overhead with individual page reads and writes and because it is possible that the pages are stored in non-contiguous disk space, requiring multiple accesses to the disk.
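The arithmetic behind the two preceding examples may be checked with the short sketch below. Binary units are assumed (one Kilobyte = 2**10 bytes, one Megabyte = 2**20 bytes, one Gigabyte = 2**30 bytes), and the helper names pages_needed and wasted_bytes are hypothetical.

```python
KIB = 1 << 10
MIB = 1 << 20
GIB = 1 << 30


def pages_needed(region_bytes, page_bytes):
    """Number of whole pages required to cover region_bytes (rounds up)."""
    return -(-region_bytes // page_bytes)


def wasted_bytes(region_bytes, page_bytes):
    """Bytes allocated but unused when the region is rounded up to whole pages."""
    return pages_needed(region_bytes, page_bytes) * page_bytes - region_bytes


# Large page: a 26 KB program placed in a single 4 GB page.
large_page_waste = wasted_bytes(26 * KIB, 4 * GIB)
# Small page: a 1 MB memory space divided into 4 KB pages.
small_page_count = pages_needed(1 * MIB, 4 * KIB)
```

Under these assumptions the large-page case wastes 4,294,940,672 bytes of the single page, and the small-page case requires 256 page table entries.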
Accordingly, some computer architectures support the use of variable page sizes. The support of variable page sizes by a TLB solves many of the problems mentioned above that result from fixed page size TLBs. However, prior cache systems that support variable page sizing use a page masking technique that can result in slower cache performance. In particular, prior implementations of variable page masking methodologies store a page mask along with the virtual address in the TLB. When a cache hit occurs, the page mask is read out of the virtual address stored in the matching entry of the TLB. The page mask signals from the matching virtual address entry of the TLB are then used to control respective virtual address bypass multiplexers, which multiplex the virtual address bits and corresponding bits in the physical address from the physical page number RAM in the matching TLB entry. If a bit is masked, the virtual address bypass multiplexer associated with that bit selects the corresponding virtual page number bit from the virtual address, which is then sent out as part of the physical address. This process is necessary since the bits that were masked out in the virtual address essentially generated a larger page size, and therefore the bits that were masked during the virtual address lookup must be used as part of the offset into the page.
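The prior post-match bypass described above may be sketched behaviorally as follows. The sketch is an illustrative software model and not a circuit description: the Entry fields, the four Kilobyte minimum page size, the convention that a mask bit of 1 means "masked", and the example values are all assumptions. The point to note is the ordering, in which the mask is consulted only after the matching entry has been found.

```python
from collections import namedtuple

PAGE_SHIFT = 12  # assumption: 4 KB minimum page size

# Hypothetical TLB entry: mask bit 1 means the page-number bit is masked.
Entry = namedtuple("Entry", ["vpn", "ppn", "mask"])


def translate_post_match(entries, vaddr):
    """Model of the prior approach: compare first, then bypass using the read-out mask."""
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    for e in entries:
        # Step 1: the lookup compares only the unmasked page-number bits.
        if (vpn & ~e.mask) == (e.vpn & ~e.mask):
            # Step 2 (after the hit): the mask from the matching entry steers
            # the bypass multiplexers, one per page-number bit.
            page_bits = (vpn & e.mask) | (e.ppn & ~e.mask)
            return (page_bits << PAGE_SHIFT) | offset
    return None  # no matching entry


# One entry mapping a 64 KB page: the low four page-number bits are masked.
entries = [Entry(vpn=0x12340, ppn=0x00AB0, mask=0xF)]
post_match_hit = translate_post_match(entries, 0x12345678)
post_match_miss = translate_post_match(entries, 0x22345678)
```

With these values the masked bits (0x5) are taken from the virtual address, giving 0x00AB5678; in hardware, the multiplexing of step 2 adds delay after the match has been determined.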
The virtual address to physical address translation must be performed as quickly as possible in cache memory. The additional multiplexing operation for virtual address bypassing required in systems that support variable page sizes limits the translation speed. Accordingly, there is a need for improvement in the virtual address bypass architecture.
The present invention is a novel method and apparatus for performing virtual address bypassing using a page mask stored, and locally multiplexed, within the physical address RAM of the TLB. The local virtual address bypassing multiplexing operation of the invention allows the multiplexing function to be performed prior to the occurrence of a TLB hit, thereby eliminating any post-match multiplexing delay. This technique significantly reduces the virtual address to physical address translation time.
In accordance with the invention, the page mask is stored along with the physical address in the physical page number RAM. Each page mask bit controls a corresponding local multiplexer which receives the virtual address bit(s) corresponding to the page mask bit. If the virtual address bit was masked, the local multiplexer outputs the corresponding virtual address bit in place of the corresponding physical address bit. If the virtual address bit was not masked, the local multiplexer selects the corresponding physical address bit data.
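A behavioral sketch of the local multiplexing arrangement is given below by way of example only. The Row fields, the four Kilobyte minimum page size, the convention that a mask bit of 1 means "masked", and the assumption that the compare side has access to the same mask are illustrative choices not specified above. A software model cannot show timing; the sketch only illustrates that each row can form its output from the incoming virtual address bits without waiting for the match result, so that a hit merely selects a row whose output is already formed.

```python
from collections import namedtuple

PAGE_SHIFT = 12  # assumption: 4 KB minimum page size

# Hypothetical row: the mask is held alongside the physical page number.
Row = namedtuple("Row", ["vpn", "ppn", "mask"])


def translate_local_mux(rows, vaddr):
    """Model of local bypass: per-row multiplexing does not depend on the match."""
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    # Each row's local multiplexers pick the virtual address bit where the mask
    # bit is set and the stored physical bit otherwise, independent of any hit.
    candidates = [(vpn & r.mask) | (r.ppn & ~r.mask) for r in rows]
    # Match determination, modeled separately (assumed to use the same mask).
    hits = [(vpn & ~r.mask) == (r.vpn & ~r.mask) for r in rows]
    for hit, page_bits in zip(hits, candidates):
        if hit:
            # The hit only selects an output that is already formed.
            return (page_bits << PAGE_SHIFT) | offset
    return None  # no matching row


rows = [
    Row(vpn=0x12340, ppn=0x00AB0, mask=0xF),  # 64 KB page: low four bits masked
    Row(vpn=0x00042, ppn=0x00777, mask=0x0),  # 4 KB page: no bits masked
]
large_page_result = translate_local_mux(rows, 0x12345678)
small_page_result = translate_local_mux(rows, 0x00042ABC)
```

For the 64 Kilobyte row the masked bits come from the virtual address (result 0x00AB5678), and for the 4 Kilobyte row every page-number bit comes from the stored physical address (result 0x00777ABC).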