A cache memory is a high-speed memory unit interposed in the memory hierarchy of a computer system between a slower system memory and a processor to improve effective memory transfer rates and, accordingly, overall system performance. The cache memory unit is essentially hidden and appears transparent to the user, who is aware only of a larger system memory. The cache is usually implemented in static random access memory (SRAM), which has speeds comparable to those of the processor, while the system memory is implemented in less expensive, slower dynamic random access memory (DRAM).
The cache concept anticipates the likely re-use by the microprocessor of selected data in system memory by storing a copy of the selected data in the cache memory. Not all memory regions, however, need to be cacheable. For example, memory that is not likely to be re-used need not be cached. Accordingly, computer systems typically provide a region register on board the processor that contains system level attributes assigned to addresses within a region. For example, the cacheability of the region may be defined. The region register can also include information such as whether a given page is write-through or protected.
For processors having an on-chip region register, processing the cacheability information typically proceeds as shown in FIG. 1. When an instruction requests the contents of a memory location, the instruction refers to the location not by an actual hardware or physical memory address, but by a "virtual" or "logical" address. The logical address is merely a name for a memory location and must be translated into the appropriate physical memory location. The segmentation unit in the processor's memory management unit (MMU) translates the logical address into a linear address. If paging is not enabled, the linear address becomes the physical address that is actually output from the processor to access the requested memory location. If paging is enabled, the paging mechanism further translates the linear address into a physical address, which is then used to access the requested memory location. Prior to being used for accessing the memory, however, the physical address is compared to the cacheability ranges stored in the region register. Because this comparison cannot begin until the address translation is complete, it is a relatively slow serial process which degrades system performance. Accordingly, there is a need for a system and method for improving the cacheability determination so as to enhance system performance.
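The range comparison described above can be illustrated with a minimal sketch. The region table, its address ranges, and the function name are hypothetical and chosen only for illustration; an actual region register would hold its ranges in hardware.

```python
# Hypothetical region table: (start, end_exclusive, cacheable).
# The ranges below are illustrative assumptions, not values from any real system.
REGIONS = [
    (0x00000000, 0x000A0000, True),
    (0x000A0000, 0x00100000, False),
    (0x00100000, 0x01000000, True),
]


def is_cacheable(physical_address, regions=REGIONS, default=False):
    """Compare a physical address against each stored range in turn."""
    for start, end, cacheable in regions:
        if start <= physical_address < end:
            return cacheable
    # Addresses outside every range fall back to an assumed default.
    return default
```

Note that the function takes a physical address as its input, so in this sketch, as in the process of FIG. 1, the check can only be made after translation has produced that address.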
A more detailed discussion of the address translation process is deemed appropriate. Referring now to FIG. 2, in protected mode, each block or segment of memory is described by a structure called a segment descriptor. Segment descriptors reside in a set of system tables called descriptor tables. The segment and offset registers hold values referred to as a selector and an offset, respectively, which are used to access one or more addresses in a desired memory segment. In essence, the selector is a 16-bit value that serves as the virtual name for a memory segment, and the MMU uses the selector to index into the descriptor tables to locate the segment descriptor corresponding to the desired memory segment.
As shown in FIG. 3, a descriptor is a small block of memory that describes the characteristics of a much larger memory block or memory segment. The descriptor includes information regarding the segment's base address, its length or limit, its type, its privilege level and various status information. The segment's base address is the starting point of the segment in the linear address space. As shown in FIG. 2, the offset portion of the logical address is added to the base address to generate the linear address of the desired memory location. The limit field is a 20 bit field that determines the last addressable unit of the memory segment. The segment type field is a 3 bit field which indicates the type of segment being defined, for example, a code, data, or stack segment. The privilege level field is a two bit field which indicates the level of privilege associated with the memory segment defined by the descriptor. Among the status bits, a bit referred to as the Accessed bit is automatically set by the CPU whenever a memory reference is made to the segment defined by the respective descriptor.
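The segment translation described above can be sketched as follows. The class, field, and function names are hypothetical, the descriptor table is modeled as a simple mapping from selector to descriptor, and the limit is assumed to be expressed in bytes; the type and status fields are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class SegmentDescriptor:
    base: int       # starting linear address of the segment
    limit: int      # last addressable unit (byte units assumed here)
    privilege: int  # two-bit privilege level


def to_linear(descriptor_table, selector, offset):
    """Look up the descriptor named by the selector and add the offset to its base."""
    descriptor = descriptor_table[selector]
    if offset > descriptor.limit:
        raise ValueError("offset exceeds segment limit")
    # Keep the result within a 32-bit linear address space.
    return (descriptor.base + offset) & 0xFFFFFFFF


# Hypothetical table with a single segment starting at 1 MB.
table = {0x0008: SegmentDescriptor(base=0x00100000, limit=0xFFFF, privilege=0)}
linear = to_linear(table, 0x0008, 0x1234)
```

In this sketch an offset beyond the limit is reported as a Python exception, standing in for whatever protection mechanism the processor applies.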
The Intel X86 family of processors also includes a segment descriptor cache register for each of its segment registers. Whenever a segment register's contents are changed, the 8-byte descriptor associated with that selector is automatically loaded (cached) on the chip. This is referred to as a segment descriptor reload. Once loaded, all references to that segment use the cached descriptor information instead of reaccessing the descriptor from main memory.
Referring again to FIG. 2, once the segmentation unit has translated the logical address into a linear address, if paging is enabled, the linear address is provided to the paging mechanism to be translated into a physical address. Referring now to FIG. 4, the CPU uses a page directory and a page table to translate the linear address (from the segmentation unit) into a physical address. The CPU also includes an internal register referred to as control register 3 (CR3) which contains the physical starting address of the page directory. The lower 12 bits of CR3 are always zero to ensure that the page directory is always page aligned. As shown in FIG. 4, the linear address produced by the segmentation unit includes a directory field which stores an index to the page directory. As shown, the directory value in the linear address is combined with the page directory base address in CR3 to index to the desired entry in the page directory.
The page directory is four Kbytes long and allows up to 1024 page directory entries. The contents of a Page Directory Entry are shown in FIG. 5. Each Page Directory Entry contains the base address of a respective page table as well as information about the respective page table. As shown in FIG. 4, the page table base address stored in the respective Page Directory Entry is combined with a page table index value stored in bits 12-21 of the linear address. The page table index value is used to select one of the 1024 page table entries.
Each page table is four Kbytes and holds up to 1024 page table entries. As shown in FIG. 6, a Page Table Entry contains the starting or base address of the page frame being accessed as well as statistical information about the page. As shown in FIG. 4, the frame base address in the Page Table Entry is concatenated with the lower 12 bits of the linear address, referred to as the offset, to form the physical address. The physical address is output from the pins of the CPU to access the desired memory location.
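The field extraction and address formation described above can be sketched as follows. The function names are hypothetical. The sketch assumes a 32-bit linear address whose directory field occupies the upper 10 bits (bits 22-31), and assumes 4-byte table entries, which follows from 1024 entries filling a four Kbyte table.

```python
def split_linear(linear):
    """Split a 32-bit linear address into (directory, table, offset) fields."""
    directory = (linear >> 22) & 0x3FF  # bits 22-31: index into the page directory
    table = (linear >> 12) & 0x3FF      # bits 12-21: index into the page table
    offset = linear & 0xFFF             # bits 0-11: offset within the page frame
    return directory, table, offset


def entry_address(table_base, index):
    """Combine a page-aligned table base with an index to 4-byte entries."""
    return (table_base & 0xFFFFF000) | (index << 2)


def physical_address(frame_base, offset):
    """Concatenate the page frame base with the 12-bit offset."""
    return (frame_base & 0xFFFFF000) | offset


directory, table, offset = split_linear(0x00403025)
```

For the linear address 0x00403025 the fields are directory 1, table 3, and offset 0x025. With a hypothetical frame base of 0x0002A000, the resulting physical address would be 0x0002A025.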
Referring again to FIGS. 5 and 6, the lower 12 bits of each Page Table Entry and Page Directory Entry contain statistical information about pages and page tables respectively. The P or Present bit, bit 0, indicates if a Page Directory or Page Table Entry can be used in address translation. The A or Accessed bit, bit 5, is set by the processor for both types of entries before a read or write access occurs to an address covered by an entry. For a Page Table Entry, the D or Dirty bit, bit 6, is set to 1 before a write to an address covered by that Page Table Entry occurs. The D bit is undefined for Page Directory Entries. When the P, A and D bits are updated by the microprocessor, the processor generates a Read-Modify-Write cycle which locks the bus and prevents conflicts with other processors or peripherals. The 3 bits marked "OS Reserved" in FIGS. 5 and 6 (bits 9-11) are software definable and may be used for any desired purpose. An example use of the OS Reserved bits would be to store information about page aging. By keeping track of how long a page has been in memory since it was last accessed, an operating system can implement a page replacement algorithm such as Least Recently Used. The U/S (User/Supervisor) bit, bit 2, and the R/W (Read/Write) bit, bit 1, are used to provide protection attributes for individual pages.
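The bit assignments described above can be illustrated with a short decoding sketch. The constant and function names are hypothetical. The polarity of the protection bits (R/W set meaning writable, U/S set meaning user-accessible) is an assumption based on the usual x86 convention and is not stated in the text above.

```python
P_BIT = 1 << 0    # Present
RW_BIT = 1 << 1   # Read/Write (assumed: 1 = writable)
US_BIT = 1 << 2   # User/Supervisor (assumed: 1 = user-accessible)
A_BIT = 1 << 5    # Accessed
D_BIT = 1 << 6    # Dirty (meaningful for Page Table Entries only)
OS_RESERVED_SHIFT = 9  # bits 9-11 are software definable


def decode_entry(entry):
    """Unpack the lower 12 status bits and the upper 20 base-address bits."""
    return {
        "present": bool(entry & P_BIT),
        "writable": bool(entry & RW_BIT),
        "user": bool(entry & US_BIT),
        "accessed": bool(entry & A_BIT),
        "dirty": bool(entry & D_BIT),
        "os_reserved": (entry >> OS_RESERVED_SHIFT) & 0x7,
        "base": entry & 0xFFFFF000,
    }


# Hypothetical entry: frame at 0x0002A000, present, writable, accessed,
# with the value 5 stored in the OS Reserved bits (e.g. as a page age).
fields = decode_entry(0x0002A000 | P_BIT | RW_BIT | A_BIT | (5 << OS_RESERVED_SHIFT))
```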
The paging mechanism described above is designed to support demand paged virtual memory systems. However, performance would degrade substantially if the processor were required to access two levels of tables for every memory access. To solve this problem and increase performance, the MMU paging mechanism uses an internal cache memory called the Translation Lookaside Buffer (TLB) which stores the most recently accessed translation entries. For example, the TLB may be a four-way set associative cache, meaning that the cache includes four banks of memory where a particular translation entry can be stored. The TLB may also include a least recently used (LRU) replacement algorithm for adding new translation entries if the TLB is currently full. The least recently used entry is replaced by a new entry because statistically the LRU entry is the least likely to be requested in the future. Therefore, the TLB automatically keeps the most commonly used translation entries stored in the processor. It is noted that the translation entries stored in the TLB are not necessarily the same as the Page Table Entries stored in memory. More particularly, a translation entry stored in the TLB can include the same information as, or more or less information than, the entry stored in memory. It need not include, for example, statistical information, but must include the address entry. Generally, the translation entry includes enough information to generate a physical address given the linear address, as well as to implement any protections included in the Page Table Entries and Page Directory Entries. For example, the cacheability of the page as determined by the Page Table Entries (i.e., the PCD bit) may be stored in the TLB.
When the MMU requests a translation of a particular linear address and the corresponding translation entry resides in the TLB, then a TLB hit occurs and the entry is retrieved from the TLB without requiring a bus cycle or table lookups. However, if the requested translation does not reside in the TLB, then the requested entry is retrieved from the page tables in system memory and placed in the TLB.
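The hit and miss behavior of a four-way set associative TLB with LRU replacement can be sketched as follows. The class name, the number of sets, and the use of the low bits of the linear page number to select a set are illustrative assumptions, not details of any particular processor.

```python
from collections import OrderedDict


class TLB:
    """Set associative translation cache with per-set LRU replacement."""

    def __init__(self, num_sets=8, ways=4):
        self.num_sets = num_sets
        self.ways = ways
        # One ordered mapping per set; the oldest (least recently used) entry comes first.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def lookup(self, linear_page):
        """Return the cached translation on a hit, or None on a miss."""
        entries = self.sets[linear_page % self.num_sets]
        if linear_page in entries:
            entries.move_to_end(linear_page)  # mark as most recently used
            return entries[linear_page]
        return None

    def insert(self, linear_page, translation):
        """Store a translation, evicting the LRU entry if the set is full."""
        entries = self.sets[linear_page % self.num_sets]
        if linear_page in entries:
            entries.move_to_end(linear_page)
        elif len(entries) >= self.ways:
            entries.popitem(last=False)  # drop the least recently used entry
        entries[linear_page] = translation


# With a single set of four ways, a fifth insertion evicts the LRU entry.
tlb = TLB(num_sets=1, ways=4)
for page in range(4):
    tlb.insert(page, 0x1000 * (page + 1))
tlb.lookup(0)            # page 0 becomes most recently used
tlb.insert(4, 0x5000)    # page 1 is now the LRU entry and is replaced
```

On a miss, a caller would retrieve the entry from the page tables in system memory and pass it to `insert`, so that the next request for the same page results in a hit.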
Referring now to FIG. 7, the paging mechanism operates in the following fashion. When the paging mechanism receives a linear address from the segmentation unit, the upper 20 bits of the linear address are compared with the entries in the TLB to determine if there is a match. If there is a match (referred to as a TLB hit), then the 32-bit physical address is calculated using the page frame base address stored in the translation entry and the offset from the linear address as described above. The physical address is then compared to the data in the region register to determine whether the addressed data are cacheable.
If the requested translation is not in the TLB, then the CPU reads the appropriate Page Directory Entry from memory. If the present bit in the Page Directory Entry indicates that the page table is in memory, then the CPU calculates the Page Table Entry address, reads the appropriate Page Table Entry, and sets the Accessed bit in the Page Directory Entry. If the present bit in the Page Table Entry indicates that the requested page frame is in main memory, then the processor updates the Accessed and Dirty bits as needed and performs the memory access. The upper 20 bits of the linear address are stored in the TLB, together with the corresponding translation entry, for future accesses. If the present bit in either the Page Directory Entry or the Page Table Entry indicates that the corresponding page table or page frame is not in memory, then the processor generates a page fault, which potentially means that the requested page frame must be swapped in from disk.
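The sequence described above can be sketched end to end as follows. This is a simplified model under stated assumptions: system memory is represented as a mapping from physical address to 32-bit entry, the TLB is a plain mapping without replacement, a page fault is represented by a hypothetical exception class, and the Dirty bit is updated only on the table walk path. All names are illustrative.

```python
P_BIT = 1 << 0  # Present
A_BIT = 1 << 5  # Accessed
D_BIT = 1 << 6  # Dirty


class PageFault(Exception):
    """Raised when a page table or page frame is not present (hypothetical)."""


def translate(linear, cr3, memory, tlb, is_write=False):
    """Translate a 32-bit linear address to a physical address."""
    page = linear >> 12       # upper 20 bits identify the page
    offset = linear & 0xFFF   # lower 12 bits are the offset
    if page in tlb:           # TLB hit: no table lookups needed
        return tlb[page] | offset

    # TLB miss: read the Page Directory Entry.
    pde_addr = (cr3 & 0xFFFFF000) | ((linear >> 22) << 2)
    pde = memory.get(pde_addr, 0)
    if not pde & P_BIT:
        raise PageFault("page table not present")
    memory[pde_addr] = pde | A_BIT

    # Read the Page Table Entry and update its status bits.
    pte_addr = (pde & 0xFFFFF000) | (((linear >> 12) & 0x3FF) << 2)
    pte = memory.get(pte_addr, 0)
    if not pte & P_BIT:
        raise PageFault("page frame not present")
    pte |= A_BIT
    if is_write:
        pte |= D_BIT
    memory[pte_addr] = pte

    # Cache the translation for future accesses.
    tlb[page] = pte & 0xFFFFF000
    return tlb[page] | offset


# Hypothetical layout: page directory at 0x1000, one page table at 0x2000.
memory = {0x1004: 0x2000 | P_BIT, 0x200C: 0x0002A000 | P_BIT}
tlb = {}
physical = translate(0x00403025, cr3=0x1000, memory=memory, tlb=tlb)
```

In this example the first call walks both tables and yields 0x0002A025; a second call for the same page is served from `tlb` without touching `memory`. A linear address whose directory entry is absent from `memory` raises `PageFault`.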