The present invention relates generally to the field of processors and in particular to a system and method of locking entries in one or more Translation Lookaside Buffers against replacement.
Microprocessors perform computational tasks in a wide variety of applications, including portable electronic devices. In most cases, maximizing processor performance is a major design goal, to permit additional functions and features to be implemented in portable electronic devices and other applications. Also, in many applications, some computational tasks have priority over others, and it would be advantageous for the system to guarantee that computational resources are reserved for high-priority tasks.
Many programs are written as if the computer executing them had a very large (ideally, unlimited) amount of fast memory. Most modern processors simulate that ideal condition by employing a hierarchy of memory types, each having different speed and cost characteristics. The memory types in the hierarchy vary from very fast and very expensive at the top, to progressively slower but more economical storage types in lower levels. A typical processor memory hierarchy may comprise registers (gates) in the processor at the top level; backed by one or more on-chip caches (SRAM); possibly an off-chip cache (SRAM); main memory (DRAM); disk storage (magnetic media with electromechanical access); and tape or CD (magnetic or optical media) at the lowest level. Most portable electronic devices have limited, if any, disk storage, and hence main memory, often limited in size, is the lowest level in the memory hierarchy.
In a computer memory hierarchy, each lower level maintains a full (but possibly stale) copy of the data resident in higher levels. That is, the data stored in higher levels replicates that in the lower levels. Since smaller, higher level storage may map to multiple locations in the larger, lower level memory, a mapping scheme is required to translate addresses between hierarchy levels. Most processors operate in a very large, conceptually contiguous virtual address space. Main memory is accessed in a physical address space that is constrained by hardware and system parameters. Caches (high-speed memories interposed between the processor core and main memory) may be accessed completely by virtual addresses, completely by physical addresses, or in combination (such as by using a virtual index and a physical tag). Regardless of the cache configuration, however, addresses must be translated from virtual address space to physical address space.
The mapping and translation of many large virtual address spaces (one per running program or context) to one limited physical memory address space is known as memory management. Memory management by the operating system ensures proper performance by preventing programs from overwriting each other's data; provides security by disallowing one user from accessing another's data; and promotes reliability by disallowing user-level programs from accessing supervisor-level data structures, such as operating system allocation tables and parameters.
Memory may be managed in fixed-size segments called pages, which may for example comprise 4 K bytes. The upper, or most-significant, portion of an address, called the page number, identifies a particular memory page. The page number is translated from virtual to physical address space. The lower, or least-significant portion of the address, called a page offset, is an offset into the page that is the same for virtual and physical addresses; page offset bits are not translated. As an example, for a 32-bit address with 4 K pages, the page number would comprise address bits [31:12] and the page offset, bits [11:0]:
TABLE 1: Page Fields of Address
Bits [31:12]: page number
Bits [11:0]: page offset
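The field layout of Table 1 may be illustrated with a minimal sketch, assuming a 32-bit address and 4 K pages as in the example above. The names `PAGE_SHIFT`, `split_address`, and `join_address` are hypothetical and chosen only for illustration.

```python
PAGE_SHIFT = 12                           # 4 K pages: 12 page offset bits
PAGE_OFFSET_MASK = (1 << PAGE_SHIFT) - 1  # selects address bits [11:0]
ADDRESS_MASK = 0xFFFFFFFF                 # assumption: 32-bit addresses


def split_address(address):
    """Return (page_number, page_offset) for a 32-bit address."""
    address &= ADDRESS_MASK
    page_number = address >> PAGE_SHIFT       # bits [31:12]
    page_offset = address & PAGE_OFFSET_MASK  # bits [11:0]
    return page_number, page_offset


def join_address(page_number, page_offset):
    """Concatenate a page number with an (untranslated) page offset."""
    joined = (page_number << PAGE_SHIFT) | (page_offset & PAGE_OFFSET_MASK)
    return joined & ADDRESS_MASK
```

For example, the address 0x12345678 splits into page number 0x12345 and page offset 0x678. Only the page number takes part in translation; the offset is carried through unchanged.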
The mapping of virtual to physical page numbers is controlled by the operating system software, in one or more data structures called page tables. A page table may be a single table, or a hierarchical or tree-like series of tables, each mapping a portion or segment of the virtual page number to a corresponding range of physical memory. The page tables additionally store attributes of the physical pages, such as read, write and execute permissions, whether the page is shared or dedicated to a single process, and the like. Initially, the processor must “walk,” or traverse, the page tables to translate a new virtual address to a corresponding physical address, to access main memory (or cache memory, if it is physically indexed or tagged). Subsequent address translations may be accelerated by storing the virtual and physical page numbers, and the page attributes, in a Translation Lookaside Buffer (TLB). A TLB may store address translations and page attributes for both data and instruction pages. Additionally, an instruction TLB (ITLB), which may comprise a subset of a unified TLB, may separately store address translations and page attributes for instructions.
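A hierarchical page table walk of the kind described above may be sketched as follows. This is an illustrative model only: the two-level structure, the 10/10 split of a 20-bit virtual page number, the use of dictionaries as tables, and the names `walk_page_tables` and `PageFault` are all assumptions, not features of any particular processor.

```python
PAGE_SHIFT = 12   # 4 K pages
LEVEL_BITS = 10   # assumption: 20-bit virtual page number split 10/10


class PageFault(Exception):
    """Raised when no mapping exists for the virtual page."""


def walk_page_tables(root_table, virtual_address):
    """Traverse a two-level table; return (physical_page_number, attributes)."""
    vpn = (virtual_address & 0xFFFFFFFF) >> PAGE_SHIFT
    top_index = vpn >> LEVEL_BITS                # upper segment of the page number
    low_index = vpn & ((1 << LEVEL_BITS) - 1)    # lower segment of the page number

    second_level = root_table.get(top_index)
    if second_level is None:
        raise PageFault(hex(virtual_address))

    entry = second_level.get(low_index)
    if entry is None:
        raise PageFault(hex(virtual_address))

    physical_page_number, attributes = entry
    return physical_page_number, attributes
```

Each level of the walk costs at least one memory access in a real system, which is why caching the completed translation is worthwhile.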
A TLB may comprise a Content Addressable Memory (CAM) and associated Random Access Memory (RAM), each having a fixed number of entries, such as for example 32, 64, or 128. The CAM performs a parallel comparison of a virtual page number presented for translation, against all stored, previously translated virtual page numbers. The output of the CAM is the location of the stored virtual page number that matches the applied virtual page number. This location indexes the RAM, which provides the stored physical page number corresponding to the virtual page number, as well as the page attributes. The physical address applied to the cache and/or main memory is then the physical page number retrieved from the TLB, concatenated with the page offset from the virtual address.
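The CAM and RAM arrangement described above may be modeled in software as a pair of parallel arrays. The sketch below is a behavioral illustration under stated assumptions (4 K pages, a hypothetical class name `SimpleTLB`); the loop stands in for what hardware performs as a single parallel comparison.

```python
PAGE_SHIFT = 12
PAGE_OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class SimpleTLB:
    """Behavioral model: cam[i] holds a virtual page number, ram[i] its data."""

    def __init__(self, num_entries=32):
        self.cam = [None] * num_entries   # stored virtual page numbers
        self.ram = [None] * num_entries   # (physical page number, attributes)

    def lookup(self, virtual_address):
        """Return (physical_address, attributes) on a hit, or None on a miss."""
        vpn = virtual_address >> PAGE_SHIFT
        # Hardware compares against all entries at once; software must iterate.
        for location, stored_vpn in enumerate(self.cam):
            if stored_vpn == vpn:
                ppn, attributes = self.ram[location]   # CAM output indexes the RAM
                offset = virtual_address & PAGE_OFFSET_MASK
                return (ppn << PAGE_SHIFT) | offset, attributes
        return None
```

The returned physical address is the stored physical page number concatenated with the untranslated page offset, as described above.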
When a new virtual page number is presented for translation, a TLB miss occurs, and the processor must traverse the page tables to perform a translation. When the page table walk is complete, the virtual and physical page numbers and page attributes are stored in an empty location in the TLB. If the TLB is full, an existing entry must be replaced with the new entry. A variety of replacement algorithms are known in the art, such as random, round-robin, not recently used, First In-First Out (FIFO), second-chance FIFO, least recently used, not frequently used, aging, and the like. For memory pages associated with critical tasks, many TLB implementations allow the operating system to lock one or more TLB entries against replacement, to ensure that the entries always reside in the TLB to perform fast translation for the critical tasks. Locked TLB entries do not participate in the TLB replacement algorithm when a TLB entry must be replaced. However, not all processor instruction sets include TLB management instructions, such as instructions to lock TLB entries against replacement. In these cases, the TLB is managed by hardware, and the operating system may lack any way to directly lock TLB entries.
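The exclusion of locked entries from replacement may be illustrated with a minimal sketch. Round-robin is used here only as one of the replacement algorithms listed above; the class `LockableTLB`, its fields, and the `lock` parameter are hypothetical names introduced for illustration, and the behavior when every entry is locked is an assumption.

```python
class LockableTLB:
    """Round-robin replacement that skips entries locked against replacement."""

    def __init__(self, num_entries):
        self.entries = [None] * num_entries   # (vpn, ppn, attributes) or None
        self.locked = [False] * num_entries
        self.next_victim = 0                  # round-robin pointer

    def _choose_slot(self):
        # Prefer an empty location when one exists.
        for slot, entry in enumerate(self.entries):
            if entry is None:
                return slot
        # Otherwise advance the pointer, passing over locked entries.
        count = len(self.entries)
        for _ in range(count):
            candidate = self.next_victim
            self.next_victim = (candidate + 1) % count
            if not self.locked[candidate]:
                return candidate
        raise RuntimeError("all TLB entries are locked")  # assumed policy

    def insert(self, vpn, ppn, attributes=None, lock=False):
        """Store a completed translation; return the slot that was used."""
        slot = self._choose_slot()
        self.entries[slot] = (vpn, ppn, attributes)
        self.locked[slot] = lock
        return slot
```

In this model a locked entry remains resident regardless of how many later translations are inserted, which is the property that guarantees fast translation for critical tasks.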
For higher performance, a processor may include a smaller, faster TLB having, e.g., 4, 8, or 16 entries, called a Level-0 or L0 TLB (with the main TLB referred to as a Level-1 or L1 TLB). The L0 TLB is also known in the art as a micro TLB. The L0 TLB stores the few most recently used address translations, capitalizing on the temporal and spatial locality exhibited by most programs: instructions or data from a recently accessed memory page are likely to be fetched again. To translate a virtual address, the processor first presents the virtual page number to the L0 TLB. If the virtual page number hits in the L0 TLB, a corresponding physical page number and page attributes are provided. If the virtual page number misses in the L0 TLB, the virtual page number is presented to the L1 TLB for translation.
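The L0-then-L1 lookup order may be sketched as follows. The text above specifies only the order of lookup; the least-recently-used eviction in the L0 TLB and the copying of an L1 hit into the L0 TLB are assumptions made for this illustration, as are the names `TwoLevelTLB` and `translate`.

```python
from collections import OrderedDict


class TwoLevelTLB:
    """Behavioral model of a small L0 TLB in front of a larger L1 TLB."""

    def __init__(self, l0_size=4):
        self.l0 = OrderedDict()   # vpn -> (ppn, attributes), most recent last
        self.l0_size = l0_size
        self.l1 = {}              # vpn -> (ppn, attributes)

    def translate(self, vpn):
        """Return (entry, source), where source names the level that hit."""
        if vpn in self.l0:
            self.l0.move_to_end(vpn)          # mark as most recently used
            return self.l0[vpn], "L0 hit"
        if vpn in self.l1:
            entry = self.l1[vpn]
            self._fill_l0(vpn, entry)         # assumed: L1 hits refill the L0
            return entry, "L1 hit"
        return None, "miss"                   # a page table walk would follow

    def _fill_l0(self, vpn, entry):
        if len(self.l0) >= self.l0_size:
            self.l0.popitem(last=False)       # evict the least recently used
        self.l0[vpn] = entry
```

Note that nothing in this model protects an L0 entry from eviction: a translation that matters to a critical task is displaced as soon as enough other pages are touched.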
Generally, the L0 TLB is a hardware implementation that is not recognized or directly controlled by software. That is, software cannot directly read and write L0 TLB entries; management of the L0 TLB is performed by hardware. One consequence of this is that the operating system cannot designate entries in the L0 TLB as locked against replacement. The ability to lock one or more L0 TLB entries against replacement would be advantageous, as it would ensure that the fastest translation is always available for critical tasks.