The development of computer systems and the corresponding growth in the size and complexity of software applications have placed increasing demands on the performance of these computer systems. As a result, many techniques have been implemented in an effort to increase computer system performance.
In order to meet the increasing demands placed on computer systems, the amount of addressable memory available on a computer system has been significantly increased. This increase enables a computer to handle more complex software programs and more information. Concurrently, the operating speed of computers has increased, which enables larger programs to run relatively efficiently.
One particular technique for increasing the addressable memory of a computer system is to provide a virtual memory system. A virtual memory system allows large amounts of memory to be addressed by sharing a smaller amount of physical memory among many processes: physical memory is divided into blocks, and the blocks are allocated to different processes. A CPU produces virtual addresses that are translated, via hardware and software, into physical addresses, which can then be used to access main memory; this translation is referred to as memory mapping. A virtual memory system enables the addressing of large amounts of memory as if they were part of the computer system main memory, even where the actual physical main memory provides substantially less storage space than is addressable.
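By way of illustration only, the mapping of a virtual address to a physical address can be sketched as follows. The 4096-byte block size and the contents of `page_map` are assumptions chosen for the example and are not taken from any particular system; the names `PAGE_SIZE`, `page_map`, and `translate` are hypothetical.

```python
PAGE_SIZE = 4096  # assumed block (page) size in bytes; not specified in the text

# Hypothetical mapping from virtual page number to physical page number.
page_map = {0: 7, 1: 3, 2: 9}


def translate(virtual_address: int) -> int:
    """Map a virtual address to a physical address, one page at a time."""
    # Split the address into a page number and an offset within that page.
    virtual_page, offset = divmod(virtual_address, PAGE_SIZE)
    # Look up the physical page; a KeyError here means the page is unmapped.
    physical_page = page_map[virtual_page]
    # The offset within the page is unchanged by translation.
    return physical_page * PAGE_SIZE + offset
```

In this sketch only the page number is translated; the offset within the page passes through unchanged, which is why a mapping need only be kept per page rather than per address.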
Virtual memory systems use a virtual memory addressing system with a memory management unit (MMU) to translate virtual memory addresses into physical memory addresses where actual information is located.
Memory management units include address translation circuitry. The address translation circuitry translates a virtual address into a physical address. The resulting physical address is then used to access the originally requested memory location. In some implementations, the memory management unit references two levels of descriptor tables in main memory to translate the virtual address into a physical address; namely, a Level 1 descriptor table and multiple Level 2 descriptor tables. An internal register, or Translation Base Register, contains the physical starting address of the Level 1 descriptor table.
Each Level 1 descriptor table entry points to a Level 2 descriptor table. The memory management unit (MMU) uses information from the Level 1 descriptor to retrieve the Level 2 descriptor. The Level 2 descriptor contains the physical address information required to translate the virtual address to a physical address. With this descriptor structure, every virtual memory access to main memory must be preceded by two descriptor retrievals before the physical address can be derived and the main memory access can continue.
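The two descriptor retrievals described above can be sketched, under simplifying assumptions, as follows. The 12/8/12 split of a 32-bit virtual address, the word-addressed toy `Memory` class, and the function name `walk` are all assumptions made for the example; actual descriptor formats carry additional fields that are omitted here.

```python
L1_BITS, L2_BITS, OFFSET_BITS = 12, 8, 12  # assumed split of a 32-bit virtual address


class Memory:
    """Toy word-addressed main memory that counts how many reads it serves."""

    def __init__(self):
        self.words = {}
        self.reads = 0

    def read(self, address):
        self.reads += 1
        return self.words[address]


def walk(memory, translation_base, virtual_address):
    """Translate a virtual address using a Level 1 and a Level 2 descriptor."""
    l1_index = virtual_address >> (L2_BITS + OFFSET_BITS)
    l2_index = (virtual_address >> OFFSET_BITS) & ((1 << L2_BITS) - 1)
    offset = virtual_address & ((1 << OFFSET_BITS) - 1)
    # First retrieval: the Level 1 descriptor gives the Level 2 table base.
    l2_table_base = memory.read(translation_base + l1_index)
    # Second retrieval: the Level 2 descriptor gives the physical page base.
    physical_page_base = memory.read(l2_table_base + l2_index)
    return physical_page_base + offset
```

Here `translation_base` plays the role of the Translation Base Register, and `memory.reads` increases by two on every call, reflecting the two descriptor retrievals that precede each main memory access.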
Descriptor tables can be configured in layers, or levels, and a significant amount of system clock time can be spent retrieving physical page addresses via the descriptor tables stored in main memory. These physical page addresses are then used by a processor to access specific desired information. However, a significant number of clock cycles is required to perform such a search, which imparts significant and undesirable delay.
Therefore, cache-like memories in the form of translation lookaside buffers (TLBs) are often provided in memory management units in order to alleviate delays. A translation lookaside buffer (TLB) is a cache that is used to keep track of recently used address mappings so that time-consuming accesses to descriptor tables in main memory can be avoided. Accordingly, the TLB holds only descriptor table mappings, with each tag entry in the TLB holding a virtual page number and each data entry in the TLB holding a physical page number. Typically, the most recently used addresses are the most likely to be used again. One algorithm implementation replaces TLB entries that are Least Recently Used (LRU), and another algorithm implementation keeps TLB entries that are Most Recently Used (MRU).
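A minimal software sketch of a TLB with Least Recently Used replacement is given below for illustration. The class name `LruTlb`, its methods, and the use of an ordered dictionary are assumptions of this example; a hardware TLB would typically be implemented as an associative memory rather than in this manner.

```python
from collections import OrderedDict


class LruTlb:
    """Small TLB model: tag = virtual page number, data = physical page number."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # ordered from least to most recently used

    def lookup(self, virtual_page):
        """Return the physical page number, or None if the mapping is absent."""
        if virtual_page not in self.entries:
            return None
        self.entries.move_to_end(virtual_page)  # mark as most recently used
        return self.entries[virtual_page]

    def insert(self, virtual_page, physical_page):
        """Store a mapping, evicting the least recently used entry when full."""
        if virtual_page in self.entries:
            self.entries.move_to_end(virtual_page)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used entry
        self.entries[virtual_page] = physical_page
```

With a capacity of two, for example, inserting mappings for pages 1 and 2, looking up page 1, and then inserting page 3 evicts page 2, because page 1 was used more recently.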
In operation, when a processor provides a virtual address whose page is presently stored in the TLB, the TLB quickly provides a physical page address for the information, which eliminates the need for the memory management unit (MMU) to spend several clock cycles accessing the descriptor tables in main memory. This occurrence is often referred to as a “TLB hit”. However, when a virtual page address is sent to the TLB but is not found there, the memory management unit (MMU) has to access the descriptor tables in main memory, which requires many more clock cycles. This is referred to as a “TLB miss”. The process by which the memory management unit (MMU) accesses descriptor tables in main memory for the purpose of updating the TLB is referred to as a “TLB fetch”.
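The distinction between a TLB hit, a TLB miss, and a TLB fetch can be sketched as follows. The class `Mmu`, its counters, and the dictionary standing in for the descriptor tables in main memory are hypothetical; the figure of two descriptor reads per fetch follows from the two-level descriptor structure described earlier, and replacement of TLB entries is omitted for brevity.

```python
DESCRIPTOR_READS_PER_FETCH = 2  # Level 1 read, then Level 2 read


class Mmu:
    """Toy MMU that counts TLB hits, TLB misses, and descriptor reads."""

    def __init__(self, descriptor_tables):
        # Dictionary standing in for the descriptor tables held in main memory.
        self.descriptor_tables = descriptor_tables
        self.tlb = {}  # virtual page number -> physical page number
        self.hits = 0
        self.misses = 0
        self.descriptor_reads = 0

    def translate_page(self, virtual_page):
        if virtual_page in self.tlb:
            self.hits += 1  # TLB hit: no main memory access is needed
            return self.tlb[virtual_page]
        self.misses += 1  # TLB miss: fall back to the descriptor tables
        return self._tlb_fetch(virtual_page)

    def _tlb_fetch(self, virtual_page):
        """TLB fetch: read the descriptors and update the TLB with the result."""
        self.descriptor_reads += DESCRIPTOR_READS_PER_FETCH
        physical_page = self.descriptor_tables[virtual_page]
        self.tlb[virtual_page] = physical_page
        return physical_page
```

In this sketch a repeated access to the same page costs no descriptor reads after the first access, whereas each access to a page not yet held in the TLB costs two.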
ARM processors, or central processing units, and micro-controllers, available from Advanced RISC Machines (ARM), exist for use with a variety of handheld computing and communications products. The subsystem surrounding the processor includes a unified cache, a memory management unit (MMU), and a write buffer. In such products, the ARM processor is required to make requests to memory. More particularly, these requests take the form of checking with the memory management unit (MMU), both with virtual addresses and physical addresses. The memory management unit (MMU) is operative to support virtual memory. The unified cache is operative to store instructions and data, which enables the CPU to continuously execute code and process data without accessing main memory until a cache miss is encountered. The cache thereby contributes to improved performance and reduces memory bandwidth requirements.
Even though the use of a TLB may increase the speed of virtual-to-physical address translation, a TLB miss still causes the memory management unit (MMU) to access the descriptor tables in main memory. These descriptor table lookups detrimentally affect system performance by reducing the central processing unit's instruction and data throughput.
Therefore, there exists a need for further improvements to techniques for fetching TLB entries, or reducing the occurrence of TLB fetching while a central processing unit (CPU) is waiting for code and/or data.