1. Field of the Invention
This invention relates to processors and computer systems, and more particularly to address translation memory systems used within computer systems and processors.
2. Description of the Related Art
A typical computer system includes a processor which reads and executes instructions of software programs stored within a memory system. In order to maximize the performance of the processor, the memory system should supply instructions to the processor quickly enough that the processor never waits for needed instructions. The memory system may be formed from many different types of memory, and the cost per unit of storage of each type of memory typically increases with the speed of that memory. Most modern computer systems therefore employ multiple types of memory. Smaller amounts of faster (and more expensive) memory are positioned closer to the processor, and larger amounts of slower (and less expensive) memory are positioned farther from the processor. By keeping the smaller amounts of faster memory filled with the instructions (and data) needed by the processor, the speed of the memory system approaches that of the faster memory, while the cost of the memory system approaches that of the less expensive memory.
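The speed and cost trade-off described above can be illustrated with the common average-access-time estimate for a two-level memory hierarchy. The sketch below is only an illustrative model, not part of the invention. The function names and the latency, hit-rate, and cost figures are hypothetical assumptions chosen to show the trend.

```python
def average_access_time(hit_rate: float, fast_ns: float, slow_ns: float) -> float:
    """Average access time of a hypothetical two-level memory system.

    Every access probes the fast memory first; the fraction of accesses
    that miss there additionally pays the slow-memory latency.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return fast_ns + (1.0 - hit_rate) * slow_ns


def average_cost_per_byte(fast_bytes: int, fast_cost: float,
                          slow_bytes: int, slow_cost: float) -> float:
    """Blended cost per byte when a little fast memory fronts a lot of slow memory."""
    total_bytes = fast_bytes + slow_bytes
    return (fast_bytes * fast_cost + slow_bytes * slow_cost) / total_bytes


if __name__ == "__main__":
    # Hypothetical figures: 1 ns fast memory, 100 ns slow memory, 98% hit rate.
    print(average_access_time(0.98, 1.0, 100.0))      # roughly 3 ns
    # Hypothetical ratio: 1 byte of fast memory (100x the per-byte cost)
    # for every 999 bytes of slow memory.
    print(average_cost_per_byte(1, 100.0, 999, 1.0))  # roughly 1.1
```

With a high hit rate, the average access time stays within a small multiple of the fast memory's latency, while the blended cost stays close to that of the slow memory.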
Most modern computer systems also employ a memory management technique called "virtual" memory, which allocates memory to software programs upon request. This automatic memory allocation effectively hides the memory hierarchy described above, making the many different types of memory within a typical memory system (e.g., random access memory, magnetic hard disk storage, etc.) appear as one large memory. Virtual memory also isolates concurrently running programs from one another by allocating different physical memory locations to different programs.
A typical modern processor includes a cache memory unit coupled between an execution unit and a bus interface unit. The execution unit executes software instructions. The cache memory unit includes a relatively small amount of memory which can be accessed very quickly. The cache memory unit stores instructions and data (collectively, "information") recently used by the execution unit, along with information which has a high probability of being needed by the execution unit in the near future. Because it is searched first, the cache memory unit makes needed information readily available to the execution unit. When needed information is not found in the cache memory unit, the bus interface unit fetches the needed information from a main memory unit external to the processor. The overall performance of the processor improves when needed information is frequently found within the cache memory unit, since each such hit avoids a time-consuming access to the main memory unit.
Modern processors (e.g., x86 processors) support a form of virtual memory called "paging". Paging divides a physical address space, defined by the number of address signals generated by the processor, into fixed-size blocks of contiguous memory called "pages". If paging is enabled, a "virtual" address is translated or "mapped" to a physical address. For example, in an x86 processor with paging enabled, a paging unit within the processor translates a "linear" address produced by a segmentation unit to a physical address. If an accessed page is not located within the main memory unit, paging support constructs (e.g., operating system software) load the accessed page from secondary memory (e.g., magnetic disk) into main memory. In x86 processors, two different tables stored within the main memory unit, namely a page directory and a page table, store the information needed by the paging unit to perform linear-to-physical address translations.
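The two-level x86 translation described above can be sketched in software. The sketch assumes 32-bit linear addresses, 4 KiB pages, and no PAE or large pages, and it ignores permission bits. It models the main memory unit as a hypothetical Python dictionary from physical address to 32-bit word; the names `translate`, `PageFault`, and `memory` are illustrative only and are not taken from any real processor or operating system interface. Each dictionary lookup stands for one main-memory access, so a full walk costs two accesses before the data itself can be fetched.

```python
class PageFault(Exception):
    """Raised when a page directory or page table entry is not present."""


PRESENT = 0x1             # bit 0 of a directory/table entry
FRAME_MASK = 0xFFFFF000   # bits 31..12 hold a 4 KiB-aligned base address


def translate(linear: int, cr3: int, memory: dict) -> int:
    """Walk a two-level x86-style page table (32-bit, 4 KiB pages, no PAE).

    `cr3` holds the physical base of the page directory; `memory` is a
    hypothetical stand-in for main memory (physical address -> 32-bit word).
    """
    dir_index = (linear >> 22) & 0x3FF    # bits 31..22 select the directory entry
    table_index = (linear >> 12) & 0x3FF  # bits 21..12 select the table entry
    offset = linear & 0xFFF               # bits 11..0 pass through unchanged

    # First main-memory access: read the page directory entry.
    pde = memory.get((cr3 & FRAME_MASK) + dir_index * 4, 0)
    if not pde & PRESENT:
        raise PageFault(hex(linear))

    # Second main-memory access: read the page table entry.
    pte = memory.get((pde & FRAME_MASK) + table_index * 4, 0)
    if not pte & PRESENT:
        raise PageFault(hex(linear))

    return (pte & FRAME_MASK) | offset


if __name__ == "__main__":
    # Directory at 0x1000; entry 1 points to a page table at 0x2000,
    # whose entry 3 maps to the physical page frame at 0x9000.
    memory = {0x1004: 0x2000 | PRESENT, 0x200C: 0x9000 | PRESENT}
    print(hex(translate(0x00403ABC, 0x1000, memory)))  # 0x9abc
```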
Accesses to the main memory unit require relatively large amounts of time. In order to reduce the number of main memory unit accesses required to retrieve information from the page directory and page table, a small cache memory system called a translation lookaside buffer (TLB) is typically used to store the most recently used address translations. Because an address translation can be read from the TLB in a relatively small amount of time, and needed translations are often found there, the TLB increases overall processor performance.
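A TLB in front of the page directory and page table walk can be sketched as a small, fully associative cache with least-recently-used replacement. The associativity, the replacement policy, the 4 KiB page size, and the `TLB` class with its `walk` callback are all assumptions for illustration; real TLBs differ in organization. Only a miss invokes the slow walk, so the number of walks equals the number of misses.

```python
from collections import OrderedDict
from typing import Callable


class TLB:
    """Hypothetical fully associative TLB with least-recently-used replacement.

    Caches linear-page-number -> physical-frame-number translations so that
    repeated accesses to the same page skip the slow table walk.
    """

    def __init__(self, entries: int, walk: Callable[[int], int]):
        self.entries = entries
        self.walk = walk  # slow path: returns the physical frame for a page number
        self.map: "OrderedDict[int, int]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, linear: int) -> int:
        page, offset = linear >> 12, linear & 0xFFF  # 4 KiB pages assumed
        if page in self.map:
            self.hits += 1
            self.map.move_to_end(page)  # mark as most recently used
        else:
            self.misses += 1
            if len(self.map) >= self.entries:
                self.map.popitem(last=False)  # evict the least recently used entry
            self.map[page] = self.walk(page)
        return (self.map[page] << 12) | offset


if __name__ == "__main__":
    walks = []

    def walk(page: int) -> int:
        # Hypothetical stand-in for the page directory/page table walk.
        walks.append(page)
        return page + 0x100  # arbitrary illustrative mapping

    tlb = TLB(entries=2, walk=walk)
    for addr in (0x1000, 0x1004, 0x2000, 0x1008):
        tlb.translate(addr)
    print(tlb.hits, tlb.misses, len(walks))  # 2 2 2
```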
In general, processor performance increases with the number of address translations (i.e., entries) the TLB can hold, since a larger TLB is more likely to contain a needed translation. When an entry corresponding to an input linear address is found within the TLB, the TLB asserts a "HIT" signal. As the number of entries in the TLB increases, the time required to generate the HIT signal also increases. Any increase in the time required to generate the HIT signal may increase the amount of time which must be allocated to address translation. Because address translation may be on a critical timing path within the processor, increasing the number of TLB entries beyond a certain number may actually reduce processor performance.
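Why HIT generation slows as entries are added can be illustrated with a simplified logical model: each entry produces a match signal (the entry is valid and its stored page equals the input page), and HIT is the OR of all match signals. If that OR is built from gates of fixed fan-in arranged as a balanced tree, the number of gate levels, and hence roughly the delay, grows with the logarithm of the entry count. This is a generic model assumed for illustration, not the fast HIT circuitry of the invention; the function names and the 2-input default fan-in are hypothetical.

```python
from typing import Iterable, Tuple


def match_signal(valid: bool, stored_page: int, input_page: int) -> bool:
    """Per-entry comparator output: asserted when a valid entry holds the input page."""
    return valid and stored_page == input_page


def hit_signal(entries: Iterable[Tuple[bool, int]], input_page: int) -> bool:
    """HIT is the logical OR of every entry's match signal."""
    return any(match_signal(valid, page, input_page) for valid, page in entries)


def or_tree_levels(n_entries: int, fan_in: int = 2) -> int:
    """Gate levels needed to OR together n_entries match signals
    using a balanced tree of fan_in-input OR gates."""
    if n_entries < 1 or fan_in < 2:
        raise ValueError("need at least one entry and a fan-in of at least 2")
    levels = 0
    signals = n_entries
    while signals > 1:
        signals = -(-signals // fan_in)  # ceiling division: outputs of the next level
        levels += 1
    return levels


if __name__ == "__main__":
    entries = [(True, 0x10), (False, 0x20), (True, 0x30)]
    print(hit_signal(entries, 0x30), hit_signal(entries, 0x20))  # True False
    for n in (16, 32, 64, 128):
        print(n, or_tree_levels(n))  # 16 4, 32 5, 64 6, 128 7 (one pair per line)
```

In this model, doubling the number of entries adds one level of 2-input gates to the HIT path, which is one way to see why a larger TLB tends to take longer to assert HIT.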
It would thus be desirable to have a TLB including fast logic circuitry for generating the HIT signal. Such fast HIT signal generation circuitry would allow the TLB to have a relatively large number of entries without requiring additional address translation time, resulting in an increase in processor performance.