Data in computer systems is stored in memory (the term "data" is used herein to refer to both data and instructions). Memory may comprise a variety of memory devices, such as read-only memory (ROM), random access memory (RAM) and disk storage. Computer system designers and users continue to require increased memory capacity. Various methods have been developed in attempts to maximize the utilization of existing memory and to make memory access faster. One of these methods includes dividing memory into a hierarchical structure including levels with faster and slower access times. For example, a faster cache memory is used to store data used by the CPU. When the CPU accesses data using an address, the cache is first checked for the data using the address. If the data is not in the cache, the data is fetched from physical memory and stored in the cache for later use.
Another method of maximizing memory utilization includes the creation and use of virtual memory. Virtual memory was developed to allow programmers to use more memory than physically existed by automatically managing the levels of the memory hierarchy. In virtual memory systems, several programs may simultaneously operate as if each had sole access to physical memory, even though the combined memory requirements of the programs exceed available physical memory.
In typical virtual memory systems, each of several programs operating with a single CPU "thinks" it has access to the entire address range of physical memory. Because that is not really the case, any address a program references must be relative to local memory. These relative addresses are called virtual addresses.
In a typical virtual memory system, the processor, for example a central processing unit (CPU), generates a virtual address. In most virtual memory systems, some address translation is needed to convert the virtual address to a physical address. This translation takes CPU time. To decrease the time spent performing address translations, prior computer systems using virtual addressing employ a dedicated cache called a translation lookaside buffer (TLB) to store translated addresses. The TLB is searched for the required address before translation is initiated. If the address is found in the TLB, the translation process is skipped, thereby saving CPU time.
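The lookup order described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming a 4 KB page size and a dictionary standing in for the translation tables; the names `PAGE_SIZE`, `page_table`, `tlb` and `translate` are hypothetical and do not come from any particular system.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only

# Hypothetical translation table: virtual page number -> physical page number.
page_table = {0: 7, 1: 3, 2: 9}

# Cached translations: virtual page number -> physical page number.
tlb = {}


def translate(virtual_address):
    """Return (physical_address, tlb_hit), searching the TLB before translating."""
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    if vpn in tlb:
        # TLB hit: the translation step is skipped.
        return tlb[vpn] * PAGE_SIZE + offset, True
    # TLB miss: perform the (slow) translation and cache the result.
    ppn = page_table[vpn]
    tlb[vpn] = ppn
    return ppn * PAGE_SIZE + offset, False
```

In this sketch the first access to a page pays for the translation and every later access to the same page is served from the TLB, which is where the saving in CPU time would come from.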
Typical TLBs include a content addressable memory (CAM) storing virtual address tags and a RAM storing corresponding physical addresses. Typically, 128 or more virtual address tag-physical address pairs are stored in a TLB. A TLB search involves comparing a supplied virtual address tag to the virtual address tags in the CAM. A search produces either exactly one match or no match in the CAM. If the supplied virtual address tag is found in the CAM, a match occurs. On a match, a CAM match line associated with the location in which the virtual address tag is found indicates a hit. A hit in the CAM causes a corresponding RAM location, or word line, to output the physical address corresponding to the virtual address tag.
In view of design considerations, the CAM and the RAM are usually precharged to opposite states. A CAM hit is signified by a high logic level on the appropriate match line. Every CAM match line except the one actually matched is pulled low. In the RAM, conversely, all word lines are precharged low, and a hit in the CAM results in the high CAM match line causing only one word line to transition high in the RAM, that is, the word line of the location containing the proper physical address.
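The match-line and word-line behavior described above can be modeled at the signal level as follows. This is a behavioral sketch, not a circuit description: it assumes the match lines are precharged high (which the text implies but does not state outright), and the function name `tlb_access` and its arguments are hypothetical.

```python
def tlb_access(cam_tags, ram_words, supplied_tag):
    """Model one TLB search; return (match_lines, word_lines, physical_address)."""
    # Precharge: match lines start high (assumed), word lines start low.
    match_lines = [1] * len(cam_tags)
    word_lines = [0] * len(ram_words)

    # CAM evaluate: every entry whose stored tag differs pulls its match line low.
    for i, tag in enumerate(cam_tags):
        if tag != supplied_tag:
            match_lines[i] = 0

    # RAM evaluate: a match line that stayed high drives its word line high.
    for i, line in enumerate(match_lines):
        if line:
            word_lines[i] = 1

    # At most one word line is high; it selects the stored physical address.
    for i, line in enumerate(word_lines):
        if line:
            return match_lines, word_lines, ram_words[i]
    return match_lines, word_lines, None  # no match: no word line fires
```

For example, with tags `[0x10, 0x20, 0x30]` and words `[0xA000, 0xB000, 0xC000]`, supplying tag `0x20` leaves only the second match line high and returns `0xB000`, while an unknown tag leaves every line low and returns `None`.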
The CAM and the RAM in the TLB operate with a common clock. On the high half-cycle of the clock, the CAM is precharging while the RAM is evaluating. Similarly, on the low half-cycle of the clock, the CAM is evaluating while the RAM is precharging. For optimal system speed, a TLB access should be completed in a single clock cycle. A TLB access can be completed in a single clock cycle when the CAM evaluates a possible match on a low half-cycle and the RAM outputs an address on the following high half-cycle. In prior TLBs, however, it is not possible to guarantee that a TLB access will be complete in one clock cycle.
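The two-phase schedule above can be summarized in a short sketch. It is a deliberately simplified timing model, assuming that each stage must finish entirely within its own half-cycle and that all times share one arbitrary unit; the function names and parameters are hypothetical.

```python
def phase_activity(clock_level):
    """Return what the CAM and the RAM are doing for a clock level (1 = high, 0 = low)."""
    if clock_level == 1:
        return {"CAM": "precharge", "RAM": "evaluate"}
    return {"CAM": "evaluate", "RAM": "precharge"}


def fits_in_one_cycle(cam_eval_time, ram_eval_time, half_period):
    """True if the CAM can evaluate in the low half-cycle and the RAM can
    output its address in the following high half-cycle."""
    return cam_eval_time <= half_period and ram_eval_time <= half_period
```

Under this model, a CAM evaluation time of 0.6 against a half-period of 0.5 fails the check, which corresponds to the case in which a single-cycle access cannot be guaranteed.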
In typical prior art systems using TLBs, a virtual address tag is resolved in a unit referred to as a table walker before it is passed to the CAM. The virtual address is passed to the CAM upon a clock edge of the table walker clock. Transfer of information, from results of CAM comparison to eventual output of an address from the RAM, is controlled by multiple clocks which must each meet timing requirements. Each clock event must be delayed if information to be passed is not properly set up. For example, usually the CAM is too slow to complete evaluation in one half-cycle of the clock because evaluation time is dictated by transistor response time. Therefore, clock cycles are wasted.
A possible solution is to increase the CAM transistor sizing to make the CAM faster, but the cost in integrated circuit area would be too great. Another possibility is to use a timer to control the interaction of CAM and RAM. As clock rates increase, however, allowances for process, temperature and power variations take up larger and larger percentages of the clock cycle, making it more difficult to guarantee that a timer will reach both ends of the TLB at about the same time.
What is needed, then, is a TLB providing faster access. As will be seen, the present invention provides a novel TLB circuit that permits TLB access to be completed in one clock cycle and reduces dependence on multiple clocks. The invention is characterized by the use of NMOS transistors and occupies a minimum of integrated circuit area.