The technical field encompasses computer architectures having content addressable cache designs. In particular, it concerns an architecture that supports reading the contents of the tag portion of a CAM structure without the need for a separate RAM read port.
Computer systems may employ a multi-level hierarchy of memory, with relatively fast, expensive but limited-capacity memory at the highest level of the hierarchy and proceeding to relatively slower, lower cost but higher-capacity memory at the lowest level of the hierarchy. The hierarchy may include a small fast memory called a cache, either physically integrated within a processor or mounted physically close to the processor for speed. The computer system may employ separate instruction caches and data caches. In addition, the computer system may use multiple levels of caches. The use of a cache is transparent to a computer program at the instruction level and can thus be added to a computer architecture without changing the instruction set or requiring modification to existing programs.
A cache hit occurs when a processor requests an item from a cache and the item is present in the cache. A cache miss occurs when a processor requests an item from a cache and the item is not present in the cache. In the event of a cache miss, the processor retrieves the requested item from a lower level of the memory hierarchy. Associated with cache design is the concept of virtual storage. Virtual storage systems permit a computer programmer to think of memory as one uniform, single-level storage unit, but actually provide a dynamic address-translation unit that automatically moves program blocks or pages between auxiliary storage and the high-speed storage (cache) on demand.
Also associated with cache design is the concept of fully associative or content-addressable memory (CAM). Content-addressable memory is a random access memory that, in addition to a conventional wired-in addressing mechanism, has wired-in logic that compares the desired bit locations of all entries against a specified value simultaneously, within one memory-cycle time. The specific address of a desired entry need not be known, since a portion of its contents can be used to access the entry. All entries that match the specified bit locations are flagged and can be addressed on the current or subsequent memory cycles.
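The masked compare described above can be modeled in software as follows. This is a minimal sketch, assuming each entry is held as an integer and that a bit mask (a hypothetical `mask` parameter) selects the "desired bit locations." Hardware performs every comparison in parallel within one memory cycle; the list comprehension here evaluates them one after another purely for illustration.

```python
from typing import List


def cam_search(entries: List[int], key: int, mask: int) -> List[bool]:
    """Return one match flag per entry.

    An entry is flagged when it agrees with `key` on every bit position
    set in `mask`; bit positions outside the mask are ignored.
    """
    return [((entry ^ key) & mask) == 0 for entry in entries]


entries = [0b1010, 0b1100, 0b1011, 0b0010]
flags = cam_search(entries, key=0b1010, mask=0b1110)  # compare bits 3..1 only
print(flags)  # [True, False, True, False]
```

Entries 0 and 2 are flagged because they differ from the key only in bit 0, which the mask excludes.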
Memory may be organized into words (for example 32 bits or 64 bits per word). The minimum amount of memory that can be transferred between a cache and the next lower level of memory hierarchy is called a line or a block. A line may be multiple words (for example, 16 words per line). Memory may also be divided into pages or segments with many lines per page. In some computer systems page size may be variable.
In modern computer memory architectures, a central processing unit (CPU) produces virtual addresses that are translated by a combination of hardware and software to physical addresses. The physical addresses are used to access physical main memory. A group of virtual addresses may be dynamically assigned to each page. Virtual memory requires a data structure, sometimes called a page table, that translates the virtual address to the physical address. To reduce address translation time, computers may use a specialized associative cache dedicated to address translation, called a translation lookaside buffer (TLB).
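The role of the TLB as a small associative cache in front of the page table can be sketched as below. The 4 KiB page size, the dictionary standing in for the page table, the capacity of four entries, and first-in-first-out replacement are all hypothetical choices made for illustration, not details taken from the text.

```python
from collections import OrderedDict
from typing import Dict

PAGE_SIZE = 4096  # hypothetical fixed page size; some systems vary page size


class TLB:
    """Toy translation lookaside buffer caching virtual-page -> frame mappings."""

    def __init__(self, page_table: Dict[int, int], capacity: int = 4):
        self.page_table = page_table  # hypothetical stand-in for the page table
        self.capacity = capacity
        self.entries = OrderedDict()  # vpn -> frame, oldest entry first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr: int) -> int:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1  # fast path: mapping already cached
            frame = self.entries[vpn]
        else:
            self.misses += 1
            frame = self.page_table[vpn]  # slow path; KeyError models a page fault
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict the oldest mapping (FIFO)
            self.entries[vpn] = frame
        return frame * PAGE_SIZE + offset


tlb = TLB({0: 7, 1: 3})
print(hex(tlb.translate(0x0123)))  # 0x7123 (miss)
print(hex(tlb.translate(0x1FFF)))  # 0x3fff (miss)
print(hex(tlb.translate(0x0456)))  # 0x7456 (hit)
```

Only the page number is translated; the offset within the page passes through unchanged.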
A cache may include many segments, or ways. If a cache stores an entire line address along with the data and any line can be placed anywhere in the cache, the cache is said to be fully associative. For a large cache in which any line can be placed anywhere, the hardware required to rapidly determine if and where an item is in the cache may be very large and expensive. For larger caches, a faster, space-saving alternative is to use a subset of an address (called an index) to designate a line position within the cache, and then store the remaining, more significant bits of each physical address (called a tag) along with the data. In a cache with indexing, an item with a particular address can be placed only within the set of lines designated by the index. If the cache is arranged so that the index for a given address maps to exactly one line in the subset, the cache is said to be direct mapped. If the index maps to more than one line in the subset, one line in each way, the cache is said to be set-associative. All or part of an address may be hashed to provide a set index that partitions the address space into sets.
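The split of an address into tag, index, and offset fields can be illustrated with concrete numbers. This sketch reuses the example sizes given earlier (64-bit words, 16 words per line, so 128-byte lines and 7 offset bits) and assumes a hypothetical cache of 256 sets (8 index bits); no hashing of the index is modeled.

```python
from typing import Tuple

LINE_BYTES = 16 * 8  # 16 words per line, 8 bytes (64 bits) per word
NUM_SETS = 256       # hypothetical number of sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1  # 7, since LINE_BYTES is a power of two
INDEX_BITS = NUM_SETS.bit_length() - 1     # 8, since NUM_SETS is a power of two


def split_address(addr: int) -> Tuple[int, int, int]:
    """Return (tag, index, offset) for a physical byte address."""
    offset = addr & (LINE_BYTES - 1)                  # byte within the line
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)    # selects the set
    tag = addr >> (OFFSET_BITS + INDEX_BITS)          # stored alongside the data
    return tag, index, offset


print(split_address(0x12345))  # (2, 70, 69)
```

Only the tag (here, everything above bit 14) needs to be stored with each line, because the index is implied by the line's position in the cache.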
With direct mapping, when a line is requested, only one line in the cache has matching index bits. Therefore, the data can be retrieved immediately and driven onto a data bus before the computer system determines whether the rest of the address matches. The data may or may not be valid, but in the usual case where the data is valid, the data bits are available on the data bus before the computer system determines validity. With set associative caches, the computer system cannot know which line corresponds to an address until the full address is compared. That is, in set-associative caches, the result of tag comparison is used to select which line of data bits within a set of lines is presented to the processor.
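The contrast between the two organizations can be sketched as two read functions. The `(valid, tag, data)` tuple layout is a hypothetical representation. In the direct-mapped case the single candidate line's data is returned together with a hit flag, mirroring how data can be driven onto the bus before validity is known; in the set-associative case the tag comparison must first identify which way's data to return.

```python
from typing import List, Optional, Tuple

Line = Tuple[bool, int, str]  # hypothetical layout: (valid, tag, data)


def direct_mapped_read(lines: List[Line], tag: int, index: int) -> Tuple[str, bool]:
    """Return the indexed line's data immediately, plus the hit/miss decision."""
    valid, stored_tag, data = lines[index]
    # Data is available without waiting on the tag compare; the caller
    # discards it if the hit flag turns out to be False.
    return data, valid and stored_tag == tag


def set_assoc_read(sets: List[List[Line]], tag: int, index: int) -> Optional[str]:
    """Use the tag comparison to select which way's data to present."""
    for valid, stored_tag, data in sets[index]:
        if valid and stored_tag == tag:
            return data
    return None  # miss: fetch from a lower level of the hierarchy


lines = [(True, 5, "A"), (False, 2, "B")]
print(direct_mapped_read(lines, 5, 0))  # ('A', True)
sets = [[(True, 1, "way0"), (True, 9, "way1")]]
print(set_assoc_read(sets, 9, 0))       # way1
```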
Additional details related to prevalidated cache architectures are provided in U.S. patent application Ser. No. 08/955,821, filed on Oct. 22, 1997, now U.S. Pat. No. 6,014,732, entitled CACHE MEMORY WITH REDUCED ACCESS TIME, the disclosure of which is hereby incorporated by reference.
A fully associative cache is commonly defined as a content-addressable memory (CAM) and is typically employed in computer systems having virtual memory. FIG. 1 illustrates the basic operation of a CAM 100. A tag value 101 is input into the CAM, and the associated data value 102 is output. While the preferred embodiment of the invention discloses a memory array associated with the CAM that temporarily stores instruction data fetched during the operation of the computer system, CAMs are commonly used to store many types of data that must be accessed very quickly.
For manufacturing and verification reasons, it is often desirable to obtain a listing of all tag values stored in the CAM. There is typically no provision for reading out tag values, because doing so is not required in the normal usage of a CAM. Where it is required, a RAM port is usually added to the memory array. This allows the tag value of each entry of the array to be individually addressed and dumped out for review. The cost of adding this second port is primarily additional area on the IC, and the port also adversely affects the speed of the array.
It would be desirable and of considerable advantage to provide a method and apparatus for obtaining the tag values of a CAM that does not require the addition of a RAM port to the memory array.
The present invention provides a method and apparatus for reading the tag portion of a CAM array without adding a second port to the CAM structure. Without a second port, the CAM structure takes less space on the IC and operates faster.
A CAM in accordance with the present invention provides for the identification of a plurality of multiple-bit tag values stored in the CAM. Logic circuitry compares each bit of a test value to the corresponding bit of all stored tag values. A bit select generates a plurality of test bits for sequential input into the logic circuitry. The logic circuitry compares each test bit to the corresponding bit of each stored tag value and generates a "hit" signal if the selected bit is the same as the corresponding bit of the stored tag value. Storage means record the compare results conveyed by the M hit signals, one per stored entry.
In a CAM containing M entries, each entry having an N-bit tag value, the invention reads all N×M bits of tag data using only the M hit/miss signals produced by the logic circuitry. The leftmost bit of the tag values of all M entries may be read simultaneously by performing a CAM operation that compares just 1 bit (out of N), the leftmost bit, of the tag values. The M compare results (from the M hit signals) are recorded in the storage means. By repeating this process for each of the N bit positions of the tag, all N×M bits of tag data may be generated and recorded. This data constitutes the entire contents of the tag portion of the CAM array.
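The bit-serial readout can be modeled in software to show why N single-bit compares suffice to recover N×M tag bits. This is a sketch under stated assumptions: `ToyCam` and `dump_tags` are hypothetical names, the model exposes nothing but per-entry hit signals (standing in for the absence of a RAM read port), and each test bit is assumed to be 1, so a hit means the stored bit is 1 and a miss means it is 0.

```python
from typing import List


class ToyCam:
    """Hypothetical CAM model without a read port: stored tags are observable
    only through the M hit signals returned by a compare."""

    def __init__(self, tags: List[int], n_bits: int):
        self._tags = list(tags)
        self.n_bits = n_bits
        self.m = len(tags)

    def compare_bit(self, bit: int, test_bit: int) -> List[bool]:
        # Compare a single tag bit position, ignoring the other N - 1 bits;
        # one hit signal is produced per entry.
        return [((tag >> bit) & 1) == test_bit for tag in self._tags]


def dump_tags(cam: ToyCam) -> List[int]:
    """Recover every stored tag using N single-bit compares."""
    recorded = [[False] * cam.n_bits for _ in range(cam.m)]  # storage: M x N
    # Bit select: step from the leftmost (most significant) bit to the rightmost.
    for col, bit in enumerate(range(cam.n_bits - 1, -1, -1)):
        hits = cam.compare_bit(bit, test_bit=1)  # hit <=> stored bit is 1
        for entry, hit in enumerate(hits):
            recorded[entry][col] = hit
    tags = []
    for row in recorded:  # reassemble each entry's tag from its recorded hits
        value = 0
        for hit in row:
            value = (value << 1) | int(hit)
        tags.append(value)
    return tags


cam = ToyCam([0b1011, 0b0110, 0b1111, 0b0000], n_bits=4)
print(dump_tags(cam))  # [11, 6, 15, 0]
```

Each pass reads one bit column across all M entries at once, so the number of compare operations grows with the tag width N rather than with the number of entries M.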