As the processing speed of microprocessors (CPUs) increases, associated apparatus such as memories must keep pace in order to avoid becoming performance bottlenecks in the microprocessor system. The present invention is directed to improving the effective bandwidth of content addressable or associative memories by improving the ability to detect and access a specific piece of data in a content addressable memory (CAM). One important class of CAMs, known as a cache memory, is a fast memory in the hierarchical memory structure that is most closely associated with the CPU. However, the present invention is directed to all CAM structures in diverse applications, including use in associative processors. A common characteristic of all CAMs is that data is retrieved from memory locations based on their content. The CAM searches in parallel over all or part of its entire storage, comparing a target content description with a corresponding field of bits in each memory location.
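The parallel compare over all storage locations can be illustrated with a brief software model. This sketch is not from the patent; the names (`cam_search`, `entries`, `mask`) are hypothetical, and the hardware performs every comparison simultaneously rather than iterating as software must.

```python
def cam_search(entries, target, mask):
    """Return the indices of stored words whose masked bits match the target.

    entries: list of stored words (ints)
    target:  content description being searched for (int)
    mask:    which bit positions form the content-addressable field (int)
    In a hardware CAM, each entry has its own comparator, so all of
    these comparisons happen in a single cycle.
    """
    return [i for i, word in enumerate(entries)
            if (word & mask) == (target & mask)]

entries = [0b1010_1100, 0b1010_0011, 0b0110_1100]
# Search on bits 4-7 only; the low nibble does not participate.
matches = cam_search(entries, 0b1010_0000, 0b1111_0000)
# matches -> [0, 1]: the first two entries agree with the target
# in the masked field, the third does not.
```

A match thus identifies a location by its content, not by its address, which is the defining property of a CAM.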
Because of the importance of cache memories, most of the following description is couched in terms of cache memories; however, those practicing the underlying art will recognize that the inventive principles expounded are more broadly applicable.
High-performance modern microprocessors (CPUs) typically include an on-chip cache memory in order to minimize instruction latencies. Cache memories are effective because of the temporal locality exhibited by typical instruction streams: programs tend to reuse portions of code, so that the region of memory being accessed drifts only slowly as the program progresses. Consequently, portions of memory currently being addressed are likely to be accessed again in the near term; if a fast, on-chip local memory (cache) is provided, repeated access to slower, off-chip main memory can be reduced. As the locality of reference shifts, new data from main memory is acquired by the on-chip cache.
Critical to the efficient operation of the cache memory is the speedy determination of whether the information currently required by the CPU is resident in cache: if present (a cache "hit"), access can be made to cache; if not (a cache "miss"), access must be made to the slower main memory.
Main memory consists of a large set of lines with consecutive addresses. Cache memory holds many lines (e.g., the Intel i486™ microprocessor cache memory is physically split into four 2-Kbyte blocks, each with 128 lines and 128 21-bit tags), but not necessarily in any particular order. When the CPU requires access to memory, it first searches the cache to see if the desired data is present. Typically, the search is an associative process in which each line stored in cache has an address tag that indicates which lines of main memory are resident in cache. A comparison is made between the CPU-provided tag and the tags associated with the lines stored in cache. A match between the two indicates a "hit", while the lack of a match indicates a "miss", requiring an access to main memory. The data line acquired from main memory is stored in cache by replacing a previously stored line according to a prescribed replacement policy, such as replacing the least recently used (LRU) line.
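The lookup-and-replace flow just described can be sketched as follows. This is a minimal illustrative model, not the patented mechanism: it assumes a fully associative cache keyed by tag with LRU replacement, and the names (`Cache`, `lookup`, `fetch_from_main`) are hypothetical.

```python
from collections import OrderedDict

class Cache:
    """Fully associative cache model: tag -> line data, LRU replacement."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # ordered oldest-used first

    def lookup(self, tag, fetch_from_main):
        if tag in self.lines:            # tag match: a cache "hit"
            self.lines.move_to_end(tag)  # mark as most recently used
            return self.lines[tag]
        # No matching tag: a "miss" -- access the slower main memory.
        data = fetch_from_main(tag)
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)  # evict least recently used line
        self.lines[tag] = data           # store the newly acquired line
        return data

cache = Cache(num_lines=2)
main_memory = {0x1A: "lineA", 0x2B: "lineB", 0x3C: "lineC"}
cache.lookup(0x1A, main_memory.get)  # miss: fetched from main memory
cache.lookup(0x2B, main_memory.get)  # miss: cache now full
cache.lookup(0x1A, main_memory.get)  # hit: tag 0x1A matches
cache.lookup(0x3C, main_memory.get)  # miss: evicts LRU line (tag 0x2B)
```

In hardware, the `tag in self.lines` test is the parallel tag comparison performed by the CAM; the sequential dictionary probe here only models its result.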
Because of the need to perform a tag comparison each time that the cache is accessed, the speed with which the comparison can be done is critical in determining the effective bandwidth of the cache memory.
The word TAG is used hereafter to mean a field of bits representative of the content-addressable portion of a line of data stored in CAM.
The term "line of data" is used to mean a standard unit of data storage, which may include one or more bytes or words.