The present invention relates generally to information processing systems and more particularly to a methodology and implementation for a high speed content addressable memory and register mapper organization.
In current computer systems, register mappers are implemented in high-performance "out-of-order" machines to manage a large set of physical registers within an associated register file. New registers are allocated during instruction dispatch for each instruction that writes a new result. The mapper maintains a register map to locate the physical registers that hold the latest or most current results for each logical register. A CAM (content-addressable memory) structure capable of performing multiple searches simultaneously is required to establish such a register map.
Using a conventional approach, a basic single-compare CAM cell, such as the CAM cell illustrated in FIG. 1, can store, read and write a one-bit datum. The single-compare CAM cell can also compare a single incoming bit of data (DATA) against the stored content of the cell (STR) and indicate whether or not there is a match (MATCH). The CAM array consists of a fixed number of word-rows, and each word-row (CAM word) has the same number of CAM bits (one CAM entry). The CAM array is supported by word-row and bit-column logic to update and access the CAM content. The match operation asserts a match line (MATCH) if all the bits in the search pattern match all the bits in one CAM entry. The bit-wise compares ("Match" in FIG. 1) in one CAM entry are ANDed together to produce a match. The output match line is usually used to enable encoding and other readout circuits. Notice that only a single CAM search can be performed at a time with this circuit topology.
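The match operation described above can be illustrated with a minimal behavioral sketch. It is not a circuit description; the function names and the three-bit example entries are assumptions chosen only for illustration.

```python
def bit_match(data, stored):
    """Bit-wise compare of one CAM cell: true when DATA equals the stored bit STR."""
    return data == stored


def entry_match(search_bits, entry_bits):
    """MATCH line of one CAM entry: the AND of all its bit-wise compares."""
    if len(search_bits) != len(entry_bits):
        raise ValueError("search pattern and CAM entry must have the same width")
    return all(bit_match(d, s) for d, s in zip(search_bits, entry_bits))


def cam_search(cam, search_bits):
    """Apply a single search pattern to every word-row; one match line per row."""
    return [entry_match(search_bits, row) for row in cam]


# Hypothetical 3-entry, 3-bit CAM array.
cam = [
    [0, 1, 1],
    [1, 0, 1],
    [0, 0, 0],
]
match_lines = cam_search(cam, [1, 0, 1])
print(match_lines)
```

Only one search pattern is handled per call, which mirrors the single-search limitation of the topology in FIG. 1.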
In register mapping applications, the number of word-rows is set equal to the number of physical registers available in the register pool. The CAM bit patterns in a word-row are the binary representation of the logical registers used in the instruction set. The mapping implemented in the CAM array defines the associations of the logical registers with the actual physical registers. This association can also be dynamically updated during instruction dispatch. The output match line is encoded to broadcast the matched physical register. The encoding circuits are placed outside of, and near, the CAM array.
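As a rough illustration of this mapping, the following sketch models each word-row as one physical register that stores a logical register number. The class name `RegisterMapCAM`, the use of `None` for an unmapped row, and the simplification that a remap simply discards the old association are all assumptions made for the example, not details taken from the text.

```python
class RegisterMapCAM:
    """Behavioral model: row index = physical register, row content = logical register."""

    def __init__(self, num_physical):
        # One word-row per physical register; None marks a row with no mapping (assumption).
        self.rows = [None] * num_physical

    def match_lines(self, logical):
        """One match line per word-row for a single logical-register search."""
        return [stored == logical for stored in self.rows]

    def lookup(self, logical):
        """Encode the match lines into a physical register number, or None if no match."""
        lines = self.match_lines(logical)
        return lines.index(True) if True in lines else None

    def remap(self, logical, physical):
        """Dispatch-time update: associate `logical` with a newly allocated row."""
        # Simplification: the previous association is dropped immediately.
        for i, stored in enumerate(self.rows):
            if stored == logical:
                self.rows[i] = None
        self.rows[physical] = logical


mapper = RegisterMapCAM(num_physical=4)
mapper.remap(logical=5, physical=2)
first = mapper.lookup(5)
mapper.remap(logical=5, physical=0)
second = mapper.lookup(5)
print(first, second, mapper.lookup(7))
```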
The CAM cell in FIG. 1 is capable of performing only a single search (compare) at a time. For a high-performance processor, numerous (for example, more than eight) different searches (compares) must be made simultaneously against each CAM entry in a single clock cycle. A CAM structure with multi-compare CAM cells is required to accommodate such a large number of searches. In this case multiple match lines are needed for each CAM entry, one match line for each search, and all of these match lines must be driven by the same CAM entry. To achieve this, one must integrate into the CAM cell structure as many bit-wise compare circuits as there are CAM searches to be conducted in one clock cycle.
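The multi-search requirement can be pictured as a grid of match lines, one per search per CAM entry. The sketch below is a behavioral model only; the function name `multi_search` and the example values are assumptions.

```python
def multi_search(entries, search_vectors):
    """Return match_lines[s][e]: True when search vector s equals CAM entry e.

    Every search is evaluated against every entry, modeling the searches
    that the hardware would perform in the same clock cycle.
    """
    return [[entry == vec for entry in entries] for vec in search_vectors]


# Hypothetical contents: four CAM entries and three simultaneous searches.
entries = [0b0101, 0b0011, 0b1110, 0b0000]
searches = [0b0011, 0b1110, 0b1111]
match_lines = multi_search(entries, searches)

for s, lines in enumerate(match_lines):
    print(f"search {s}: {lines}")
```

Each inner list corresponds to the set of match lines produced by one search, so the number of match lines per entry equals the number of searches.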
Theoretically, this can be accomplished by simply integrating the required number of compare circuits (similar to the one in FIG. 1) into the CAM cell topology shown in FIG. 1. Each compare has its own data/data_bar lines, but all compare circuits are connected to the same cell storage nodes (str/str_bar in FIG. 1). However, the overhead of running many tens of bit-wise compare lines and match lines across each CAM entry would make the CAM cell and CAM entry area far too large to be used in practical chip design.
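A back-of-the-envelope count shows how quickly this overhead grows. The figures used below (eight searches, six bits per entry) are hypothetical values picked for illustration, and the function name is likewise an assumption.

```python
def compare_overhead(num_searches, bits_per_entry):
    """Count the compare resources needed when every search is built into the CAM cell."""
    return {
        # One bit-wise compare circuit per search inside every cell.
        "compare_circuits_per_cell": num_searches,
        # Each compare has its own data/data_bar pair on the bit column.
        "data_lines_per_bit_column": 2 * num_searches,
        # One match line per search must leave every entry.
        "match_lines_per_entry": num_searches,
        # Bit-wise compare outputs that must be combined across one entry.
        "bitwise_compares_per_entry": num_searches * bits_per_entry,
    }


overhead = compare_overhead(num_searches=8, bits_per_entry=6)
for name, count in overhead.items():
    print(f"{name}: {count}")
```

Under these assumed values a single entry already carries 48 bit-wise compares, which is in line with the "many tens" referred to above.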
If the bit-wise compare circuits are added to form a vertical stack, this would increase the height of the CAM cell by more than one order of magnitude. It would be impossible to accommodate a CAM of such size in a chip design. On the other hand, the required compare circuits can be integrated into the CAM cell to form a horizontal stack of bit-wise compare lines. In this case many tens of compare lines must run across the CAM entry to produce the match lines corresponding to the various search vectors presented to the CAM array. The number of horizontal wires across the entry would limit the minimum size of the CAM cell that can be achieved with this approach. The wire loading on the compare nodes would be excessive, and the device sizes would have to be increased to compensate. The CAM cell storage nodes would also see increased loads due to the increased number of compares, which degrades the speed of the cell's search and update operations. The cross-coupled inverters in FIG. 1 would also have to be made larger.
The overall size of the CAM entry is determined by the total size of the match line generation circuits. For a given search vector, the corresponding bit-wise compare lines are extended across the CAM entry and combined (ANDed) to obtain the output match lines. The match lines are then driven outside the CAM array to enable encoder and readout circuits. These circuits would present substantial wiring and device loads that require large match line drivers. This would ultimately increase the CAM entry area as well.
All of the above indicates potential integration problems with conventional approaches: an exceedingly large number of horizontal wires would have to run across the CAM array, and the overall size of the CAM entry would increase considerably, which is clearly a poor utilization of chip area.
Thus there is a need for an improved and practical methodology and implementation which provides an optimal approach for CAM and Mapper organization and circuit topologies.
A non-conventional methodology is provided for designing an area-efficient CAM (content addressable memory) circuit topology and for organizing a register mapper that uses the CAM array, thereby allowing a large number of CAM searches to occur simultaneously. In an exemplary implementation, all compare circuits are placed outside of the CAM in separate match arrays where the actual comparisons occur. The CAM cell contains only latches to hold the stored CAM bit and a multi-port MUX to update the CAM content. The bit-wise compare functions of the CAM cells are physically separated from the CAM entry and placed in a "match entry" that is horizontally aligned with the CAM's storage entry. The bits in the CAM storage entry are driven horizontally to all match arrays and compared simultaneously against all search vectors presented to the various match arrays. The match line encoder and read-out circuitry are both integrated locally within each match array, which minimizes the total load on the match lines and reduces the overall width and area of the match array, thereby substantially increasing the speed of the entire match generation and encoding. The structure of the CAM and search engine facilitates the implementation of the register mapper as a group of custom arrays, with each array being dedicated to executing a specific function. All of the arrays are aligned, and each row of an array is devoted to one register to keep the current state, shadow state and controls for that register. In the exemplary embodiment, eight custom arrays are used to execute various functions of the register mapper. The eight arrays in the example include a CAM storage array, a source match array, a ready bits array, an architected bit array, a destination match array, an allocation array, a free list array and a shadow map array.
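The separation of storage from comparison can be sketched behaviorally as follows. This is a simplified model under stated assumptions: the class and method names are hypothetical, each stored entry is represented as an integer, only two of the eight arrays are modeled, and at most one entry is assumed to match a given search vector.

```python
class CamStorageArray:
    """Holds the stored CAM entries only; it contains no compare logic."""

    def __init__(self, entries):
        self.entries = list(entries)

    def write(self, row, value):
        """Update one storage entry (stands in for the multi-port MUX update)."""
        self.entries[row] = value

    def drive(self):
        """Present every stored entry to the row-aligned match arrays."""
        return tuple(self.entries)


class MatchArray:
    """Performs the compares and encodes the match lines locally."""

    def __init__(self, name):
        self.name = name

    def search(self, stored_entries, search_vectors):
        """Compare all search vectors against all stored entries.

        Returns, for each search vector, the index of the matching row
        (the locally encoded match line) or None when nothing matches.
        """
        encoded = []
        for vec in search_vectors:
            lines = [stored == vec for stored in stored_entries]
            encoded.append(lines.index(True) if True in lines else None)
        return encoded


storage = CamStorageArray([3, 7, 1, 4])
source_match = MatchArray("source match array")
destination_match = MatchArray("destination match array")

stored = storage.drive()  # the same stored bits feed every match array
source_hits = source_match.search(stored, [7, 4, 9])
destination_hits = destination_match.search(stored, [1])
print(source_hits, destination_hits)
```

In this model the storage array never sees a search vector, and each match array returns already-encoded results, which mirrors the idea of keeping the match lines and their encoders local to the match array.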