1. Technical Field
Embodiments of the present invention relate generally to data processing system memory resources and more particularly to a method and system for emulating content-addressable memory primitives.
2. Description of the Related Art
In a conventional data processing system, data and instructions are stored within memory storage elements arranged in a hierarchical structure. In a typical hierarchical memory or storage structure, smaller, faster memory elements are located closer (in terms of physical structure) and more tightly coupled (communicatively) to processing elements (e.g., processors or processor cores) and store a subset of data and/or instructions stored in larger, slower memory elements (e.g., fixed or removable magnetic or optical disks, tape storage, or the like) elsewhere within or coupled to the data processing system. One type of memory element used frequently in data processing systems for so-called “main” or system memory is random-access memory.
In a conventional random-access memory or “RAM” element, data values are stored in an array of addressed memory locations. To perform a read operation on a RAM element, an address (e.g., a data processing system memory address) is applied to the RAM element, causing data stored at the applied address to be accessed and presented by the RAM.
In order to determine whether a particular data value is stored within a RAM element, an address-based data searching method is performed in which data values are sequentially read out from the RAM and compared with the searched-for data value. Specifically, a series of addresses are sequentially transmitted to an address port of the RAM, thereby causing data values to be read out from the memory locations addressed. A separate comparator element is then used to compare each of the output data values with the searched-for data value, generating a signal when a match occurs. When a large number of data values is to be searched or compared, such address-based search operations are very time consuming as only a single data value is typically processed each clock cycle.
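For illustration only, the address-based search described above may be modeled in software as follows. This is a minimal sketch rather than a description of any particular RAM device; the function name `ram_search` and the assumption of exactly one read-and-compare per clock cycle are hypothetical.

```python
def ram_search(ram, target):
    """Sequentially read each address and compare against the searched-for value.

    Returns (address, cycles) for the first match, or (None, cycles) if no
    location holds the value. One read-and-compare per cycle is assumed.
    """
    cycles = 0
    for address, value in enumerate(ram):
        cycles += 1  # a single data value is processed each clock cycle
        if value == target:  # the separate comparator element
            return address, cycles
    return None, cycles
```

In this model, the worst-case cost grows linearly with the number of stored data values, which is the limitation the paragraph above identifies.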
Another type of memory element used in data processing systems to perform data search or comparison operations is content-addressable memory. In a content-addressable memory (CAM) element, a data value may be searched by content, rather than address. In a conventional CAM, data values are stored such that each data value is assigned to a row or column of an array of storage cells. To determine whether a particular data value is stored in a CAM element, a content-based data match operation is performed in which a searched-for data value is simultaneously compared with all rows/columns containing the pre-loaded data values. When one or more of the pre-loaded data values matches the searched-for data value, a “match” signal is generated by the CAM element, along with an address indicating the storage location (i.e., row or column) of the matching pre-loaded data value.
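By way of contrast, a content-based match operation may be sketched as shown below. The sketch is a behavioral model only: the name `cam_match` is hypothetical, and reporting the lowest matching location when several entries match is an assumption, since the text does not specify a priority rule.

```python
def cam_match(entries, target):
    """Behavioral model of a CAM match: compare the target with every entry.

    In hardware all comparisons occur simultaneously; the list below stands
    in for the per-row match lines.
    """
    match_lines = [entry == target for entry in entries]
    hit = any(match_lines)  # the "match" signal
    # Assumed priority: report the lowest matching storage location.
    address = match_lines.index(True) if hit else None
    return hit, address
```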
By simultaneously comparing searched-for data with several pre-loaded data values, a CAM element is able to perform comparison or matching operations involving several pre-loaded data values in a single clock cycle. Consequently, CAM elements significantly reduce the time needed to identify a specific data value within a large amount of data as compared with conventional RAM elements and are used frequently for search or pattern-matching-intensive applications. FIG. 1 illustrates a data processing system processor cache as one such exemplary application. The depicted data processing system processor cache 100 includes a cache memory element 102 coupled to a cache tag lookup element 104 as shown. For purposes of illustration, data processing system processor cache 100 will be described herein as a 32 kilobyte cache having 1024 lines of 32 bytes each and organized as a 64-way set associative cache including 16 sets.
Data processing system processor cache 100, as depicted in FIG. 1, is addressed using a 32 bit address with bits 0 through 4 (represented as bits 4:0) identifying a specific byte within a cache line, bits 5 through 8 (represented as bits 8:5) identifying a specific set of the 16 possible sets, and remaining bits (represented as bits 31:9) making up a “cache tag” which identifies a block-frame address in memory where the cache line resides. In a conventional cache memory, a cache tag is used to verify that the cache line addressed in fact stores the requested data.
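The address partitioning just described may be expressed, as a minimal sketch, using shifts and masks. The function name `split_address` is hypothetical; the field boundaries are those stated above.

```python
def split_address(addr):
    """Split a 32-bit address into (cache tag, set index, byte offset)."""
    byte_offset = addr & 0x1F        # bits 4:0, one of 32 bytes in a line
    set_index = (addr >> 5) & 0xF    # bits 8:5, one of 16 sets
    tag = (addr >> 9) & 0x7FFFFF     # bits 31:9, the 23-bit cache tag
    return tag, set_index, byte_offset
```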
Cache tag lookup element 104 includes a number of CAM elements 106A, 106B, . . . 106N coupled to a multiplexer 108 and to a match indication signal generation element (e.g., OR gate 110). In the illustrated data processing system processor cache 100, 16 CAM elements 106 (one for each cache “set”) are employed. In operation, a cache tag (e.g., bits 31:9 of a data processing system memory address) is applied simultaneously to each of CAM elements 106A-106N. Each of CAM elements 106A-106N in the illustrated processor cache is a 64×23 CAM including 64 23-bit registers (not illustrated) coupled to 64 comparators which are in turn coupled to encoding logic (not illustrated). Within the present description, the variable “N” is intended to indicate some positive integer value and need not indicate the same value consistently.
The encoding logic (not illustrated) of each of CAM elements 106 is used to generate a 6-bit address corresponding to a matching CAM element record. Each CAM element 106 is additionally coupled to OR gate 110 to generate a match indication signal indicating whether or not a matching record was identified. Each 6-bit CAM element address generated is then applied, along with a cache set index (e.g., bits 8:5 of the received data processing system memory address), to multiplexer 108. Multiplexer 108 outputs a selected input 6-bit address specified by the received cache set index. The output of multiplexer 108 is then combined/concatenated with the cache set index to form a 10-bit cache memory element address as shown. The generated 10-bit cache memory element address is then used to address or identify a 256-bit line or “block” within cache memory element 102.
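The lookup path of FIG. 1 may be modeled, for purposes of illustration, as shown below. The names `cam_lookup` and `tag_lookup` are hypothetical. Two details are assumptions not stated in the text: the encoder output is taken to be zero when a CAM element has no match, and the cache set index is placed in the upper four bits of the 10-bit address.

```python
WAYS = 64   # records per CAM element
SETS = 16   # one CAM element per cache set

def cam_lookup(cam, tag):
    """One 64x23 CAM element: return (match, 6-bit record address)."""
    for way, stored in enumerate(cam):
        if stored == tag:
            return True, way
    return False, 0  # assumed encoder output when nothing matches

def tag_lookup(cams, tag, set_index):
    """Return (match indication, 10-bit cache memory element address)."""
    results = [cam_lookup(cam, tag) for cam in cams]  # tag applied to every CAM
    match = any(hit for hit, _ in results)            # OR gate 110
    way = results[set_index][1]                       # multiplexer 108
    line_address = (set_index << 6) | way             # assumed bit order
    return match, line_address
```

With 4 set-index bits and 6 way bits, the resulting address selects one of the 1024 lines of cache memory element 102.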
While CAM elements are well-suited for performing comparison operations such as those required by cache tag lookup element 104, CAMs may not be implemented in some cases or may be prohibitively expensive in some cases where they may otherwise be used. One technique for providing basic CAM functionality or “primitives” is to emulate the operation of a CAM element using one or more RAM elements. FIG. 2 illustrates a conventional RAM-based emulated CAM. The “virtual” or emulated CAM element 200 depicted in FIG. 2 may be substituted for one of the individual CAM elements 106A-106N illustrated in FIG. 1 and its functionality, as applied for performing a portion of a cache tag lookup operation, will be described with respect to data processing system processor cache 100 of that figure. As represented in FIG. 2, emulated CAM 200 includes RAM elements 202A, 202B, and 202C coupled with combinatorial (e.g., AND gate 204 and OR gate 206) and encoding (e.g., encoder 208) logic as shown.
Each RAM element 202 may be viewed as a 2-dimensional array of bits including rows corresponding to each of the 64 ways of a cache memory element being accessed. Each row stores match reference data for a portion of a cache tag associated with the row's way and represented as a vector of bits. Accordingly, a 7-bit cache tag portion is represented using a 2^7 or 128-bit vector “one-hot” encoded to indicate, using a single bit value, which of the 128 possible cache tag portion permutations is stored within that row/way. Similarly, an 8-bit cache tag portion is represented using a 2^8 or 256-bit vector.
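As a minimal sketch of the one-hot encoding described above, a cache tag portion of a given width may be mapped to a vector holding exactly one set bit among 2 to the power of that width positions (128 positions for a 7-bit portion, 256 for an 8-bit portion). The function name `one_hot` and the use of a Python integer to hold the vector are illustrative assumptions.

```python
def one_hot(portion, width):
    """Encode a `width`-bit tag portion as a vector with a single bit set.

    The vector is held in an integer; bit position `portion` is the one set.
    """
    if not 0 <= portion < (1 << width):
        raise ValueError("portion does not fit in the given width")
    return 1 << portion
```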
In operation, emulated CAM 200, and associated RAM elements 202, are utilized to perform a “split” lookup function in which separate portions (e.g., a 7-bit portion and two 8-bit portions) of a cache tag are each used to address a corresponding one of RAM elements 202. A match of a complete cache tag is indicated if each portion of the cache tag matches in the same way of each of RAM elements 202 and consequently of the cache. For purposes of clarity, illustration of a write port (and a corresponding description of a write operation) has been omitted from emulated CAM 200 of FIG. 2. Each cache tag portion identifies or addresses a single bit position which may be viewed as a column within an accessed RAM element including a bit from each way. All 64 bits of that bit position or column are then output to determine whether the provided or input partial cache tag value matched reference partial cache tag data within the corresponding portion of the emulated CAM (indicated, for example, by a single bit having a logical “1” value).
The 64-bit outputs of each of RAM elements 202A-202C are then logically combined or “joined” via a bitwise AND operation using AND gate 204. The combined 64-bit output is then used to generate a 6-bit match address corresponding to a matching location within emulated CAM element 200 (e.g., using encoder 208) and to generate a match indication signal indicating whether or not a matching record was identified (e.g., using OR gate 206) as shown. If none of the bits of the bitwise-coalesced RAM element output is set to a logical “1” value, a determination may be made that the complete 23-bit input cache tag failed to match an emulated CAM element entry for any way of an associated cache memory element. Once the 6-bit match address and match indication signal have been generated, they may be applied, along with a CAM selection (e.g., cache set) index, to a multiplexer (e.g., multiplexer 108 of FIG. 1) and used to address a cache memory element as previously described herein.
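The split lookup of FIG. 2 may be modeled in software as sketched below. The class `EmulatedCam` and the helper `split_tag` are hypothetical names. Several details are assumptions: the 23-bit tag is divided into its upper 7 bits followed by two 8-bit fields, the `write` method stands in for the write port omitted from FIG. 2, and the encoder reports the lowest matching way (or zero when there is no match).

```python
WAYS = 64
PORTION_WIDTHS = (7, 8, 8)  # assumed split: upper 7 bits, then two 8-bit fields

def split_tag(tag):
    """Divide a 23-bit cache tag into its three portions (assumed order)."""
    return (tag >> 16) & 0x7F, (tag >> 8) & 0xFF, tag & 0xFF

class EmulatedCam:
    def __init__(self):
        # rams[r][way] holds that way's one-hot row as an integer; 0 means empty.
        self.rams = [[0] * WAYS for _ in PORTION_WIDTHS]

    def write(self, way, tag):
        """Hypothetical write operation: store one-hot rows for a way."""
        for ram, portion in zip(self.rams, split_tag(tag)):
            ram[way] = 1 << portion

    def _read_column(self, ram, portion):
        """Output bit `portion` of every row as a 64-bit vector (bit w = way w)."""
        column = 0
        for way, row in enumerate(ram):
            column |= ((row >> portion) & 1) << way
        return column

    def lookup(self, tag):
        combined = (1 << WAYS) - 1
        for ram, portion in zip(self.rams, split_tag(tag)):
            combined &= self._read_column(ram, portion)  # AND gate 204
        match = combined != 0                            # OR gate 206
        # Encoder 208: index of the lowest set bit (assumed priority).
        way = (combined & -combined).bit_length() - 1 if match else 0
        return match, way
```

Because the AND is taken per way, portions that match in different ways do not produce a match; only a way whose three stored portions all equal the input portions survives the combination.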
While RAM element-based CAM emulation may be utilized in some circumstances where traditional CAMs may not, providing greater flexibility and cost-effectiveness, one significant problem associated with such CAM emulation techniques is the quantity of RAM storage required for implementation. This problem, although more prominent where emulation of a CAM element is embodied in a single RAM, is evident even where emulation is distributed across multiple RAM elements as depicted in FIG. 2. For example, for a 32 kilobyte cache as described herein, at least 640 kilobits of RAM storage (approximately 40 kilobits of RAM storage for each of a total of 16 emulated CAM elements) is required.
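The storage figures above follow from the dimensions of the RAM elements, as the short calculation below illustrates. The variable names are hypothetical, and the final comparison counts only the 64 23-bit registers of each conventional CAM element, ignoring comparators and other logic.

```python
WAYS = 64                   # rows per RAM element, one per way
PORTION_WIDTHS = (7, 8, 8)  # tag portions handled by RAM elements 202A-202C
SETS = 16                   # emulated CAM elements, one per cache set

# Each RAM element holds WAYS rows of 2**width one-hot bits.
bits_per_cam = sum(WAYS * 2 ** width for width in PORTION_WIDTHS)
total_bits = SETS * bits_per_cam

# Register storage of the conventional 64x23 CAM elements, for comparison.
register_bits = SETS * WAYS * 23

print(bits_per_cam // 1024, "kilobits per emulated CAM element")   # 40
print(total_bits // 1024, "kilobits for all emulated CAM elements")  # 640
print(register_bits, "register bits in the conventional CAM elements")  # 23552
```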
Additional RAM storage may also be required as a buffer for match reference data used to update or modify match reference data within the emulated CAM's RAM elements. Moreover, logic used in conventional CAM-based implementations (e.g., a multiplexer and an additional OR gate used to generate a global match indication signal) is not eliminated in such CAM-emulation systems. In some circumstances this quantity of memory and logic is unacceptable and consequently a cache may be omitted or implemented in a less-than-optimal way.