The invention relates to Content Addressable Memory (CAM) circuits. The invention particularly relates to methods for implementing CAM functions using dual-port Random Access Memory (RAM) circuits.
RAM circuits are well-known data storage devices that store data values in an array of addressed memory locations. To determine whether a particular data value is stored in a RAM, an address-based data searching method is performed in which data values are sequentially read out from the RAM and compared with the particular data value. Specifically, a series of addresses are transmitted to an address port of the RAM, thereby causing data values to be read out from the memory locations associated with the addresses and transmitted to an output port of the RAM. A separate comparator circuit is then used to compare each of the output data values with the searched-for data value, and to generate a signal when a match occurs. When a large number of data values is searched, such address-based search operations are very time consuming because only one data value is searched/compared each clock cycle.
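The cost of the address-based search described above can be illustrated with a minimal Python sketch. It models the RAM as a list indexed by address and counts one read-and-compare per clock cycle. The function name `ram_search` and the cycle counter are hypothetical modeling conveniences, not part of any RAM interface.

```python
def ram_search(ram, target):
    """Scan addresses in order; each read-and-compare costs one modeled clock cycle."""
    cycles = 0
    for address, value in enumerate(ram):
        cycles += 1
        if value == target:
            return address, cycles
    return None, cycles


# Hypothetical four-entry RAM contents, used only for illustration.
ram = [0x3A, 0x07, 0xC4, 0x19]
address, cycles = ram_search(ram, 0xC4)
```

In this model the number of cycles grows linearly with the number of stored values, which is the limitation the content-based approach described next is meant to avoid.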
CAM circuits are a second type of data storage device in which a data value is searched for by its content, rather than by its address. Data values are stored (pre-loaded) in CAM circuits such that each data value is assigned to a row or column of an array of CAM cells. To determine whether a particular data value is stored in the CAM circuit, a content-based data match operation is performed in which the searched-for data value is simultaneously compared with the rows/columns containing the pre-loaded data values. When one or more of the pre-loaded data values match the searched-for data value, a "match" signal is generated by the CAM circuit, along with an address indicating the storage location (i.e., row or column) of the matching pre-loaded data value. By simultaneously comparing the searched-for data value with several pre-loaded data values, a CAM circuit is able to perform compare-and-match (hereafter "match") operations involving several pre-loaded data values in a single clock cycle. Therefore, when compared with RAM circuits, CAM circuits significantly reduce the search time needed to locate a particular data value from a large number of data values.
Programmable Logic Devices (PLDs) are integrated circuits that typically include user-configurable circuitry that is controlled by configuration data to implement a user's logic function. The user-configurable circuitry typically includes general-purpose logic resources (e.g., look-up tables), special-purpose logic resources (e.g., RAM circuits), and interconnect resources that are connected between the general-purpose and special-purpose logic resources. To program a PLD, a user typically enters a desired logic function into a Personal Computer (PC) or workstation that is configured to run one or more place-and-route software programs. These place-and-route software programs then generate a configuration solution by assigning portions of the logic function to specific logic resources of the PLD, and allocating sections of the interconnect resources to form signal paths between the logic resources, thereby causing the PLD to emulate the desired logic function. The configuration solution generated by the place-and-route software is then converted into a bitstream that is transmitted into the configuration memory of the PLD.
Early PLDs could not support on-chip CAM functions, and external dedicated CAM circuits were required. These dedicated CAM circuits were connected to the input/output (I/O) terminals of the PLDs, and CAM functions were performed in conjunction with PLD operations by transmitting information between the PLD and the dedicated CAM circuit. A problem with this arrangement is that it results in relatively slow operation speeds and consumes precious PLD I/O resources, which typically limits the complexity of other logic functions implemented in the PLD. Therefore, there is a demand for PLDs that perform on-chip CAM functions in order to speed up CAM operations and free up PLD I/O resources.
More recently, advanced PLDs have been produced with dedicated CAM circuits that provide on-chip CAM functions. For example, APEX™ 20KE devices, produced by Altera® Corporation, include special-purpose CAM circuits in addition to general-purpose logic resources and other special-purpose logic resources (e.g., RAM circuits).
A problem with including dedicated CAM circuitry on PLDs is that the CAM circuitry is essentially useless unless a user's logic function implements a CAM function. That is, unlike general-purpose logic circuitry, conventional dedicated CAM circuitry typically cannot be used for non-CAM logic functions. Therefore, the dedicated CAM circuitry remains idle when a user's logic function does not include a CAM function, and takes up die space on the PLD that could otherwise be used for logic operations.
Another problem with including dedicated CAM circuitry on PLDs is the conflict between the amount of die space required for the CAM circuitry and the range of CAM functions that can be implemented by the CAM circuitry (i.e., the flexibility of the CAM circuitry). A relatively simple CAM circuit requires relatively little die space, but is less likely to support a wide range of CAM functions (i.e., has little flexibility). On the other hand, a sophisticated CAM circuit is more likely to support a wide range of CAM functions, but requires a large amount of die space, thereby reducing the number of general-purpose logic resources provided on the PLD. Therefore, a PLD manufacturer must balance the flexibility of the CAM circuit against the amount of die space occupied by the CAM circuitry. Typically, such choices result in CAM features that are less than optimal. For example, the dedicated CAM circuitry provided in APEX™ 20KE devices only supports single clock cycle CAM operations on data words having widths of 32 bits or less.
What is needed is a method of implementing CAM functions without requiring dedicated, special-purpose CAM circuitry, thereby overcoming the problems described above.
The present invention provides methods for implementing a CAM function in one or more dual-port RAM circuits (referred to herein as "dual-port RAMs"). In effect, the present invention extends the range of functions that can be implemented by a dual-port RAM to include CAM functions. The methods are particularly useful when implemented in PLDs because they eliminate the need for the dedicated CAM circuitry that is provided in some PLDs, thereby freeing more IC area for general-purpose logic circuitry. Further, the methods described herein can be applied to multiple dual-port RAMs, thereby providing very wide and deep CAM functions that can be performed in a single clock cycle. Also provided is a PLD including a dual-port RAM that is configured in accordance with the present invention to implement CAM functions.
Dual-port RAMs, which are utilized to perform the methods of the present invention, typically include an array of memory cells arranged in rows and columns, and first and second input ports that independently access the memory array through a row decoder and a column decoder (which can be functionally combined to form a single decoder). In one embodiment, the first input port is used during CAM write and erase operations, and the second input port is used during CAM data match operations.
According to a first main aspect of the present invention, a dual-port RAM is utilized to implement CAM functions by storing decoded "one hot" data words in the columns of the RAM memory array, and then performing data match operations by reading selected rows of the RAM memory array. As used herein, each decoded "one hot" data word includes only one logic "1" bit (all other bits are logic "0"), and the decimal value of each decoded "one hot" data word is defined by the bit position of the logic "1" bit. For example, an eight-bit decoded "one hot" data word can have a decimal value between "0" (i.e., 00000001) and "7" (i.e., 10000000), depending upon the position of the logic "1" bit. In one embodiment, each decoded "one hot" data word is stored in one column of the RAM memory array. Accordingly, a group of decoded "one hot" data words can be simultaneously compared with a match data word by transmitting the encoded match data word to the RAM row decoder, and reading the selected row of memory cells that corresponds to the decoded match data word. A match is detected when at least one memory cell in the selected row stores a logic "1" bit value.
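The column-wise storage and row-wise read described above can be sketched in Python as follows. This is only a behavioral model under stated assumptions: the array is a small 8-row by 4-column list of lists, bit lists are indexed from the least significant bit, and the names `one_hot`, `store`, and `match` are hypothetical helpers rather than elements of the disclosed circuit.

```python
def one_hot(value, width):
    """Return the decoded one-hot word for `value` as a bit list (index 0 = least significant bit)."""
    return [1 if position == value else 0 for position in range(width)]


# Assumed geometry: 8 rows (decimal values 0..7) by 4 columns (four stored words).
ROWS, COLUMNS = 8, 4
array = [[0] * COLUMNS for _ in range(ROWS)]


def store(column, value):
    """Place the one-hot word for `value` down one column of the array."""
    for row, bit in enumerate(one_hot(value, ROWS)):
        array[row][column] = bit


def match(value):
    """Read the row selected by `value`; any logic 1 in that row signals a match."""
    row = array[value]
    return any(row), row


store(0, 5)
store(1, 2)
store(3, 5)
hit, row = match(5)
```

Reading row 5 in this model returns one bit per column, so the two columns that hold the value 5 are reported by a single row read.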
In accordance with a second main aspect of the present invention, the first input port of the dual-port RAM is configured to automatically write (or erase) decoded "one hot" data words into (or from) the memory array by accessing a selected memory cell in response to an encoded X+Y-bit word. The encoded X+Y-bit word is transmitted to an address terminal of the first input port, and a logic bit value (i.e., logic "1" during write operations, and logic "0" during erase operations) is transmitted to a data input terminal of the first input port. A Y-bit (write address) portion of the X+Y-bit word is decoded by the column decoder of the dual-port RAM, thereby accessing the column in which the selected memory cell is located. An X-bit (write data) portion of the X+Y-bit word is decoded by the row decoder of the dual-port RAM, thereby accessing the row in which the selected memory cell is located. Stated differently, as accessed through the first input port, the memory array is a single "column" (or "row") of memory cells, and a selected memory cell is accessed by applying the X+Y-bit word to a "row" (or "column") decoder of the dual-port RAM. During write operations, a logic "1" bit value applied to the data input terminal of the first input port is then written into the selected memory cell. Because each decoded "one hot" data word includes only one logic "1" bit value, write operations are performed during a single clock cycle (assuming a memory array initialized to all logic "0"). During erase operations, a logic "1" bit value stored in the selected memory cell is overwritten by a logic "0" bit value applied to the data input terminal of the first input port. Again, because each decoded "one hot" data word includes only one logic "1" bit value, erase operations are performed in a single clock cycle.
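The single-cell access through the first input port can be modeled with the short Python sketch below. It assumes X = 3 and Y = 2, and it assumes that the Y-bit write address occupies the upper bits of the X+Y-bit word; the text does not fix the bit ordering, so that choice and the names `port_a_access`, `cam_write`, and `cam_erase` are illustrative only.

```python
X_BITS, Y_BITS = 3, 2                       # assumed sizes: 8 rows, 4 columns
ROWS, COLUMNS = 1 << X_BITS, 1 << Y_BITS
cells = [0] * (ROWS * COLUMNS)              # first-port view: one bit per X+Y-bit address


def port_a_access(cam_address, data_value, bit):
    """Write `bit` into the single cell selected by the X+Y-bit word."""
    # Assumed ordering: Y-bit write address in the upper bits, X-bit write data in the lower bits.
    word = (cam_address << X_BITS) | data_value
    cells[word] = bit


def cam_write(cam_address, data_value):
    """Store a one-hot word by setting its single logic 1 (array assumed initialized to 0)."""
    port_a_access(cam_address, data_value, 1)


def cam_erase(cam_address, data_value):
    """Erase a one-hot word by overwriting its single logic 1 with logic 0."""
    port_a_access(cam_address, data_value, 0)


cam_write(2, 5)
stored_after_write = sum(cells)
cam_erase(2, 5)
stored_after_erase = sum(cells)
```

Each write or erase in this model touches exactly one cell, which is why a single access suffices for a one-hot word.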
According to yet another aspect of the present invention, the second input port of the dual-port RAM is configured to read bit values stored in the memory cells of one row of the memory array in response to an X-bit encoded match data word that is applied during data match operations. The X-bit match data word is decoded by the row decoder to access a corresponding row of the memory array. Each decoded "one hot" data word is stored in one memory array column of the dual-port RAM, and the decimal value of each decoded "one hot" word is determined by the row in which the logic "1" is stored. For example, a dual-port RAM having a 16×256 memory array (i.e., sixteen columns and 256 rows) can store up to sixteen decoded "one hot" data words that have decimal values of 0 to 255. A match data word can be compared with all sixteen decoded "one hot" data words simultaneously by applying the match data word to the row decoder of the dual-port RAM, and reading the corresponding row that is addressed by the row decoder. If a logic "1" appears in any of the sixteen bits read from the corresponding row of the memory array, then at least one of the decoded "one hot" data words matches the match data word. The address of a matching decoded "one hot" data word is determined by the bit position of the logic "1" bit value in the output word (i.e., the bit position identifies the column storing the matching "one hot" data word). Multiple matches are identified when more than one logic "1" bit value appears in the output word.
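The 16×256 example above can be sketched in Python by modeling each row of the array as a 16-bit integer, so that one row read yields the whole output word. The function names `cam_write` and `cam_match` are hypothetical, and the model assumes an array initialized to all zeros with each column written at most once.

```python
ROWS, COLUMNS = 256, 16
rows = [0] * ROWS                # each entry models one 16-bit row of the memory array


def cam_write(column, value):
    """Set the single logic 1 of the one-hot word: row `value`, bit position `column`."""
    rows[value] |= 1 << column


def cam_match(value):
    """Read the row addressed by `value`; return the match flag and matching column addresses."""
    word = rows[value]
    columns = [c for c in range(COLUMNS) if (word >> c) & 1]
    return word != 0, columns


cam_write(3, 200)
cam_write(9, 200)
cam_write(4, 17)
```

In this sketch `cam_match(200)` reports columns 3 and 9 from one row read, which corresponds to the multiple-match case described above.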
The present invention is particularly useful when implemented in Programmable Logic Devices (PLDs) because it allows a PLD to support CAM functions without the need for dedicated CAM circuitry, thereby providing additional die space for general-purpose logic resources on the PLD.
In a first disclosed example, a PLD is configured to include an encoded data memory for storing encoded data values that are used to write decoded "one hot" data words in the memory array of a dual-port RAM in accordance with the methods described above. During subsequent erase operations, the encoded data words stored in the encoded data memory are read out and used to erase the decoded "one hot" data words from the memory array. By storing these encoded data words in this manner, all decoded "one hot" data words can be erased from the dual-port RAM in a minimum number of clock cycles.
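One way to picture the role of the encoded data memory is the Python sketch below. It assumes a 16×256 array modeled as 256 row integers, one encoded-memory entry per column, and one erase access per stored word; the list `encoded` and the functions `cam_write`, `cam_erase`, and `cam_erase_all` are hypothetical names for illustration and do not describe the disclosed circuit in detail.

```python
ROWS, COLUMNS = 256, 16
rows = [0] * ROWS                 # each entry models one 16-bit row of the memory array
encoded = [None] * COLUMNS        # hypothetical model of the encoded data memory


def cam_write(column, value):
    """Write the one-hot word and remember its encoded value for later erasure."""
    rows[value] |= 1 << column
    encoded[column] = value


def cam_erase(column):
    """Erase one stored word in a single access by looking up its encoded value."""
    value = encoded[column]
    if value is not None:
        rows[value] &= ~(1 << column)
        encoded[column] = None


def cam_erase_all():
    """Erase every stored word; return the number of erase accesses used."""
    accesses = 0
    for column in range(COLUMNS):
        if encoded[column] is not None:
            cam_erase(column)
            accesses += 1
    return accesses


cam_write(0, 10)
cam_write(1, 250)
cam_write(7, 10)
accesses = cam_erase_all()
```

Without the stored encoded values, this model would have to visit every cell of the array to be sure all logic 1 bits were cleared; with them, the erase count equals the number of stored words.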
In a second disclosed example, a PLD is configured to illustrate the expandable depth of CAM functions performed in accordance with the present invention. The PLD resources are configured to connect four dual-port RAMs in parallel. A special address decoder is used to enable one of the four dual-port RAMs during write and erase operations, thereby allowing different data values to be written into each of the four dual-port RAMs. During data match operations, a match data word is simultaneously applied to all four dual-port RAMs, thereby simultaneously comparing the match data word to the decoded "one hot" data words stored in all four dual-port RAMs. The resulting output words transmitted from each dual-port RAM are combined to form a wide output word. Therefore, the second example illustrates how a PLD can be configured to perform CAM functions in which any number of decoded "one hot" data words are compared simultaneously to the match data word such that the output word provides the address of any one of the decoded "one hot" data words that matches the match data word. In contrast, PLDs that include a dedicated CAM circuit are limited by the output structure of the CAM circuit.
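The depth expansion of the second example can be sketched in Python as four row-integer arrays searched with the same match word. The sketch assumes 16×256 arrays and assumes that the upper bits of the CAM address select the RAM while the lower bits select the column; the names `banks`, `cam_write`, and `cam_match` are hypothetical.

```python
ROWS, COLUMNS, BANKS = 256, 16, 4
banks = [[0] * ROWS for _ in range(BANKS)]    # four modeled dual-port RAMs


def cam_write(cam_address, value):
    """Model of the address decoder: enable one RAM, then set one bit in it."""
    bank, column = divmod(cam_address, COLUMNS)   # assumed: upper address bits pick the RAM
    banks[bank][value] |= 1 << column


def cam_match(value):
    """Apply the match word to all RAMs and concatenate their 16-bit output words."""
    wide = 0
    for index, bank in enumerate(banks):
        wide |= bank[value] << (index * COLUMNS)
    return wide


cam_write(5, 42)
cam_write(37, 42)
wide = cam_match(42)
matches = [a for a in range(BANKS * COLUMNS) if (wide >> a) & 1]
```

The 64-bit wide output word in this model carries one bit per stored word, so CAM addresses 5 and 37 are both reported by a single match operation.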
In a third disclosed example, a PLD is configured to illustrate the expandable width of CAM functions performed in accordance with the present invention. The PLD resources are configured to connect two dual-port RAMs such that each dual-port RAM stores one-half of a data word. During write operations, the bits of each encoded write data word are separated into two groups that are respectively used to store decoded "one hot" data words in the two dual-port RAMs. During data match operations, a match data word is similarly separated and compared with the decoded "one hot" data words stored in the two dual-port RAMs. Each bit of the output word generated by the first dual-port RAM is then ANDed with a corresponding bit of the output word generated by the second dual-port RAM. A match is detected when both corresponding bits are logic "1", and the address of the corresponding data word is indicated by a logic high output signal from a corresponding AND gate. Accordingly, the third example illustrates how a PLD can be configured to perform CAM functions supporting data words having any width. In contrast, PLDs that include dedicated CAM circuits are limited during single clock cycle match operations by the maximum width supported by the CAM circuit.
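The width expansion of the third example can be sketched in Python as two row-integer arrays whose output words are ANDed bit by bit. The sketch assumes 16-bit data words split into two 8-bit halves and 16 columns per RAM; the names `high_ram`, `low_ram`, `split`, `cam_write`, and `cam_match` are hypothetical.

```python
HALF_BITS, COLUMNS = 8, 16
ROWS = 1 << HALF_BITS
high_ram = [0] * ROWS             # stores the upper half of each data word
low_ram = [0] * ROWS              # stores the lower half of each data word


def split(value):
    """Separate a 16-bit word into its upper and lower 8-bit groups."""
    return value >> HALF_BITS, value & (ROWS - 1)


def cam_write(column, value):
    """Store one one-hot word per half, both in the same column."""
    high, low = split(value)
    high_ram[high] |= 1 << column
    low_ram[low] |= 1 << column


def cam_match(value):
    """AND the two output words; a bit is 1 only where both halves match."""
    high, low = split(value)
    return high_ram[high] & low_ram[low]


cam_write(0, 0x12AB)
cam_write(1, 0x12CD)
cam_write(2, 0x34AB)
```

In this model `cam_match(0x12AB)` returns `0b001`, while `cam_match(0x34CD)` returns `0` even though each half of that word is stored in some column, because the AND requires both halves to match in the same column.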