Traditionally, integrated circuit processors are designed either as general-purpose microprocessors, as application-specific integrated circuits (ASICs), or as reconfigurable logic circuits. The integrated circuit processors transfer data from memory through a tightly coupled memory interface. A general-purpose microprocessor transfers data by following arbitrary sequences of microprocessor instructions defined by a user-written program. This provides flexibility but decreases performance because the circuitry is not optimized for any specific application. An ASIC is designed by describing its structure in terms of circuit primitives such as Boolean gates and registers. The circuit designer arranges the circuit primitives so as to optimize performance for a specific application (such as video compression or audio decoding). While an ASIC provides high performance, its fixed architecture cannot be changed after fabrication to adapt to new algorithms or changing standards. Additionally, the high development costs and lengthy design cycle are not suited to rapidly developing markets.
Reconfigurable logic circuits (also known as programmable logic circuits) typically include user-configurable circuitry that is controlled by configuration data to implement a user's logic function. The user-configurable circuitry typically includes general-purpose logic resources (e.g., look-up tables), special-purpose logic resources (e.g., RAM circuits), and interconnect resources that are connected between the general-purpose and special-purpose logic resources. To configure (or program) a circuit, a user typically enters a desired logic function into a personal computer (PC) or workstation that is configured to run one or more place-and-route software programs. These place-and-route software programs then generate a configuration solution by assigning portions of the logic function to specific logic resources of the circuit, and allocating sections of the interconnect resources to form signal paths between the logic resources, thereby causing the circuit to emulate the desired logic function. The configuration solution generated by the place-and-route software is then converted into a bit-stream that is transmitted into the configuration memory of the circuit.
Reconfigurable logic circuits may be used, for example, as hardware accelerators to perform computationally expensive data processing tasks. To enable fast computation, an efficient memory interface is required. The memory interface may also be implemented using reconfigurable logic. For example, for certain applications in which a large amount of ordered data (which is typically stored in a regular memory pattern such as a vector, a two-dimensional shape, or a linked list) is to be accessed in memory or transferred in real-time from a peripheral, a streaming architecture may be used. Processing such ordered data streams is common in media applications, such as digital audio and video, and in data communication applications (such as data compression or decompression). In many applications, relatively little processing of each data item is required, but high computation rates are required because of the large amount of data. Processors and their associated memory interfaces are conventionally designed with complex circuits that attempt to dynamically predict the data access patterns and pre-fetch required data. This approach is typically limited in performance because data access patterns are difficult to predict correctly in many cases. In addition, the associated circuits consume power and chip area that could otherwise be allocated to actual data processing.
In general, data may be stored in non-contiguous memory locations. This complicates the task of memory access. For example, in image processing applications, a sub-tile from an image may occupy a non-contiguous region of memory. In this case, and in some other applications, the memory locations may be predictable from a small set of parameters via simple arithmetic calculations. In these cases, stream parameters (such as SPAN, STRIDE, SKIP, etc.) may be used in conjunction with custom logic circuits (fixed or reconfigurable) to generate consecutive memory addresses. However, implementation of a parameter-based streaming interface on reconfigurable logic such as an FPGA uses a large number of FPGA slices and does not make efficient use of limited FPGA resources.
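By way of illustration only, the following Python sketch models how a small set of stream parameters may generate the addresses of a sub-tile. The exact meanings of SPAN, STRIDE, and SKIP are not defined in this section; the sketch assumes that STRIDE is the address step between consecutive elements, SPAN is the number of elements accessed before a jump, and SKIP is the address step applied at that jump. The function name `stream_addresses` and its `start` and `count` arguments are hypothetical.

```python
def stream_addresses(start, stride, span, skip, count):
    """Yield `count` addresses from assumed SPAN/STRIDE/SKIP semantics.

    Assumption: after every `span` elements the address advances by
    `skip`; otherwise it advances by `stride`.
    """
    addr = start
    in_span = 0  # elements emitted in the current span
    for _ in range(count):
        yield addr
        in_span += 1
        if in_span == span:
            addr += skip   # jump to the start of the next span
            in_span = 0
        else:
            addr += stride  # step to the next element in the span


# A 4-wide sub-tile in a 16-element-wide image, starting at address 34.
# Moving from the last element of one row to the first element of the
# next row requires a step of 16 - 3 = 13.
tile = list(stream_addresses(start=34, stride=1, span=4, skip=13, count=8))
print(tile)
```

Under these assumed semantics, each address is obtained from the previous one by a single addition and a counter comparison, which is why such patterns lend themselves to simple custom logic.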
In other applications, it is not possible to find a small set of parameters that describe the data memory locations. Some application-specific integrated circuits (ASICs) use a content addressable memory (CAM). In a CAM circuit, a data value is searched for by its content, rather than by its address. Data values are stored (pre-loaded) in CAM circuits such that each data value is assigned to a row or column of an array of CAM cells. To determine whether a particular data value is stored in the CAM circuit, a content-based data match operation is performed in which the searched-for data value is simultaneously compared with the rows/columns containing the pre-loaded data values. When one or more of the pre-loaded data values match the searched-for data value, a “match” signal is generated by the CAM circuit, along with an address indicating the storage location (i.e., row or column) of the pre-loaded data value. By simultaneously comparing the searched-for data value with several pre-loaded data values, a CAM circuit is able to perform compare-and-match (hereafter “match”) operations involving several pre-loaded data values in a single clock cycle. Therefore, when compared with RAM circuits, CAM circuits significantly reduce the search time needed to locate a particular data value from a large number of data values.
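The match operation described above may be modeled behaviorally as in the following Python sketch. This is an illustration under stated assumptions, not a description of any particular CAM circuit: the class name `CamModel` and its methods are hypothetical, and the loop over rows stands in for comparisons that a hardware CAM performs simultaneously in a single clock cycle.

```python
class CamModel:
    """Behavioral model of a CAM with `depth` rows (hypothetical names)."""

    def __init__(self, depth):
        # None marks a row that has not been pre-loaded.
        self.rows = [None] * depth

    def write(self, address, value):
        """Pre-load `value` into the row at `address`."""
        self.rows[address] = value

    def search(self, value):
        """Return (match, addresses) for the searched-for value.

        Hardware compares all rows in parallel; this model visits
        them one by one and collects every matching row address.
        """
        hits = [addr for addr, stored in enumerate(self.rows)
                if stored is not None and stored == value]
        return bool(hits), hits


cam = CamModel(depth=4)
cam.write(0, 0xA5)
cam.write(2, 0x3C)
cam.write(3, 0xA5)
print(cam.search(0xA5))  # match signal plus the matching row addresses
print(cam.search(0x00))  # no pre-loaded row holds this value
```

A RAM holding the same values would have to be read one address at a time to answer the same query, which is the search-time difference noted above.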
Early reconfigurable logic circuits did not support on-chip CAM functions, and external dedicated CAM circuits were required. These dedicated CAM circuits were connected to the input/output (I/O) terminals of the reconfigurable logic circuit, and CAM functions were performed in conjunction with circuit operations by transmitting information between the programmable logic device (PLD) and the dedicated CAM circuit. A problem with this arrangement is that it results in relatively slow operation speeds and requires the use of the reconfigurable logic circuit's scarce I/O resources, which typically limits the complexity of other logic functions implemented in the reconfigurable logic circuit. Therefore, there is a demand for reconfigurable logic circuits that perform on-chip CAM functions in order to speed up CAM operations and free up the reconfigurable logic circuit's I/O resources.
More recently, advanced reconfigurable logic circuits have been produced with dedicated CAM circuits that provide on-chip CAM functions. A problem with including dedicated CAM circuitry in a reconfigurable logic circuit is that the CAM circuitry is essentially useless unless a user's logic function implements a CAM function. That is, unlike general-purpose logic circuitry, dedicated conventional CAM circuitry typically cannot be used for non-CAM logic functions. Therefore, the dedicated CAM circuitry remains idle when a user's logic function does not include a CAM function, and takes up die space on the device that could otherwise be used for logic operations.
Another problem with including dedicated CAM circuitry in a reconfigurable logic circuit is the conflict between the amount of die space required for the CAM circuitry and the range of CAM functions that can be implemented by the CAM circuitry (i.e., the flexibility of the CAM circuitry). A relatively simple CAM circuit requires relatively little die space, but is less likely to support a wide range of CAM functions (i.e., has little flexibility). On the other hand, a sophisticated CAM circuit is more likely to support a wide range of CAM functions, but requires a large amount of die space, thereby reducing the number of general-purpose logic resources provided on the PLD. Therefore, a device manufacturer must balance the flexibility of the CAM circuit with the amount of die space occupied by the CAM circuitry. Typically, such choices result in CAM features that are less than optimal.
Static CAM devices, such as those used in ASICs, are not suitable for use in reconfigurable logic circuits, since they do not allow dynamic reprogramming.
Reconfigurable logic circuits often include random access memory (RAM) or block random access memory (BRAM). Techniques for using the dual-port block RAM in reconfigurable devices to implement CAM functions have been disclosed. However, these techniques are not optimized for dynamic content update, which is required for dynamic out-of-order access (DOOA).
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of embodiments of the present invention.