The present invention relates to associative processors and, more particularly, to a combined architecture featuring an associative processor and a random access memory. Combining these two elements greatly enhances the performance of the associative processor and reduces its cost, making the product cost effective for consumer applications such as digital still cameras, digital camcorders, multi-function peripherals, 2-D and 3-D graphics accelerators, and video effects.
In associative processors, each associative word typically contains a data element, and all elements are processed in parallel. In applications that require a large number of parameters per element, such as geometric transformations, calculating the transform for each pixel has required very large associative memory words (or several memory words) to process each element. For example, polygon transformation and rendering in 3-D graphics, dealing with polygons having three vertices, requires ten parameters per vertex: the x, y, z vertex coordinates (30 bits); the Nx, Ny, Nz coordinates of the vertex's normal vector (30 bits); the R, G, B vertex color (24 bits); and an object ID (20 bits). Each vertex thus takes 104 bits, and the three vertices of a polygon take slightly more than 300 bits to store the initial parameters alone. Calculating the transformation requires additional temporary storage as well, for a total of over 1000 bits per polygon. The relatively large die size of associative memory cells makes this solution expensive.
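The storage figures quoted above can be checked with a short calculation. The following is only an illustrative sketch: the field widths are taken from the example in the text, while the names `VERTEX_FIELD_BITS`, `bits_per_vertex`, and `bits_per_polygon` are hypothetical and introduced here purely for illustration.

```python
# Bit widths of the per-vertex parameter groups, as quoted in the text.
# The dictionary and function names are illustrative assumptions only.
VERTEX_FIELD_BITS = {
    "position (x, y, z)": 30,
    "normal (Nx, Ny, Nz)": 30,
    "color (R, G, B)": 24,
    "object ID": 20,
}
VERTICES_PER_POLYGON = 3


def bits_per_vertex(fields=VERTEX_FIELD_BITS):
    """Sum the widths of all parameter groups stored for one vertex."""
    return sum(fields.values())


def bits_per_polygon(fields=VERTEX_FIELD_BITS, vertices=VERTICES_PER_POLYGON):
    """Initial-parameter storage for one polygon, excluding temporaries."""
    return vertices * bits_per_vertex(fields)


if __name__ == "__main__":
    print("bits per vertex: ", bits_per_vertex())
    print("bits per polygon:", bits_per_polygon())
```

The per-polygon total covers the initial parameters only; the temporary storage needed during the transformation is not itemized in the text and is therefore not modeled here.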
Alternatively, the parameters and temporary results can be stored outside the associative array. This solution, too, is less than satisfactory, because input and output are performed serially, one word at a time, and thus consume a large amount of time. It would be highly advantageous to have an architecture capable of storing the large amounts of data required for processing in standard, non-associative memory cells while providing parallel access and communication between these cells and the associative array.
By contrast, if the polygon transformation is executed on an architecture in which temporary results and parameters are stored in standard memory cells in parallel communication with the associative memory cells, high performance is maintained using only a few associative memory cells to perform the actual calculations.
Furthermore, an entire image can be stored in a random access memory (such as DRAM) that occupies a very small die area. By providing massively parallel communication between the associative processor array and the random access memory, via the tags unit, a very high bandwidth is achieved between the associative processor array and the image memory, eliminating the I/O bottleneck that restricts the performance of other processors. For example, an associative array of 8,192 associative memory words using a tags unit that is in parallel communication with all associative words (the tags unit will therefore typically store 8,192 bits) would provide a communication bus 8,192 bits wide. At a clock rate of 100 MHz, this provides a bandwidth of approximately 100 gigabytes per second. There is thus a widely recognized need for, and it would be highly advantageous to have, an associative memory processor architecture that is in parallel communication with standard random access memory.
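The bandwidth figure quoted above follows from the bus width and the clock rate. The sketch below reproduces the arithmetic under one stated assumption that is not spelled out in the text: the tags unit transfers one bit per associative word on every clock cycle. The function name `tags_bus_bandwidth_bytes_per_second` is hypothetical.

```python
def tags_bus_bandwidth_bytes_per_second(words, clock_hz):
    """Peak bandwidth of a bus carrying one bit per associative word.

    Assumption (not stated in the text): one bit per word is transferred
    on every clock cycle, so the bus width in bits equals `words`.
    """
    bits_per_second = words * clock_hz
    return bits_per_second // 8  # 8 bits per byte


if __name__ == "__main__":
    # Figures from the example: 8,192 words at a 100 MHz clock.
    bandwidth = tags_bus_bandwidth_bytes_per_second(8192, 100_000_000)
    print(bandwidth / 1e9, "gigabytes per second")
```

With these inputs the result is 102.4 gigabytes per second, which the text rounds to about 100. This is a peak figure; sustained throughput would depend on details of the memory that the example does not specify.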