The present invention relates to computer architectures and in particular to an associative computer providing improved parallelism.
Widely used von Neumann-type computer architectures, in which one or more processors communicate with a separate memory holding instructions and data, face a technical limitation termed the “von Neumann bottleneck”. The von Neumann bottleneck arises from the communication channel between the processors and the memory, which is required for each transfer of instructions and data and which fundamentally limits computer execution speed.
One solution to this bottleneck may be found in computer architectures using “processing-in-memory” (PIM). Processing-in-memory architectures, as the name implies, endeavor to operate on data without moving the data out of memory into the processors and thus avoid the von Neumann bottleneck.
One variant of these latter architectures is the associative processor (AP). An associative processor is constructed using an associative memory of a type permitting parallel searching and writing to multiple memory words. This searching and writing can be used to implement operations on the data in the memory without transferring that data. Generally, an associative processor operates in parallel on multiple words in memory, each holding two operands. The associative processor sequentially applies search patterns to these operands, where each search pattern represents an operand pattern consistent with a particular operation result. As particular operand patterns are identified, the corresponding results may be written to the identified words. By operating in parallel on each of these operands, an associative processor may implement high-speed “single instruction, multiple data” (SIMD) processing.
For example, a two-bit addition (without carry-in) can be decomposed into four basic patterns of operands (00, 01, 10, and 11), each associated with a particular result (0, 1, 1, and 0 with carry). Four successive searches may be conducted, each applied simultaneously to all the words of the memory, for these four basic patterns. Once a pattern is found in a word, the corresponding result is written to that word, effectively computing the operation results for each word.
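The four-pattern addition described above can be sketched in software. This is a minimal illustration only: the word layout (fields `a`, `b`, `sum`, `carry`), the `PATTERNS` table, and the function name `associative_add` are hypothetical names introduced here, and the per-word loops merely stand in for what an associative memory would do on all words at once.

```python
# Each memory word holds two one-bit operands and two result fields.
# This layout is an assumption made for illustration.
def make_word(a, b):
    return {"a": a, "b": b, "sum": 0, "carry": 0}

# Operand pattern (a, b) -> result (sum, carry), one entry per search.
PATTERNS = [
    ((0, 0), (0, 0)),
    ((0, 1), (1, 0)),
    ((1, 0), (1, 0)),
    ((1, 1), (0, 1)),  # 1 + 1 = 0 with carry
]

def associative_add(words):
    """Apply each pattern in turn to every word, then write the result."""
    for (a, b), (s, c) in PATTERNS:
        # Search step: conceptually performed on all words simultaneously.
        matches = [w["a"] == a and w["b"] == b for w in words]
        # Write step: only words that matched the pattern are updated.
        for w, hit in zip(words, matches):
            if hit:
                w["sum"], w["carry"] = s, c
    return words

memory = [make_word(a, b) for a in (0, 1) for b in (0, 1)]
associative_add(memory)
```

The number of search and write steps is fixed by the number of patterns (four here), not by the number of words, which is where the parallel speedup of real associative hardware would come from.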
In practice, the number of patterns that must be searched increases when a carry-in bit Cin is considered; however, this increased number of patterns can then be reduced by eliminating patterns that do not change a default result value (e.g., zero), limiting the total number of patterns that need to be considered for addition to around five. The time required to test for and write the results for each pattern of an operation is more than offset by the extremely large number of words that can be simultaneously processed.
New generations of content addressable memories employing, for example, phase change memory or resistive memory elements make associative processors more attractive on a cost basis. Such content addressable memories are currently used in network switches and the like, which require high-speed network address lookup, and in applications such as machine learning. As required for associative processing, these content addressable memories provide multiple words of storage that can be simultaneously searched or simultaneously written to. Each word is associated with a tag bit that is set when a search applied to that word matches. The tag bits can be used to control subsequent write operations so that data words can be changed based on the results of the previous search.
An associative processor using a content addressable memory may simultaneously search its data words for operands matching successive patterns of an operation. Where matches occur, tag bits are set and used to control a subsequent writing of results to those matching data words.
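The search, tag, and write cycle can be illustrated with a toy software model. The sketch below is an assumption-laden illustration, not the design described here: the class `AssociativeMemory`, its methods `search` and `write_tagged`, the bit layout (operand A in the low bits, operand B above it, one carry bit on top), and the `PASSES` table are all hypothetical. It uses one common in-place formulation of multi-bit addition in which B is overwritten with the sum; in this particular formulation, four of the eight carry/operand patterns already hold their result and are skipped.

```python
class AssociativeMemory:
    """Toy content addressable memory with one tag bit per word."""

    def __init__(self, words):
        self.words = list(words)
        self.tags = [False] * len(self.words)

    def search(self, key, mask):
        # Set the tag of every word whose masked bits equal the key.
        # Hardware would compare all words at once; this loop emulates that.
        self.tags = [(w & mask) == (key & mask) for w in self.words]

    def write_tagged(self, value, mask):
        # Overwrite the masked bits, but only in words whose tag is set.
        self.words = [(w & ~mask) | (value & mask) if t else w
                      for w, t in zip(self.words, self.tags)]

# Per-bit passes: (carry, a, b) pattern -> new (carry, b).
# The order is chosen so that a word rewritten by one pass can never
# match a later pass at the same bit position.
PASSES = [((0, 1, 1), (1, 0)), ((0, 1, 0), (0, 1)),
          ((1, 0, 0), (0, 1)), ((1, 0, 1), (1, 0))]

def add_in_place(mem, n):
    """Compute B <- A + B for every word, bit-serially from the LSB."""
    c_pos = 2 * n                      # carry bit sits above the B field
    for i in range(n):
        a_pos, b_pos = i, n + i        # bit i of A and of B
        search_mask = (1 << a_pos) | (1 << b_pos) | (1 << c_pos)
        write_mask = (1 << b_pos) | (1 << c_pos)
        for (c, a, b), (c_new, b_new) in PASSES:
            key = (c << c_pos) | (a << a_pos) | (b << b_pos)
            value = (c_new << c_pos) | (b_new << b_pos)
            mem.search(key, search_mask)
            mem.write_tagged(value, write_mask)

n = 4
pairs = [(3, 5), (15, 1), (7, 9), (0, 0)]
mem = AssociativeMemory(a | (b << n) for a, b in pairs)
add_in_place(mem, n)
sums = [w >> n for w in mem.words]     # carry bit and B field together
```

With this layout, shifting each word right by `n` yields the carry bit followed by the B field, so `sums` holds the full (n+1)-bit result for each word. The work grows with the operand width and the number of passes, not with the number of words in the memory.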