Searching a buffer, or other memory device, comprised of symbols for strings that match a given or predetermined string of symbols is a basic operation found in many applications, such as but not limited to databases, the processing of genetic information, data compression, and the processing of computer languages. Modification of a string by inserting new sequences in it, or deleting sequences from it, is also a basic operation in these domains, and the time taken by these string operations influences directly the execution time of the main applications.
When a serial matching operation is performed to find all occurrences of a string of N symbols in a buffer containing M symbols, the maximum number of steps required is N*M. When a character must be inserted inside the buffer, on average half of the symbols in the buffer have to be moved one cell to the right or to the left to make room for the new symbol. In this case, an average of M/2 steps is required.
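The serial costs described above can be illustrated with a minimal Python sketch. The function names naive_search and insert_symbol are hypothetical and chosen only for illustration; the sketch counts symbol comparisons and symbol moves so that the N*M bound on searching, and the shifting of roughly half of the buffer's symbols on an average insertion, can be observed directly.

```python
def naive_search(buffer, pattern):
    """Serial search: return (start positions of matches, comparisons made)."""
    m, n = len(buffer), len(pattern)
    positions, comparisons = [], 0
    for start in range(m - n + 1):
        matched = True
        for offset in range(n):
            comparisons += 1  # one step per symbol comparison
            if buffer[start + offset] != pattern[offset]:
                matched = False
                break
        if matched:
            positions.append(start)
    return positions, comparisons


def insert_symbol(buffer, index, symbol):
    """Insert symbol at index by moving the tail one cell right; return moves."""
    buffer.append(None)  # grow the buffer by one cell
    moves = 0
    for i in range(len(buffer) - 1, index, -1):
        buffer[i] = buffer[i - 1]  # one step per moved symbol
        moves += 1
    buffer[index] = symbol
    return moves
```

The comparison count returned by naive_search never exceeds N*M, and insert_symbol moves every symbol to the right of the insertion point, which averages about half the buffer when insertion points are spread uniformly.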
Serial algorithms have been proposed to improve these operations, and they are based on several techniques, including hashing and tree data structures. Hashing is used when the strings of interest are words of fixed length. In this case each word is associated with a unique number that is used as the index where that word is stored in a dictionary. This method has the disadvantage that it works well only when the information is static and does not change location during processing. Furthermore, generating this number is a costly operation, and sometimes several words may be associated with the same number, requiring additional work to find the word sought. Suffix trees may also be utilized; these are tree structures in which all the substrings present in the buffer are stored. When one wants to see if a given string is located in the buffer, one only has to descend the tree, one character of the sought string at a time, until the string is either found or not found. In either case, if the string contains N symbols, at most N steps are required to decide whether the string is in the buffer of M symbols. Although this search method is fast, building the suffix tree is often computationally expensive.
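The descent through a suffix tree can be sketched as follows. This is a simplified illustration only: it builds an uncompressed suffix trie from nested dictionaries rather than a compact suffix tree, and the names build_suffix_trie and contains are hypothetical. It shows that a lookup takes at most one step per symbol of the sought string, while construction visits every suffix of the buffer and is therefore the costly part.

```python
def build_suffix_trie(buffer):
    """Build an uncompressed suffix trie (nested dicts) holding every suffix."""
    root = {}
    for start in range(len(buffer)):
        node = root
        for symbol in buffer[start:]:
            # Create the child for this symbol if it does not exist yet.
            node = node.setdefault(symbol, {})
    return root


def contains(trie, pattern):
    """Descend one symbol at a time; return (found, steps taken)."""
    node = trie
    steps = 0
    for symbol in pattern:
        steps += 1
        if symbol not in node:
            return False, steps  # no branch for this symbol: not in buffer
        node = node[symbol]
    return True, steps
```

For example, after trie = build_suffix_trie("banana"), the call contains(trie, "nan") descends three levels and reports a match, whereas contains(trie, "nab") stops at the third symbol.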
The Content Addressable Memory, or CAM, is a parallel solution for finding the location of a given symbol or word in a single memory access. This method works well for fixed length words, but does not extend easily to variable length strings of symbols. When the search can be performed in parallel in the buffer, that is when M comparisons can be performed at the same time, then the number of steps is reduced to N. Buffers with parallel comparators and markers storing the result of each comparison with a given symbol have been proposed to speed up string searches. See, for example, Almy et al., U.S. Pat. No. 4,575,818; Mayer, U.S. Pat. No. 5,319,762; Eskandari-Gharnin et al., U.S. Pat. No. 5,602,764; or Satoh, et al., U.S. Pat. No. 5,448,733. These known devices typically associate a comparator with each cell of the buffer, along with a one-bit marker storing the result of the last comparison performed. The comparator, storage cell and marker operate in such a way that a symbol from the string to be located in the buffer is broadcast to all the comparators of the buffer. These comparators in turn compare the given symbol to that stored in their associated storage cell. The result of the comparison is stored in the marker associated with the comparator and storage cell.
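The broadcast-and-mark scheme described above can be simulated in software with a short sketch. The function name marker_search is hypothetical, and the rule used to chain successive comparisons (a cell stays marked only if it matches the broadcast symbol and its left neighbour was marked in the previous step) is an assumption made for illustration; the cited patents may combine markers differently. Each pass over the cells stands for a single parallel step in hardware, so a string of N symbols needs N broadcast steps.

```python
def marker_search(cells, pattern):
    """Simulate one broadcast step per pattern symbol.

    Returns the indices of the cells where a full match ends.
    """
    if not pattern:
        return []
    # Step 1: every comparator compares its cell with the first symbol.
    markers = [cell == pattern[0] for cell in cells]
    for symbol in pattern[1:]:
        # One parallel step: all comparators see the broadcast symbol.
        hits = [cell == symbol for cell in cells]
        # Assumed chaining rule: keep a mark only where this cell matches
        # and the left neighbour was marked by the previous step.
        markers = [hits[i] and i > 0 and markers[i - 1]
                   for i in range(len(cells))]
    return [i for i, marked in enumerate(markers) if marked]
```

With cells holding "abracadabra" and the string "abra" broadcast symbol by symbol, the markers left set after four steps identify the cells at which each occurrence ends.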
Buffers implemented as shift registers allow their contents to be shifted to the left or to the right in parallel, synchronously with a clock signal. In this case the whole contents of the buffer can be shifted in just one step. These buffers, however, do not allow only a section of their contents to be shifted; they offer only global shift operations. Moreover, the integration of separate comparators for each cell of the buffer tends to increase the size and complexity of the device as a whole, thus leading to excessive cost and energy use.
With the foregoing problems and concerns in mind, the present invention therefore seeks to utilize a memory apparatus which allows for very fast character string searches, insertions and deletions, wherein a new type of memory storage circuit called a Connex Memory (hereinafter, CM) is utilized.
In particular, the present invention proposes a Connex Memory device that operates in the manner of an associative memory, yet which includes a flexibility not heretofore known in associative memory devices.
Known data processing systems most often utilize conventionally addressed memory devices. That is, known data systems utilize memory devices which include defined locales therein, each locale having its own particularized address. In this manner, should a system processor desire to add the value stored at address A to the value stored at address B, the conventional memory device will proceed to the specific, addressed locations, or cells, within the memory device, and communicate these values, via an interface, to the processor where the appropriate summation can occur. In such systems, the nature and capabilities of the integral components, that is, the nature and capabilities of the processor and the memory devices, are well defined and distinct from one another.
It is also known that data processing systems may include more than one processor and memory device, and further, that these multiple components may be part of a system that executes multiple streams of instructions. These multiple instruction streams, multiple data streams (MIMD) devices can be viewed as large collections of tightly coupled single instruction stream, single data stream (SISD) devices where each processor in the system, although operating in overall concert with the other integrated processors, is responsible for a specific portion of a greater task. That is, the effectiveness of MIMD devices is typically limited to those specified arenas where the problem to be solved lends itself to being partitioned into a plurality of similar and relatively independent sub-problems. The nature and capabilities of those integral components of MIMD devices are also well defined and distinct from one another.
Another known data processing system involves single instruction, multiple data streams (SIMD) devices. These SIMD devices utilize an arbitrary number of processors which all execute, in sync with one another, the same program, but with each processor applying the operator specified by the current instruction to different operands and thereby producing its own result. The processors in a SIMD device access integrated memory devices to get operands and to store results. Once again, the nature and capabilities of those integral components of a SIMD device are well defined and distinct from one another in that computations are executed by the processors that must have some type of access to a memory device to do their job.
While known data processing systems are therefore capable of processing large amounts of data, the defined and unchanging nature of the processors and memory devices limits the speed and efficiency at which various operations may be completed.
Thus, various architectures have also been constructed which utilize another class of memory devices which are not conventionally addressed. These memory devices are typically described as being ‘associative’ memory devices and, as indicated, do not catalog their respective bits of data by their physical location within the memory device. Rather, associative memory devices ‘address’ their data bits by the nature, or intrinsic quality, of the information stored therein. That is, data within associative memory devices are not identified by the name of their physical locations within the memory device, but rather by the properties of the data stored in each particular cell of the memory device.
A key field of a fixed, or limited, size is attached to all data stored in most associative memory devices. A search key may then be utilized to select a specific data field, or plurality of data fields, whose attached key field(s) match the search key, irrespective of their named location, for subsequent processing in accordance with directed instructions.
In these known associative memory devices, all data stored therein includes a key field and a corresponding data field. Known associative searching and data-manipulation techniques examine the key fields within an associative memory to determine which of the corresponding data fields includes data of particular interest for a predetermined command. Once identified, any data fields of continued interest may be ‘tagged’ or ‘marked’ by changing the state of one or more bits within the appropriate key field, thus leaving the associated data fields primed for subsequent searching or manipulative commands.
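The key-field search and marking described above can be sketched in software as follows. The class Cell and the functions mark_matching and read_marked are hypothetical names introduced only for illustration. As a simplifying assumption, the mark is modelled as a separate boolean flag rather than as bits inside the key field itself, and the loop over the cells stands in for a comparison that an associative memory would perform on all cells at once.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    key: str              # fixed-size key field attached to the data
    data: int             # corresponding data field
    marked: bool = False  # tag bit (assumed here to be a separate flag)


def mark_matching(cells, search_key):
    """Mark every cell whose key field equals the search key; return the count."""
    count = 0
    for cell in cells:
        cell.marked = (cell.key == search_key)
        count += cell.marked
    return count


def read_marked(cells):
    """Return the data fields of all marked cells, regardless of location."""
    return [cell.data for cell in cells if cell.marked]
```

For instance, with three cells keyed "red", "blue" and "red", a search for "red" marks the first and third cells, and a subsequent command can then operate on just those marked data fields without ever referring to their addresses.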
Therefore, known associative memory devices rely upon a controller capable of observing the content of key fields so as to identify those data fields of interest. While these known associative systems are useful to a certain degree, they still suffer from a lack of flexibility in the searching architecture, and thus their utility is correspondingly diminished while their processing time is increased.
With the foregoing problems and concerns in mind, the present invention seeks to increase the flexibility of known associative memory devices so as to increase their utility while simultaneously reducing their processing times.