The invention relates generally to a memory engine for the inspection and manipulation of data, and more particularly, to a memory engine which not only provides for the fast searching of data, in the form of strings of symbols (characters or the like), but also provides for the selective insertion and deletion of data within the character strings, as required.
Searching a buffer, or other memory device, comprised of symbols for strings that match a given or predetermined string of symbols is a basic operation found in many applications, such as but not limited to databases, the processing of genetic information, data compression, and the processing of computer languages. Modification of a string by inserting new sequences into it, or deleting sequences from it, is also a basic operation in these domains, and the time taken by these string operations directly influences the execution time of the main applications.
When a matching operation is performed serially to find all occurrences of a string of N symbols in a buffer containing M symbols, the maximum number of steps required is N*M. When a character must be inserted inside the buffer, on average half of the symbols in the buffer have to be moved one cell to the right or to the left to make room for the new cell. In this case, an average of M/2 steps are required.
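The serial costs described above can be illustrated with a minimal Python sketch. The function names `naive_search` and `insert_with_shift` are hypothetical and chosen only for illustration; the sketch counts symbol comparisons and cell moves so that the bounds can be observed directly, and it is not intended as a description of any particular prior-art device.

```python
def naive_search(buffer, pattern):
    """Serial scan: return (match start positions, number of symbol comparisons)."""
    m, n = len(buffer), len(pattern)
    positions, comparisons = [], 0
    for start in range(m - n + 1):
        matched = True
        for k in range(n):
            comparisons += 1  # one step per symbol comparison
            if buffer[start + k] != pattern[k]:
                matched = False
                break
        if matched:
            positions.append(start)
    return positions, comparisons


def insert_with_shift(cells, index, symbol):
    """Insert symbol at index by moving later cells one place right; return move count."""
    cells.append(None)  # open one extra cell at the end of the buffer
    moves = 0
    for i in range(len(cells) - 1, index, -1):
        cells[i] = cells[i - 1]  # one step per moved cell
        moves += 1
    cells[index] = symbol
    return moves
```

In this sketch the comparison count never exceeds the product of the two lengths, and the number of moves for an insertion grows with the distance between the insertion point and the end of the buffer, which averages to about half the buffer for uniformly distributed insertion points.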
Serial algorithms have been proposed to improve these operations, and they are based on several techniques including hashing and tree data structures. Hashing is used when the strings of interest are words of fixed length. In this case each word is associated with a unique number that is used as the index where that word is stored in a dictionary. This method has the disadvantage that it works well only when the information is static, and does not change location during processing. Furthermore, generating this number is a costly operation, and sometimes several words may be associated with the same number, requiring additional work to find the word sought. Suffix trees may also be utilized; these are tree structures in which all the substrings present in the buffer are stored. When one wants to see if a given string is located in the buffer, one only has to descend the tree, one character of the sought string at a time, until the string is either found, or not found. In either case, if the string contains N symbols, at most N steps are required to decide if the string is in the buffer of length M. Although this search method is fast, building the suffix tree is oftentimes computationally expensive.
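The tree descent described above may be sketched as follows. For brevity, the sketch assumes an uncompressed suffix trie built from nested dictionaries rather than a true (compressed) suffix tree, and the names `build_suffix_trie` and `contains` are hypothetical. The construction shown is deliberately naive, which also illustrates why building such a structure can be expensive.

```python
def build_suffix_trie(buffer):
    """Insert every suffix of buffer into a nested-dict trie (quadratic work)."""
    root = {}
    for start in range(len(buffer)):
        node = root
        for symbol in buffer[start:]:
            node = node.setdefault(symbol, {})
    return root


def contains(trie, pattern):
    """Descend one symbol at a time; return (found, steps taken)."""
    node = trie
    steps = 0
    for symbol in pattern:
        steps += 1
        if symbol not in node:
            return False, steps  # descent fails: the string is not in the buffer
        node = node[symbol]
    return True, steps
```

For example, after `trie = build_suffix_trie("banana")`, the call `contains(trie, "nan")` descends three levels and reports a match, while `contains(trie, "nab")` fails on its third symbol. In both cases the number of steps is bounded by the length of the sought string, independent of the buffer length.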
The Content Addressable Memory, or CAM, is a parallel solution for finding the location of a given symbol or word in a single memory access. This method works well for fixed length words, but does not extend easily to variable length strings of symbols. When the search can be performed in parallel in the buffer, that is, when M comparisons can be performed at the same time, then the number of steps is reduced to N. Buffers with parallel comparators and markers storing the result of each comparison with a given symbol have been proposed to speed up string searches. See, for example, Almy et al., U.S. Pat. No. 4,575,818; Mayer, U.S. Pat. No. 5,319,762; Eskandari-Gharnin et al., U.S. Pat. No. 5,602,764; or Satoh et al., U.S. Pat. No. 5,448,733. These known devices typically associate a comparator with each cell of the buffer, along with a one-bit marker storing the result of the last comparison performed. The comparator, storage cell and marker operate in such a way that a symbol from the string to be located in the buffer is broadcast to all the comparators of the buffer. These comparators in turn compare the given symbol to that stored in their associated storage cell. The result of the comparison is stored in the marker associated with the comparator and storage cell.
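The broadcast-and-mark scheme described above can be simulated in software as a rough sketch. The list comprehension stands in for the M comparators operating simultaneously. The rule that chains each marker to the marker of the preceding cell is an assumption made here so that multi-symbol strings can be followed; it is not taken from the cited patents, and the name `marker_search` is hypothetical.

```python
def marker_search(cells, pattern):
    """Simulate per-cell comparators and one-bit markers.

    Returns (indices of cells where a match of pattern ends, broadcast steps).
    """
    markers = [False] * len(cells)
    steps = 0
    for k, symbol in enumerate(pattern):
        steps += 1  # one broadcast per pattern symbol
        # Every comparator checks its own cell against the broadcast symbol.
        hits = [cell == symbol for cell in cells]
        if k == 0:
            markers = hits
        else:
            # Assumed chaining rule: a cell stays marked only if it matches
            # and the cell to its left was marked on the previous step.
            markers = [i > 0 and markers[i - 1] and hits[i]
                       for i in range(len(cells))]
    ends = [i for i, marked in enumerate(markers) if marked]
    return ends, steps
```

Under these assumptions, searching the buffer "abcabcab" for "abc" takes three broadcast steps and leaves markers set at cells 2 and 5, the final symbols of the two occurrences. The step count depends only on the length of the sought string, not on the size of the buffer.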
Buffers implemented as shift registers allow their contents to be shifted to the left or to the right in parallel, synchronously to a clock signal. In this case the whole contents of the buffer can be shifted in just one step. These buffers, however, do not allow only a section of their contents to be shifted; they offer only global shift operations. Moreover, the integration of separate comparators for each cell of the buffer tends to increase the size and complexity of the device as a whole, thus leading to excessive cost and energy use.
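The distinction between a global shift and the partial shift that an insertion actually requires can be shown with a short sketch. Both function names are hypothetical, and `partial_shift_right` is only an illustration of the missing operation, not a description of any existing shift register.

```python
def global_shift_right(cells, fill=None):
    """Shift every cell one place right in a single step; the last cell is lost."""
    return [fill] + cells[:-1]


def partial_shift_right(cells, start, fill=None):
    """Shift only the cells from index start onward, leaving earlier cells in place."""
    return cells[:start] + [fill] + cells[start:-1]
```

Applied to the cells a, b, c, d, the global shift yields (empty), a, b, c, so the symbols ahead of the intended insertion point are displaced as well. The partial shift starting at index 2 yields a, b, (empty), c, which opens a cell at the insertion point while leaving the preceding symbols where they were.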
With the foregoing problems and concerns in mind, the present invention therefore seeks to utilize a memory apparatus which allows for very fast character string searches, insertions and deletions, wherein a new type of memory storage circuit called a Connex Memory (hereinafter, CM) is utilized.
It is an object of the present invention to enable fast string search, insertion, and deletion operations to be performed on data.
It is another object of the present invention to enable fast string search, insertion, and deletion operations to be performed on data comprising a string of data characters.
It is another object of the present invention to enable fast string search, insertion, and deletion operations to be performed on data comprising a string of data characters, wherein the inspection and manipulation of any given data character is accomplished in a single clock cycle.
It is another object of the present invention to enable the inspection of variably-sized data fields within a string of data characters.
It is another object of the present invention to enable the marking of variably-sized data fields within a string of data characters.
It is another object of the present invention to utilize static or dynamic memory cells to temporarily store and manipulate a string of data characters.
It is another object of the present invention to inspect a string of data characters stored in the memory cells in either a forward or reverse direction.
According to one embodiment of the present invention, a memory engine combines associative memory and random-access memory for enabling fast string search, insertion, and deletion operations to be performed on data and includes a memory device for temporarily storing the data as a string of data characters. A controller is utilized for selectively outputting one of a plurality of commands to the memory device and receives data feedback therefrom, with the memory device inspecting data characters in the string in accordance with the commands outputted by the controller. A clock device is also utilized for outputting a clock signal comprised of a predetermined number of clock cycles per second to the memory device and the controller, the memory device inspecting and selectively manipulating one of the data characters within one of the clock cycles.