The problem of string searching is to identify the appearance of an r-character target vector P[i], where i=1,2, . . . r, constructed from a vocabulary of m distinct characters, anywhere in an n-character candidate data base S[j], where j=1,2,3, . . . n. For typical applications r<<n and m<<n. Each of the characters comprising P[i] and S[j] is an alphanumeric character, a grammatical symbol, or the like. A typical example of a string search is to find the target vector "filters" in a candidate data base represented by a bit stream of the form "xxxxfile, filtersxxxx."
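The problem statement above can be illustrated with a minimal software sketch. It assumes the target vector and the candidate data base are ordinary character strings; the function name `find_occurrences` is hypothetical, and the sketch shows only the problem definition, not any of the hardware approaches discussed in this section.

```python
def find_occurrences(target, database):
    """Return every 1-based position j at which `target` begins in `database`.

    `target` plays the role of P[i], i = 1..r, and `database` the role of
    S[j], j = 1..n. Positions are 1-based to match the indexing in the text.
    """
    r, n = len(target), len(database)
    positions = []
    # A match can only begin where at least r characters remain.
    for j in range(n - r + 1):
        if database[j:j + r] == target:
            positions.append(j + 1)
    return positions


# The example from the text: "filters" begins at the 11th character.
print(find_occurrences("filters", "xxxxfile, filtersxxxx"))  # [11]
```

Note that the data base also contains the near-miss "file,", which shares the prefix "fil" with the target but is correctly rejected.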
A variety of different software and hardware algorithms for searching large data bases have been proposed. (See, e.g., Curry, T. and Mukhopadhyay, A., "Realization of Efficient Non-Numeric Operations Through VLSI," Proceedings of VLSI '83, 1983; Foster, M. J. and Kung, H. T., "The Design of Special Purpose Chips," Computer Magazine 13(1): 26-40, January 1980; Haskin, R. L. and Hollaar, Lee A., "Operational Characteristics of a Hardware-Based Pattern Matcher," ACM Transactions on Database Systems, Vol. 8, No. 1, March 1983, pages 15-40; Mead, C. A., Pashley, Richard D., Britton, Lee D., Daimon, Yoshiaki T., and Sando, Jr. Steward F., "128-Bit Multicomparator," IEEE Journal of Solid-State Circuits, Vol. SC-11, No. 5, October 1976; Pramanik, Sakti, "Performance Analysis of a Database Filter Search Hardware," IEEE Transactions on Computers, Vol. C-35, No. 12, December 1986; Takahashi, K., Yamada, H., Nagai, H., and Hirata, M., "Intelligent String Search Processor Accelerate Text Information Retrieval," 5th International Workshop on Database Machines, Tokyo, Japan, 1987, pages 440-453.)
The search speeds of these existing algorithms are limited because the characters in the data base to be searched are examined sequentially. For example, in the Curry et al. reference identified above, the target vector P[i], i=1,2,3, . . . r, is loaded into an array of r comparators, and the bytes in the candidate data base are shifted or broadcast sequentially through the comparator array in a pipelined fashion. The throughput of such an approach to string searching is limited by the propagation delay of each stage in the pipeline, which is in turn limited by the comparison rate of the individual comparators. Existing comparator array approaches to string searching also require that every byte or character in S[j] be tested against every byte or character in P[i], even if the result of a comparison is redundant in view of previous comparisons. For example, if P[i=1] does not equal S[j=1], then the comparison of P[i=2] with S[j=2] is unnecessary, since a string in the data base S[j], j=1, . . . n, matching P[i], i=1,2, . . . r, r<<n, cannot begin at S[j=1]. Thus, existing comparator array approaches make poor use of comparator resources.
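The redundancy described above can be made concrete by counting character comparisons. The following is a simplified software model, not a description of the Curry et al. hardware or of any particular comparator array: it assumes that each candidate starting position in S[j] is tested against the target, and the hypothetical flag `stop_at_mismatch` selects between performing all r comparisons at every position and abandoning a position at its first mismatch.

```python
def count_comparisons(target, database, stop_at_mismatch):
    """Count character comparisons made while searching `database` for `target`.

    Returns (comparisons, matches), where `matches` lists 1-based start
    positions. With stop_at_mismatch=False, all r comparisons are performed
    at every candidate position, modelling the exhaustive behaviour the text
    attributes to comparator arrays (an assumption of this sketch).
    """
    r, n = len(target), len(database)
    comparisons = 0
    matches = []
    for j in range(n - r + 1):
        matched = True
        for i in range(r):
            comparisons += 1
            if target[i] != database[j + i]:
                matched = False
                if stop_at_mismatch:
                    break  # later comparisons at this position are redundant
        if matched:
            matches.append(j + 1)
    return comparisons, matches


text = "xxxxfile, filtersxxxx"
print(count_comparisons("filters", text, stop_at_mismatch=False))  # (105, [11])
print(count_comparisons("filters", text, stop_at_mismatch=True))   # (24, [11])
```

Under these assumptions, both variants report the same match, but the exhaustive variant performs 105 comparisons for the 21-character example while the early-terminating variant performs 24. The difference consists entirely of comparisons whose outcome could not change the result.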
In view of the above, it is an object of the present invention to provide a parallel algorithm for searching data bases with an improved comparison efficiency. More particularly, it is an object of the present invention to provide a string search algorithm for searching data bases that makes better use of comparator resources than prior string search algorithms.