This invention relates generally to data searching techniques and, more particularly, to improvements in the construction and functionality of data searching machines capable of handling large amounts of data. The state of the art of data searching techniques is probably typified by the disclosure of U.S. Pat. No. 4,760,523 to Yu et al., which is assigned to the same assignee as the present application. The Yu et al. patent discloses and claims a text searching technique in which a large database is streamed through a serially connected string of cells containing a pattern to be searched for in the database. Techniques are disclosed for detecting exact matches and, to some degree, inexact matches between the search pattern and corresponding strings of characters in the database.
Although the Yu et al. approach is completely satisfactory for many data searching applications, there are some search functions that the Yu et al. apparatus cannot perform. For example, the Yu et al. device cannot conveniently perform any search other than in a "retrospective" mode of operation, wherein the search pattern is stored in a pipeline of device cells and the database to be searched is streamed through to detect matches in the database. But some searching applications may require that a relatively small database be searched for the occurrence of any of a large number of requested search patterns, the results of the searches then being disseminated to the appropriate parties. The small database may be, for example, the most recent issue of a publication for which there are a large number of search requests. In this "dissemination" mode of data searching, the search patterns may together constitute a very large composite pattern; so large, in fact, that it may be impractical to fit it into the pipeline of search cells. One solution to this difficulty is to make multiple passes in the retrospective mode of operation, but this increases the cost and inconvenience of the search. Another approach is to place the database in the pipeline of cells, and to stream the composite search pattern through the cells. While this latter approach would seem to be the preferred way to handle the dissemination mode of operation, it is incompatible with searching devices of the retrospective type, such as the Yu et al. device. Ideally, a data searching device would handle both the retrospective and dissemination searching modes efficiently.
Another difficulty in data searching is that a search technique is sometimes required to recognize data appearing in different forms. For example, textual data may include words that are in either hyphenated or non-hyphenated form. The word "airport" in a search pattern may not match the hyphenated form "air-port" bridging two lines of text. Ignoring hyphens entirely is not a good solution, because in some cases the hyphen may be a legitimate component of the search pattern. This seemingly simple problem, and others of a similar nature, have plagued designers of textual data searching devices for some time.
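The hyphenation difficulty described above can be illustrated with a short sketch. It only demonstrates the problem and is not drawn from any disclosed apparatus; the sample text and the function names `exact_search` and `search_ignoring_hyphens` are hypothetical.

```python
# Illustrative only: the sample text and helper names are assumptions.
SAMPLE_TEXT = "The new air-\nport opened. A well-known pilot spoke."


def exact_search(pattern: str, text: str) -> bool:
    """Plain substring search with no special treatment of hyphens."""
    return pattern in text


def search_ignoring_hyphens(pattern: str, text: str) -> bool:
    """Drop every hyphen (and a line break that follows one) from the text."""
    stripped = text.replace("-\n", "").replace("-", "")
    return pattern in stripped


# The line-bridging hyphen hides "airport" from an exact search.
print(exact_search("airport", SAMPLE_TEXT))                # False
# Ignoring hyphens recovers "airport" ...
print(search_ignoring_hyphens("airport", SAMPLE_TEXT))     # True
# ... but a pattern containing a legitimate hyphen is no longer found.
print(search_ignoring_hyphens("well-known", SAMPLE_TEXT))  # False
```

Neither simple policy is correct for both patterns, which is the difficulty the paragraph describes.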
Another limitation of textual database searching devices of the prior art is that the manner in which inexact matches are detected typically permits only a limited number of errors before a mismatch will be declared, and assigns the same weight to each type of error. In some applications, such as searching DNA and protein sequences, it is desirable to tolerate a relatively large number of errors without declaring a mismatch, and it is also desirable to be able to assign different costs to different types of errors.
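The idea of assigning different costs to different types of errors can be sketched in software as a weighted edit distance computed against every substring of the data. This is a generic dynamic-programming formulation offered under stated assumptions, not the hardware technique of any cited device; the function name `min_weighted_match_cost` and the parameters `sub_cost`, `ins_cost`, and `del_cost` are hypothetical.

```python
def min_weighted_match_cost(pattern, text, sub_cost=1.0, ins_cost=1.0, del_cost=1.0):
    """Return the lowest cost of aligning `pattern` with any substring of `text`.

    sub_cost: cost of a pattern character matched against a different character
    ins_cost: cost of an extra character in the text
    del_cost: cost of a pattern character missing from the text
    """
    # prev[i] is the best cost of aligning pattern[:i] with some substring
    # of the text that ends at the position processed so far.
    prev = [i * del_cost for i in range(len(pattern) + 1)]
    best = prev[-1]
    for ch in text:
        cur = [0.0]  # a match may begin at any text position at no cost
        for i, p in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (0.0 if p == ch else sub_cost),  # match or substitute
                prev[i] + ins_cost,                            # extra text character
                cur[i - 1] + del_cost,                         # missing pattern character
            ))
        best = min(best, cur[-1])
        prev = cur
    return best


# An exact occurrence costs nothing.
print(min_weighted_match_cost("ACGT", "TTACGTTT"))                # 0.0
# A cheap substitution is preferred when substitutions cost little.
print(min_weighted_match_cost("ACGT", "TTAGGTTT", sub_cost=0.5))  # 0.5
```

A caller would compare the returned cost with a threshold of its own choosing to decide whether to declare a match; raising the threshold tolerates more errors, and adjusting the three costs changes which kinds of error are tolerated most readily.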
Yet another inherent limitation of prior art systems is their inability to handle multiple-precision quantities. The ability to handle such quantities has application in both numerical and textual searching. In textual searching, the ability to handle multiple-precision characters is useful in searching some foreign-language textual material.
The present invention is directed to solutions to these problems, and also provides additional search capabilities not previously available in data searching systems of the prior art.