In various data processing operations, it may be useful to determine whether a set of data comprising a plurality of characters (referred to herein as the search space) contains a string matching a search pattern string. For example, a user may wish to search a document, a plurality of documents, or a database to determine whether it contains a particular word, or in how many places it contains the word. In some cases, it may be useful to search for occurrences of strings matching a regular expression, which may, for example, define a family of words. For example, the regular expression “[Hh]otel” matches both “Hotel” and “hotel”, and a user could search with this regular expression to count the number of times the word “hotel” occurs in a document, whether at the beginning of a sentence (where it would be written “Hotel”) or in the middle of a sentence (where it would be written “hotel”).
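As a minimal sketch of the counting use case described above, the following uses Python's standard `re` module to count matches of “[Hh]otel” in a short search space. The function name `count_matches` and the sample text are hypothetical and chosen only for illustration; this is a software-library example, not the hardware or microprocessor approach discussed below.

```python
import re

def count_matches(pattern: str, text: str) -> int:
    """Count non-overlapping occurrences of `pattern` in `text` (the search space).

    Hypothetical helper for illustration only.
    """
    return len(re.findall(pattern, text))

# Hypothetical search space containing "Hotel" once and "hotel" twice.
search_space = "Hotel prices rose. The hotel near the station, unlike the other hotel, was full."
print(count_matches(r"[Hh]otel", search_space))  # → 3
```

Note that “[Hh]otel” would also match inside longer words such as “hotels”; adding word boundaries (`r"\b[Hh]otel\b"`) would restrict the count to the standalone word.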
Searching for any one of several alternative characters may be computationally costly. For example, when a character is received from the search space, determining whether it is either “H” or “h” may involve, in a microprocessor, executing two compare operations one after the other or, in hardware, implementing two comparators (e.g., 8-bit comparators). If a larger number of possibilities is to be tested, the determination may become correspondingly more costly. For example, determining whether a character is an uppercase or lowercase consonant may require 40 compare operations in a microprocessor, or an array of 40 comparators (20 consonants in each case, if “y” is not considered a consonant). Some hardware implementations for comparing a string from a search space to a regular expression may use multiple single-character comparators, assisted by range comparators, to find a positive match of a particular search pattern; these approaches, too, may be costly.
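The cost argument above can be illustrated with a short sketch that models a microprocessor testing one candidate character per compare operation, executed one after another. This is an assumption-laden model, not a description of any particular processor: cost is counted simply as the number of equality compares, and the names `NOT_CONSONANTS`, `CANDIDATES`, and `is_consonant_naive` are hypothetical.

```python
import string

# Hypothetical model: treat "y" as not a consonant, as in the text above.
NOT_CONSONANTS = set("aeiouy")
CONSONANTS = [c for c in string.ascii_lowercase if c not in NOT_CONSONANTS]
# Each consonant must be tested in both lowercase and uppercase form.
CANDIDATES = CONSONANTS + [c.upper() for c in CONSONANTS]

def is_consonant_naive(ch: str) -> tuple[bool, int]:
    """Return (matched, compares_used), performing one compare per candidate in sequence."""
    compares = 0
    for candidate in CANDIDATES:
        compares += 1  # one compare operation per candidate character
        if ch == candidate:
            return True, compares
    return False, compares

print(len(CANDIDATES))          # → 40
print(is_consonant_naive("b"))  # → (True, 1)
print(is_consonant_naive("a"))  # → (False, 40)
```

In this model, a non-consonant such as “a” always costs the full 40 compares, which corresponds to the 40 comparators a hardware design would need to test all candidates in parallel.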
Thus, there is a need for an efficient way to perform pattern matching.