A pattern matching algorithm is an algorithm for detecting the presence of a desired character string pattern in a given text. Pattern matching algorithms are classified into single-pattern matching algorithms and multi-pattern matching algorithms according to the number of patterns to be found.
The Boyer-Moore algorithm is the best-known pattern matching algorithm to date. It works relatively well when there is only one pattern, but its performance degrades rapidly as the number of patterns increases, which makes it difficult to use as a multi-pattern matching algorithm.
To overcome this problem, the Modified Wu-Manber (MWM) algorithm was proposed. The MWM algorithm includes a preprocessing stage, in which SHIFT, HASH and PREFIX tables are created from the set of patterns to be found, and a scanning stage, in which text is scanned using the tables. Here, the SHIFT table defines the number of characters that can be skipped over during text scanning, and the HASH table and the PREFIX table are used to approximately determine whether a pattern matches when the shift value of the relevant block is 0.
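The two stages can be illustrated with a minimal Python sketch in the general Wu-Manber style. It is not the exact MWM implementation: the block size `B = 2`, the two-character prefix filter, the use of dictionaries in place of hashed arrays, and the names `build_tables` and `scan` are all assumptions made for illustration.

```python
from collections import defaultdict

B = 2  # block size in characters (assumed value, not taken from the text)


def build_tables(patterns):
    """Preprocessing stage: build the SHIFT and HASH/PREFIX tables."""
    m = min(len(p) for p in patterns)  # length of the shortest pattern
    if m < B:
        raise ValueError("every pattern must be at least B characters long")
    default_shift = m - B + 1  # shift for a block that occurs in no pattern
    shift = {}
    hash_table = defaultdict(list)
    for p in patterns:
        head = p[:m]  # only the leftmost m characters feed the tables
        for end in range(B, m + 1):
            block = head[end - B:end]
            # Distance from this block to the end of the m-character head.
            shift[block] = min(shift.get(block, default_shift), m - end)
        # Index the pattern by the last block of its head, with its prefix.
        hash_table[head[m - B:]].append((head[:B], p))
    return m, default_shift, shift, hash_table


def scan(text, patterns):
    """Scanning stage: return (start_index, pattern) for every match."""
    m, default_shift, shift, hash_table = build_tables(patterns)
    matches = []
    pos = m - 1  # index of the last character of the current window
    while pos < len(text):
        block = text[pos - B + 1:pos + 1]
        s = shift.get(block, default_shift)
        if s == 0:
            start = pos - m + 1
            for prefix, p in hash_table[block]:
                # Cheap prefix filter first, then full verification.
                if text.startswith(prefix, start) and text.startswith(p, start):
                    matches.append((start, p))
            pos += 1
        else:
            pos += s
    return matches


print(scan("CPM_annual_conference_announce", ["announce", "annual", "annually"]))
# prints [(4, 'annual'), (22, 'announce')]
```

In this sketch the window advances by the SHIFT value whenever it is nonzero, and the HASH and PREFIX lookups are consulted only when the shift is 0, which mirrors the division of roles described above.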
However, because the performance of the MWM algorithm depends on the length of the shortest pattern in the pattern set, its performance is significantly degraded when the pattern set includes a short pattern.
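The dependence on the shortest pattern can be made concrete with a small calculation. The sketch below assumes the classic Wu-Manber bound, in which the largest possible shift is the shortest pattern length minus the block size plus 1; that formula, the default block size of 2, the function name `max_shift`, and the sample patterns are assumptions for illustration and are not stated in the text above.

```python
def max_shift(patterns, block_size=2):
    """Upper bound on a single shift under the assumed Wu-Manber formula."""
    shortest = min(len(p) for p in patterns)  # length of the shortest pattern
    return shortest - block_size + 1


# Shortest pattern is "exploit" (7 characters).
print(max_shift(["intrusion", "overflow", "exploit"]))        # prints 6
# Adding the 2-character pattern "sh" caps every shift at 1.
print(max_shift(["intrusion", "overflow", "exploit", "sh"]))  # prints 1
```

Under these assumptions, a single short pattern limits the scan to advancing one character at a time, no matter how long the other patterns are.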
To overcome this problem of the MWM algorithm, the L+1-MWM algorithm was proposed. The MWM algorithm always creates the SHIFT, HASH and PREFIX tables using the leftmost character string of each pattern having a length of LSP (the length of the shortest pattern), whereas the L+1-MWM algorithm creates the tables using character strings having a length of LSP+1, on the assumption that a virtual 1 byte is present at the leftmost position of the shortest pattern.
However, the L+1-MWM algorithm does not produce the expected speed improvement over the MWM algorithm when the character strings at the front of the patterns are highly varied. Furthermore, both the MWM and L+1-MWM algorithms have a fundamental limitation: because they use multi-byte character based SHIFT tables, their average shift values are necessarily smaller than those of algorithms using single-byte character based SHIFT tables.