String search in a text or a stream of characters is a field of growing importance in applications such as knowledge acquisition and deep-packet inspection. A deep-packet inspection system examines a text or a continuous stream of characters to detect the presence of specific string patterns belonging to a predefined list of string patterns. As the occurrence of string patterns in character streams grows, the search effort increases, which reduces the throughput of the system, measured as the number of characters that can be examined per unit time.
The list of string patterns may include simple strings, complex strings, or a mixture of simple strings and complex strings. Fast search techniques for simple strings are well known in the art. In particular, a search method known as the Aho-Corasick method is known to be computationally efficient but is limited to simple strings. A computationally efficient method for detecting and locating occurrences of complex strings in a data file or a data stream is disclosed in U.S. patent application Ser. No. 11/678,587 (Boyce), the specification of which is incorporated herein by reference. In some applications, a pattern may be of interest only if it bears some logical or positional relationship to other patterns in the same list of string patterns. For example, specific string patterns found anywhere in a phrase may be relevant only if the phrase is preceded and/or succeeded by certain punctuation marks. The absence of such punctuation marks in a part of a text under consideration may render the search for the specific string patterns unnecessary.
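By way of illustration only, the punctuation example above may be sketched as follows. The sketch assumes that a phrase is relevant only when it is both preceded and succeeded by a punctuation mark, and it uses a naive substring search rather than the Aho-Corasick method or the method of the referenced application. The function name find_gated and the parameters opener and closer are hypothetical and do not appear in the source.

```python
def find_gated(text, patterns, opener='"', closer='"'):
    """Report (pattern, offset) pairs only for patterns that occur inside a
    phrase preceded by `opener` and succeeded by `closer`.

    Hypothetical illustration: text outside such phrases is never searched,
    which is where the potential reduction in search effort comes from.
    """
    hits = []
    pos = 0
    while True:
        start = text.find(opener, pos)
        if start == -1:
            break  # no further opening mark: the rest of the text is skipped
        phrase_start = start + len(opener)
        end = text.find(closer, phrase_start)
        if end == -1:
            break  # no closing mark: the phrase is not relevant
        phrase = text[phrase_start:end]
        for pattern in patterns:
            if not pattern:
                continue  # ignore empty patterns
            offset = phrase.find(pattern)
            while offset != -1:
                # Offsets are reported relative to the whole text.
                hits.append((pattern, phrase_start + offset))
                offset = phrase.find(pattern, offset + 1)
        pos = end + len(closer)
    return hits
```

In this sketch, a text that contains no opening mark is dismissed after a single scan for that mark, without any of the listed patterns being searched for.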
There is a need, therefore, to explore string-search methods and apparatus which take into account the relevance of combinations of string patterns in a text according to known interrelationships among the string patterns and which, advantageously, explore the potential search-effort reduction that may result from such interrelationships.