Numerous computer science problems involve searching strings of characters to find two matching or partially matching strings. Many types of data involve strings, including addresses, names, file paths, Uniform Resource Locators (URLs), and so forth. Strings can be stored in a variety of ways in a computer system, such as one or more characters terminated by a predetermined identifier (e.g., null-terminated strings) or counted data structures that store a string length and array of characters. String matching may include comparing a single source string to a single target string, comparing a single source string to a set of multiple target strings, comparing multiple source strings to multiple target strings, and so forth.
In many software applications, string matching consumes a significant quantity of the computer hardware resources (e.g., processor time or memory space). Modern desktop search programs often spend a large percentage of their execution time comparing a search query string with many possible matches in a search index. Thus, performance of applications can be noticeably affected by the algorithms and data structures selected by application developers to store and manipulate strings.
Standard techniques for matching a source string against a large set of target pattern strings are inefficient and expensive. For example, many techniques iterate through each potential target string, comparing characters until a mismatch is found before moving to the next potential target string. This technique increases in time for every new target string added to the set, and slows as the source string length gets longer.