The invention relates to managing an archive for approximate string matching.
Various techniques for approximate string matching (also called “fuzzy” or “inexact” string matching or searching) are used for finding strings that match a given pattern string within some tolerance according to a string metric (also called a “similarity function”). The strings being searched may be substrings of a larger string called a “text” or may be strings contained in records of a database, for example. One category of string metric is the “edit distance.” An example of an edit distance is the Levenshtein distance, which counts the minimum number of edit operations (insertion, deletion, or substitution of a character) needed to convert one string into another. Approximate string matching includes on-line matching, in which the text to be searched cannot be processed (or “indexed”) before the matching begins, and off-line matching, in which the text can be processed before the matching begins.
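For illustration only, the Levenshtein distance described above can be sketched with a standard dynamic-programming recurrence. The function names `levenshtein` and `fuzzy_matches`, the sample strings, and the tolerance value are hypothetical and are not taken from this description; the sketch shows one common way to compute the metric and to select strings that match a pattern within a tolerance, not a required implementation.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to convert string a into string b."""
    # The distance is symmetric, so keep the shorter string in the
    # inner loop to limit the size of the working rows.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] holds the distance between the first i-1 characters
    # of a and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # converting a[:i] to the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(pattern: str, candidates: Iterable[str], tolerance: int) -> List[str]:
    """Return the candidates whose edit distance to pattern is within tolerance."""
    return [s for s in candidates if levenshtein(pattern, s) <= tolerance]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(fuzzy_matches("smith", ["smith", "smyth", "smithe", "jones"], 1))
```

The sketch compares the pattern against every candidate string without any preprocessing, which corresponds to the on-line case; an off-line approach would instead process (index) the strings before matching begins.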