Approximate string matching is an operation often needed in scenarios such as the generation of spelling suggestions for misspelt words, approximate search in large databases in natural languages, or approximate search using other characters forming recognised patterns.
Approximate string matching with compound word handling is often needed when matching words, or any other data that can be naturally broken up into components. Breaks between words or components may be missing in an input pattern, so compound word or component support is required to match the input pattern to recognised words or components.
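By way of illustration only, the problem of missing breaks can be sketched as follows for the exact (non-approximate) case. The function name `split_compound` and the sample word list are assumptions made for this sketch and are not taken from any of the cited applications.

```python
def split_compound(pattern, words):
    """Return one way to split `pattern` into recognised words, or None.

    Exact matching only: no spelling errors are tolerated in this sketch.
    """
    # best[i] holds a segmentation of pattern[:i], or None if none was found.
    best = [None] * (len(pattern) + 1)
    best[0] = []
    for end in range(1, len(pattern) + 1):
        for start in range(end):
            piece = pattern[start:end]
            if best[start] is not None and piece in words:
                best[end] = best[start] + [piece]
                break
    return best[len(pattern)]


# Hypothetical dictionary of recognised components.
WORDS = {"spell", "check", "checker"}

print(split_compound("spellchecker", WORDS))
print(split_compound("spellchekcer", WORDS))
```

The second call returns no segmentation because the misspelt pattern cannot be split into recognised words exactly, which is the case that approximate matching with compound word handling is intended to cover.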
US patent applications Nos. US 2005/091030 and US 2006/004744 describe methods of approximate string matching that can handle compound words. US 2005/091030 relies on a combination of large dictionaries of widely used compound words and semi-approximate search covering only certain types of errors.
US 2006/004744 includes a trie-based dictionary with gloss nodes for word fragments as well as for complete words. The method loops the trie walker back to the root node if it reaches the gloss node of a word fragment while the suggestion gathered so far is shorter than the target string. This forces the trie walker to accept word fragments along with stand-alone words. US 2006/004744 gives a complete treatment of compound words, but its efficiency is not optimal because of repeated look-ups for right-hand side word part matches.
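The general idea of looping a trie walker back to the root can be sketched as below. This is a simplified illustration under stated assumptions and not the method of US 2006/004744: the trie is a nested dictionary, the key "$" stands in for a gloss node (so recognised words are assumed not to contain "$"), every entry is treated as a permissible word part, and the error measure is plain Levenshtein distance. The names `build_trie` and `suggest` are hypothetical.

```python
def build_trie(words):
    """Build a nested-dict trie; the key "$" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


def suggest(target, root, max_dist=1):
    """Map each word or compound within max_dist edits of target to its distance."""
    results = {}
    n = len(target)

    def walk(node, row, prefix):
        # row[i] is the edit distance between prefix and target[:i].
        for ch, child in node.items():
            if ch == "$":
                continue
            new_row = [row[0] + 1]
            for i in range(1, n + 1):
                cost = 0 if target[i - 1] == ch else 1
                new_row.append(min(new_row[i - 1] + 1,   # insertion
                                   row[i] + 1,           # deletion
                                   row[i - 1] + cost))   # match or substitution
            if min(new_row) > max_dist:
                continue  # no extension of this prefix can be close enough
            text = prefix + ch
            if "$" in child:
                if new_row[-1] <= max_dist:
                    results[text] = new_row[-1]
                if len(text) < n:
                    # Loop back to the root so another word part can follow.
                    walk(root, new_row, text)
            walk(child, new_row, text)

    walk(root, list(range(n + 1)), "")
    return results


trie = build_trie(["spell", "check", "checker"])
print(suggest("spelchecker", trie))
```

In this sketch each return to the root walks the dictionary again for the right-hand word part, which illustrates the kind of repeated look-up referred to above.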
US 2006/004744 also describes a method in which approximate string matching in a trie-based dictionary includes correction rules in the trie data structure.