Search engines that use a keyword approach are limited in their ability to detect misspellings in a search query. To handle a suspected misspelling, they look for a search term that is “close” to the query, measured by the number of letters that must be edited to turn one into the other, and that occurs more frequently in search queries. This approach achieves excellent performance, but at the cost of missing some obvious misspellings or requiring user interaction to resolve them.
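The closeness test described above can be sketched as follows. This is only an illustrative sketch, not the method of any particular search engine: it assumes that “number of letters to be edited” means Levenshtein distance, and the names `edit_distance`, `suggest`, `query_counts`, and `max_edits`, along with the sample counts, are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-letter
    insertions, deletions, or substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(query, query_counts, max_edits=1):
    """Return the most frequent term within max_edits of query that is
    searched more often than query itself, or None if there is none."""
    own_count = query_counts.get(query, 0)
    best = None
    for term, count in query_counts.items():
        if term == query or count <= own_count:
            continue
        if edit_distance(query, term) <= max_edits:
            if best is None or count > query_counts[best]:
                best = term
    return best


# Hypothetical query-frequency table (term -> number of times searched).
query_counts = {"madonna": 9000, "madona": 12, "maradona": 4000}
print(suggest("madona", query_counts))  # one edit away from "madonna"
print(suggest("mdnna", query_counts))   # two edits away, so no suggestion
```

With a small edit threshold the lookup stays cheap, but a query such as “mdnna”, which a person would readily recognize, falls outside the threshold and is missed.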
Other search engines use an N-gram based approach, in which all fixed-length sequences of letters (grams) in the searchable items are indexed into a gram index. For example, in a “trigram” system the text of every search term is split into three-letter tokens that are indexed separately, e.g., indexing the term “madonna” stores the separate trigrams “mad”, “ado”, “don”, “onn”, and “nna” in the gram index. The gram index is searched by splitting the search query into grams of the same length and looking up each of those grams in the gram index. While this approach is effective at catching misspellings, it incurs a significant performance cost, has intensive storage requirements, and often returns far too many possible results.
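A minimal sketch of such a gram index is shown below, assuming an in-memory dictionary that maps each gram to the set of terms containing it. The function names `grams`, `build_gram_index`, and `search`, the sample terms, and the ranking by number of shared grams are illustrative assumptions rather than a description of any specific system.

```python
from collections import defaultdict


def grams(text, n=3):
    """Split text into its overlapping n-letter grams."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_gram_index(terms, n=3):
    """Map every gram to the set of search terms that contain it."""
    index = defaultdict(set)
    for term in terms:
        for g in grams(term, n):
            index[g].add(term)
    return index


def search(index, query, n=3):
    """Look up each gram of the query and count, per term, how many
    distinct query grams it shares. Most shared grams come first."""
    hits = defaultdict(int)
    for g in set(grams(query, n)):
        for term in index.get(g, ()):
            hits[term] += 1
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


print(grams("madonna"))  # the five trigrams listed in the text

index = build_gram_index(["madonna", "maradona", "donna"])
for term, shared in search(index, "madona"):
    print(term, shared)
```

Even in this three-term example the misspelled query “madona” matches every indexed term, because each shares at least one trigram with it. Each term is also stored once per gram, which gives a sense of how storage and result counts grow on a real index.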