1. Field
Example aspects of the present invention generally relate to text-based searching, and more particularly to a text-based fuzzy search.
2. Related Art
Text-based searching is commonly used to match input text to documents or data that contain similar text. For example, a search engine might match a user's input text with websites containing such text. In order to increase the number of search results and to account for text anomalies such as typographical errors, some text-based searching methods are “fuzzy”, in that the search returns results that may not exactly match the input text but contain text similar to it.
Existing fuzzy search methodologies roughly divide into (1) phoneme-based approaches and (2) multi-gram-based approaches, both of which have drawbacks. Phoneme-based approaches are not language-agnostic, and are therefore confined to a particular language or set of languages for which phonetic knowledge is available. Multi-gram-based approaches, meanwhile, produce very space-consuming indices, and tend either to lack robustness (in the case of long multi-grams, where a single typographical error can prevent any multi-gram from matching) or to link to too many records (in the case of short multi-grams).
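By way of illustration only, the trade-off between short and long multi-grams can be sketched as follows. The sketch is not taken from any particular existing system; the function names (`ngrams`, `build_index`, `candidates`), the sample records, and the chosen multi-gram lengths of 2 and 5 are all assumptions made for the example.

```python
from collections import defaultdict

def ngrams(text, n):
    """Return the set of character n-grams of text (lowercased)."""
    t = text.lower()
    return {t[i:i + n] for i in range(len(t) - n + 1)}

def build_index(records, n):
    """Map each n-gram to the set of record ids that contain it."""
    index = defaultdict(set)
    for rec_id, text in enumerate(records):
        for g in ngrams(text, n):
            index[g].add(rec_id)
    return index

def candidates(index, query, n):
    """Return ids of records sharing at least one n-gram with the query."""
    hits = set()
    for g in ngrams(query, n):
        hits |= index.get(g, set())
    return hits

# Hypothetical sample data, chosen only to make the effect visible.
records = ["mississippi", "missouri", "minnesota", "michigan"]
query = "misisipi"  # misspelling of "mississippi"

# Short multi-grams: the common bigram "mi" links the query to every record.
short_hits = candidates(build_index(records, 2), query, 2)

# Long multi-grams: the misspelling breaks every 5-gram, so nothing matches.
long_hits = candidates(build_index(records, 5), query, 5)

print(sorted(short_hits))  # [0, 1, 2, 3]
print(sorted(long_hits))   # []
```

In this sketch the index of length-2 multi-grams returns all four records for the misspelled query, while the index of length-5 multi-grams returns none, including the intended record.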