A large share of the useful web comprises documents that consist largely of numeric values embedded in text, yet support for finding documents that contain specific numeric values is limited. Most existing Web search engines treat numbers as strings and ignore their numeric value, so they support only exact matches for numeric terms. For example, when searching for “32.5”, such systems retrieve only documents containing the exact string “32.5”. They miss documents containing the same value in a different representation (e.g. 32.50, 32.500, 3.25e+1) and documents containing values close to the query term (e.g. 32.49, 32.51).
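The gap between string matching and value matching can be illustrated with a minimal Python sketch. The token pattern `NUMBER_RE` and the helpers `string_match` and `value_match` are hypothetical names introduced here for illustration only; they are not taken from any existing search engine.

```python
import re
from decimal import Decimal

# Assumed token pattern: optional sign, digits, optional fraction, optional exponent.
NUMBER_RE = re.compile(r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")

def extract_values(text):
    """Return the numeric value of every number-like token found in text."""
    return [Decimal(token) for token in NUMBER_RE.findall(text)]

def string_match(query, text):
    """Keyword-style matching: the query must appear as an identical token."""
    return query in text.split()

def value_match(query, text):
    """Value-based matching: any token with the same numeric value counts."""
    return Decimal(query) in extract_values(text)

doc = "closing price 32.50 today"
print(string_match("32.5", doc))  # False: "32.5" and "32.50" differ as strings
print(value_match("32.5", doc))   # True: both denote the same value
```

Comparing parsed values instead of raw tokens makes 32.50, 32.500 and 3.25e+1 equivalent to 32.5, but it still does nothing for nearby values such as 32.49.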
Treating numbers as strings has clear advantages: numbers and keywords share the same representation, and measuring similarity between textual records is a well-known problem with many solutions. However, such methods do not support numerical search.
Searching for numeric values calls for a dedicated solution. For such queries, the user is interested in documents containing values that exactly match the numbers given in the query, but may also be interested in values that are close to the query numbers, with documents containing closer values ranked higher.
Some search systems support “range search” (see, for example, Google Numeric range search; Google is a trademark of Google, Inc.), which allows retrieval of documents containing numeric values in a specific range. Thus, searching for the range [32 . . . 33] will retrieve documents containing numeric values that are “close” to 32.5. However, there is no search solution that ranks the matching documents according to their distance from the numeric values in the query.
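The difference between filtering by range and ranking by distance can be sketched as follows. This is an illustration under stated assumptions, not a description of any existing system: the in-memory `docs` mapping and the functions `range_search` and `rank_by_distance` are hypothetical, and each document is reduced to the list of numeric values it contains.

```python
def range_search(docs, low, high):
    """Return ids of documents with at least one value in [low, high].

    Results come back in input order; nothing says which hit is closest.
    """
    return [doc_id for doc_id, values in docs.items()
            if any(low <= v <= high for v in values)]

def rank_by_distance(docs, query, low, high):
    """Apply the same range filter, then order hits by their closest value."""
    hits = []
    for doc_id, values in docs.items():
        in_range = [v for v in values if low <= v <= high]
        if in_range:
            hits.append((min(abs(v - query) for v in in_range), doc_id))
    return [doc_id for _, doc_id in sorted(hits)]

# Hypothetical collection: document id -> numeric values found in the document.
docs = {"a": [32.9], "b": [32.5, 40.0], "c": [32.49], "d": [35.0]}

print(range_search(docs, 32, 33))            # ['a', 'b', 'c']
print(rank_by_distance(docs, 32.5, 32, 33))  # ['b', 'c', 'a']
```

Both calls return the same three documents; only the second orders them so that the exact match comes first and the most distant value comes last.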
Another approach is based on “K nearest neighbor (K-NN) search”, which finds the K numeric values closest to the query numbers. A similar approach looks for a “perfect merge” between the query values and the document values. These solutions focus on matching numbers and ignore textual similarity. Hence, hybrid queries containing both text and numbers require special treatment.
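For a single query number, K-NN search over numeric values can be sketched in a few lines. The function name `k_nearest` is hypothetical, and the brute-force scan with `heapq.nsmallest` is an assumption made for brevity; a real index would typically use a sorted or tree-based structure.

```python
import heapq

def k_nearest(values, query, k):
    """Return the k values closest to query, nearest first."""
    return heapq.nsmallest(k, values, key=lambda v: abs(v - query))

values = [32.49, 40.0, 32.5, 31.0, 33.2]
print(k_nearest(values, 32.5, 3))  # [32.5, 32.49, 33.2]
```

The sketch considers only the numbers: any keywords in the surrounding text play no part in the result, which is the limitation noted above for hybrid queries.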