1. Field of the Invention
The present invention relates to searching documents using numbers.
2. Description of the Related Art
A large fraction of data on the Web is numeric, yet current search techniques handle numbers poorly. A person seeking information by number must enter exactly the number as it appears in the document, because current search engines treat numbers as character strings and return only exact matches.
To illustrate the problem, consider the number 6798.32, which, when input to current search engines, correctly returns pages relating to the lunar nutation cycle, but which, when input as 6798.320, returns no pages at all. As another example, consider a person who wants to find a specification sheet for a particular semiconductor device that has a set-up speed of 18 nanoseconds at a power rating of 495 mW. If the exact numeric values are provided, many search engines can find matching character strings and thereby return a relevant page, but if the user can supply only approximate values, e.g., a query of “20 nanoseconds, 500 mW”, current search engines are unable to return relevant pages. Unfortunately, a person often knows only approximate values for the numeric information he or she seeks, and thus is seldom helped by current search engines.
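The gap between character-string matching and numeric matching in the examples above can be illustrated with a minimal sketch. The function names and the relative-tolerance test are hypothetical, chosen only for illustration; they are not taken from any particular search engine or from the invention itself.

```python
from decimal import Decimal


def string_match(query: str, token: str) -> bool:
    """Exact character-string comparison, the behavior attributed to current search engines."""
    return query == token


def numeric_match(query: str, token: str) -> bool:
    """Compare the values the strings denote, so a trailing zero does not matter."""
    return Decimal(query) == Decimal(token)


def approximate_match(query: str, token: str, tolerance: float = 0.05) -> bool:
    """Hypothetical relative-tolerance test for a user who knows only an approximate value."""
    q, t = Decimal(query), Decimal(token)
    if t == 0:
        return q == 0
    # Relative error of the query with respect to the value found in the document.
    return abs(q - t) / abs(t) <= Decimal(str(tolerance))
```

Under these assumptions, `string_match("6798.320", "6798.32")` is false while `numeric_match` on the same pair is true, and `approximate_match("500", "495")` holds at the default tolerance.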
Roussopoulos et al., “Nearest Neighbor Queries”, Proc. of the 1995 ACM SIGMOD Int'l Conf. on Management of Data, pp. 71–79 (1995) propose storing attribute-value pairs in a database so that queries can be executed against them and answered using nearest-neighbor techniques. Unfortunately, as recognized herein, such solutions do not adequately address the problem of searching with numbers, because different documents very often refer to the same attribute by different names, making it difficult at best to establish correspondences between attribute names and values. Indeed, the major content companies in the electronics industry employ a host of people to manually extract such parametric information. The present invention recognizes, however, that it is not necessary to establish exact correspondences between attribute names and numbers, or indeed to specify attribute names at all; numeric queries that merely approximate the desired values can still produce meaningful query results.
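The idea of querying with approximate numbers and no attribute names can be sketched as follows. This is an illustrative assumption only, not the method of the invention or of Roussopoulos et al.: each document is reduced to the bare list of numbers it contains, and the scoring rule (sum of relative distances to the nearest number in the document) and all names are hypothetical.

```python
def nearest_distance(value: float, numbers: list[float]) -> float:
    """Relative distance from value to the closest number in a document."""
    return min(
        abs(value - n) / max(abs(value), abs(n), 1e-12)  # guard against division by zero
        for n in numbers
    )


def score(query: list[float], doc_numbers: list[float]) -> float:
    """Lower is better: total distance from each query number to its nearest document number."""
    if not doc_numbers:
        return float("inf")  # a document with no numbers can never match
    return sum(nearest_distance(q, doc_numbers) for q in query)


def rank(query: list[float], docs: dict[str, list[float]]) -> list[str]:
    """Order document names from best to worst match; no attribute names are consulted."""
    return sorted(docs, key=lambda name: score(query, docs[name]))


# Hypothetical documents, each reduced to the numbers found in it.
docs = {
    "sheet_a": [18.0, 495.0],
    "sheet_b": [100.0, 3.3],
    "sheet_c": [25.0, 700.0],
}
ranking = rank([20.0, 500.0], docs)
```

With the approximate query of 20 and 500, the hypothetical document containing 18 and 495 ranks first, even though neither the query nor the scoring refers to "set-up speed" or "power rating" by name.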