Electronic information retrieval is at the heart of modern society. Almost every household in developed countries has some device for accessing electronic databases, such as a personal digital assistant, mobile phone or computer that can be connected to the Internet. A direct result of this connectivity is that electronic database information retrieval, and in particular Internet information retrieval, has become a multi-billion dollar industry, not least because it is far from trivial to find the desired information in the sheer volume of available data. This has led to the development of highly sophisticated search algorithms in which, for instance, advanced parsing techniques are used to quickly retrieve data from database records, e.g. websites, which have a recognized structure.
A specific class of database queries is a query in which an answer to a question is sought. An example of such a query is “What is the capital of Australia?” Such queries may for instance be answered using the well-known weighted majority vote technique, in which the most common term associated with the search terms of the query in the collected documents is presented as the answer (Canberra). Such techniques typically exploit a redundancy property: the correct answer to a question-based query typically appears many times on the Internet, whereas incorrect answers are much rarer.
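The weighted majority vote described above may be sketched as follows. This is a minimal illustration only, assuming each collected document has already been reduced to one candidate answer with a weight; the function name, the candidate list and the weights are hypothetical and are not taken from any particular search engine.

```python
from collections import defaultdict


def weighted_majority_vote(candidates):
    """Return the candidate answer with the highest summed weight.

    `candidates` is an iterable of (answer, weight) pairs, one per
    collected document. Answers are compared case-insensitively.
    Returns None if there are no candidates.
    """
    totals = defaultdict(float)
    for answer, weight in candidates:
        # Normalize so that "Canberra" and "canberra" pool their votes.
        totals[answer.strip().lower()] += weight
    if not totals:
        return None
    return max(totals, key=totals.get)


# Hypothetical candidates for "What is the capital of Australia?";
# the weights stand in for some per-document reliability score.
candidates = [
    ("Canberra", 1.0),
    ("Sydney", 0.5),
    ("canberra", 0.8),
    ("Canberra", 1.0),
    ("Melbourne", 0.3),
]
answer = weighted_majority_vote(candidates)
```

In this sketch the correct answer wins because it recurs across documents, which is the redundancy property referred to above.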
In case a numerical answer such as a product price is sought, the search algorithm may run a search query on a subset of the Internet, in which product definitions and retail prices are available in recognizable fields of a categorized webpage, wherein the search algorithm parser utilizes the fact that the data of interest resides in such fields, such that a range of prices may be presented to the user. For instance, the Google shopping search engine allows users to define a query using search terms to describe a product and a price interval of interest.
However, it will be apparent that not all queries can utilize such formatting, for instance because the information of interest typically does not reside in categorized form. Also, some queries in which a numerical answer is sought cannot utilize weighted majority voting techniques because these techniques tend to fail, especially when the correct answer cannot be expressed as a single value, in which case the aforementioned redundancy occurs to a much lesser extent or does not occur at all.
An example of such a query is “What is the lifetime of the Canon NB-5L battery in a Powershot SD1100 camera?” Such information may reside in on-line user forums and review websites, in which the sought answer is typically present in unformatted form, i.e. not available in a dedicated field, and may be present as an interval rather than a single value: “when used with flash I found the Canon NB-5L battery to last 2-3 hours in my Powershot SD1100”.
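To illustrate what such an unformatted answer looks like to a parser, the following minimal sketch pulls a duration, given as a single value or as an interval, out of free text. The regular expression, the function name and the restriction to hours and minutes are assumptions made for this example; note that the model numbers in the sentence (NB-5L, SD1100) also contain digits and must not be mistaken for the answer.

```python
import re

# Hypothetical pattern: a number, optionally followed by "-", an en dash
# or "to" and a second number, then a time unit. Matches "2-3 hours",
# "2 to 3 hours" and "90 minutes".
DURATION = re.compile(
    r"(?P<low>\d+(?:\.\d+)?)"
    r"(?:\s*(?:-|\u2013|to)\s*(?P<high>\d+(?:\.\d+)?))?"
    r"\s*(?P<unit>hours?|minutes?)",
    re.IGNORECASE,
)


def extract_intervals(text):
    """Return a list of (low, high, unit) tuples found in `text`.

    A single value such as "90 minutes" is returned as a degenerate
    interval with low == high.
    """
    results = []
    for match in DURATION.finditer(text):
        low = float(match.group("low"))
        high = float(match.group("high")) if match.group("high") else low
        results.append((low, high, match.group("unit").lower()))
    return results


text = ("when used with flash I found the Canon NB-5L battery "
        "to last 2-3 hours in my Powershot SD1100")
intervals = extract_intervals(text)
```

Because each forum post may report a different interval, the extracted answers rarely coincide exactly, so there is little redundancy for a majority vote to exploit.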
Also, for certain queries, it may be unclear what the correct answer is, e.g. “What is the age of the universe?” As there are many different ways of trying to estimate this age, a wide range of different answers may be obtained, from which it may be difficult to extract the correct answer due to the lack of redundancy in the answer set. In addition, it may be that keywords in a document that match the search terms of the query are associated with a numerical value that is unrelated to the subject of the question. More generally speaking, the current search methods do not satisfactorily handle a query aiming to return a numerical value that is likely to contain a certain degree of uncertainty.