Information retrieval plays an increasingly prominent role in both academic and industrial scientific research, but it currently suffers from a lack of numeric search capability in general and, specifically, from a lack of numeric data extraction from unstructured data. Since an estimated 95% of the information currently on the web is “unstructured,” sophisticated information extraction techniques are required to transform such content into usable data. A challenge in extracting numeric data from unstructured documents is that, in addition to locating the keywords or numbers corresponding to the query parameters, one must be able to intelligently contextualize that data in order to use it. Even with structured data, contextualization and visualization of results remain a challenge. The need for such an information retrieval system is supported by the fact that numeric data, including financial statistics and technological specifications, represents one of the most valuable subsets of information on the internet.
Enhanced numeric search capability may be implemented at several levels. At a low level, existing search tools need to be augmented with more refined recognition of numeric notation and with unit conversion. At a high level, certain systems aim to semantically identify, and make searchable, all concepts within unstructured documents. Given that this is a difficult, if not impossible, long-term goal, there exists an immediate need for systems of intermediate capability, which build in broad low-level numeric capabilities but necessarily stop short of full contextualization. Further, many search queries do not have a single answer and are not suited to systems claiming to have achieved full recognition of meaning and context. Such queries are more properly answered with numeric distributions, with inherent width and transparent uncertainty. Thus, there is a need for exploratory models of search particularly suited to the unique nature of numeric, as opposed to linguistic, information.
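By way of illustration only, the low-level capabilities described above (recognition of numeric notation, unit conversion, and answering with a distribution rather than a single value) might be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular system: the names `TO_METRES`, `NUMBER_WITH_UNIT`, `extract_lengths_in_metres`, and `summarize` are hypothetical, the unit table covers only a few metric lengths, and the pattern handles only unsigned numbers with optional thousands separators, decimals, and exponents.

```python
import re
import statistics

# Hypothetical conversion table (illustrative only): unit -> factor to metres.
TO_METRES = {"mm": 1e-3, "cm": 1e-2, "m": 1.0, "km": 1e3}

# Assumed notation: "1,200", "1200", "1.2", or "1.25e3", followed by a unit.
NUMBER_WITH_UNIT = re.compile(
    r"(?<![\w.])"                              # do not start inside a word or number
    r"(\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)"  # mantissa, with or without commas
    r"(?:[eE]([-+]?\d+))?"                     # optional exponent
    r"\s*(mm|cm|km|m)\b"                       # unit from the table above
)


def extract_lengths_in_metres(text):
    """Find numbers with length units in free text and normalize them to metres."""
    values = []
    for mantissa, exponent, unit in NUMBER_WITH_UNIT.findall(text):
        value = float(mantissa.replace(",", ""))
        if exponent:
            value *= 10 ** int(exponent)
        values.append(value * TO_METRES[unit])
    return values


def summarize(values):
    """Describe the extracted values as a distribution, not a single answer."""
    if len(values) < 2:
        raise ValueError("need at least two values to describe a distribution")
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "median": statistics.median(values),
        "interquartile_range": (q1, q3),
        "range": (min(values), max(values)),
    }


sample = (
    "The cable runs 1,200 m; a second source says 1.2 km, "
    "a third 1.25e3 m, and a fourth 118000 cm."
)
lengths = extract_lengths_in_metres(sample)
summary = summarize(lengths)
```

In this sketch the four differently written figures are reduced to a common unit and reported with their spread, so that disagreement among sources stays visible instead of being collapsed into one value. It makes no attempt at contextualization: it does not determine what each number refers to.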