The present invention relates to the field of data analysis and, more particularly, to a search tool for finding and ranking datasets that have associated metadata (e.g., scientific data) using user-entered parameters (e.g., numerical ranges of the metadata).
The scientific community has continually generated large volumes of data over the years. Advances in data collection devices (e.g., deployed sensors that transmit data to a central point) have streamlined and automated tasks that once required manual attention, increasing the rate at which data is collected and analyzed. For example, the Center for Coastal Margin Observation and Prediction (CMOP) has accumulated terabytes of data from various fixed and mobile deployed sensors.
While this expansive collection of data provides researchers with a wealth of information, it has become increasingly time-consuming and difficult to find data relevant to a scientist's research problem. One example is data close to a specified time and location, which is important when assessing the impact of one's findings in a broader or narrower context. For example, a microbiologist may look for data near the Astoria Bridge in June of 2009 in order to put a collected water sample from that location into physical context. In another example, the microbiologist may look for a variable such as nitrogen within a specific range of values.
Locating and scanning each potentially relevant dataset (i.e., collection of related data points) requires not only time but also an understanding of each dataset's storage location, access methods, and format. Often, the researcher is unaware of or unable to identify relevant datasets. For example, datasets from a sensor that is geospatially fixed (i.e., stationary at a known location) at the Astoria Bridge (or another location of interest) must still be searched for the appropriate time interval. Datasets collected by mobile sensors require additional time: the researcher must first correlate the position of the sensor with the Astoria Bridge and determine whether the distance at which the dataset was collected is acceptable, and only then examine the dataset for the time interval.
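By way of illustration only, the manual filtering described above for a mobile sensor can be sketched in Python as follows. This is a minimal sketch of the prior, per-dataset approach and not of the invention; the names `Observation`, `haversine_km`, and `relevant_observations`, the great-circle (haversine) distance measure, and the distance threshold are assumptions introduced for the example and do not appear in the source.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


@dataclass
class Observation:
    """One reading from a sensor (hypothetical record layout)."""
    time: datetime
    lat: float  # degrees north
    lon: float  # degrees east


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def relevant_observations(observations, lat, lon, max_km, start, end):
    """Keep readings taken within max_km of (lat, lon) and within [start, end]."""
    return [
        o for o in observations
        if haversine_km(o.lat, o.lon, lat, lon) <= max_km
        and start <= o.time <= end
    ]
```

The researcher would have to repeat such a scan for every candidate dataset, after first learning where each dataset is stored and how it is formatted, which is the burden the preceding paragraph describes.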
While many tools exist to analyze and/or visualize data, these tools must be given a specific dataset and the data ranges to analyze or visualize. While such tools allow the researcher to find needles in a haystack, the researcher is still left with the problem of determining which haystacks are most likely to contain the needles they want. That is, existing tools do not address the problem of assisting the researcher in discovering datasets that have the potential to be relevant to a specified time and/or place (i.e., datasets that are “close” to the query in time and/or place).
More specifically, existing search tools are largely based on text matches. That is, content is indexed by keyword, and the keywords of a query are matched against the indexed information. These types of matches do not translate well to scientific data, where a searcher is often most interested in results within a bounded numeric range based on the gathered scientific data (or within a bounded subset of a larger mathematically expressible set).
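The mismatch between keyword matching and numeric-range queries can be illustrated with a short Python sketch. It is offered only as an example under stated assumptions: the functions `keyword_match` and `range_overlaps`, the sample description, and the numeric values are hypothetical and are not taken from any existing tool or from the source.

```python
def keyword_match(query, description):
    """Text-style match: every query term must appear as a word in the description."""
    words = set(description.lower().split())
    return all(term in words for term in query.lower().split())


def range_overlaps(query_low, query_high, data_min, data_max):
    """Numeric match: the queried interval intersects the dataset's value range."""
    return query_low <= data_max and data_min <= query_high


# Hypothetical dataset: a text description plus the observed range of one variable.
description = "nitrogen readings from a fixed station"
nitrogen_min, nitrogen_max = 0.2, 1.4  # illustrative values only

# A keyword search finds the variable name but cannot interpret the bounds.
found_by_name = keyword_match("nitrogen", description)
found_by_text_range = keyword_match("nitrogen 0.5 1.0", description)

# A range comparison against the dataset's metadata answers the actual question.
found_by_numeric_range = range_overlaps(0.5, 1.0, nitrogen_min, nitrogen_max)
```

In this sketch the keyword search succeeds only on the variable name, because the numbers 0.5 and 1.0 never appear as text in the description, whereas the interval comparison reports a match because the queried range lies inside the dataset's observed range.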