Three problems must be solved when a question needs to be answered using data. First, one or more datasets must be identified as containing the data required to answer the question. Second, the relationships among the datasets and the data stored therein must be identified. Third, one or more queries must be formulated to answer the question. As the number of available dataset groups grows, the first two problems are becoming increasingly complex relative to the third, yet most of the effort in the storage systems area is focused on the formulation of queries.
Common approaches to the first two problems fall into two categories:
One approach requires the data to be organized into a well-understood semantic model before any search is performed. Although powerful, such mechanisms have limited use because of the difficulty of organizing all of the data in advance of searching.
In another approach, the datasets are treated as if they were typical documents, and full-text search techniques are applied to their content. While this technique can easily handle any type of data, its inability to understand and use the structure of the data and the relationships within it makes it unlikely that complex questions can be answered by full-text search alone.
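The limitation of the full-text approach can be illustrated with a minimal sketch. The two toy datasets, their field names, and the helper functions `full_text_search` and `total_by_city` are hypothetical and are not taken from any particular system. The sketch only shows that a keyword match over flattened rows finds where a term occurs, whereas a question that spans datasets also requires knowing how the datasets are related.

```python
# Hypothetical toy datasets; names and values are illustrative only.
customers = [
    {"customer_id": 1, "name": "Ada", "city": "Berlin"},
    {"customer_id": 2, "name": "Lin", "city": "Oslo"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 40},
    {"order_id": 11, "customer_id": 2, "amount": 25},
    {"order_id": 12, "customer_id": 1, "amount": 15},
]


def full_text_search(datasets, term):
    """Treat every row as a flat document and match the term against its text."""
    hits = []
    for name, rows in datasets.items():
        for row in rows:
            text = " ".join(str(value) for value in row.values()).lower()
            if term.lower() in text:
                hits.append((name, row))
    return hits


def total_by_city(customers, orders, city):
    """Answer a cross-dataset question using the customer_id relationship."""
    ids = {c["customer_id"] for c in customers if c["city"] == city}
    return sum(o["amount"] for o in orders if o["customer_id"] in ids)


datasets = {"customers": customers, "orders": orders}

# Full-text search locates the row that mentions "Berlin" but cannot
# connect it to the orders, which only carry a customer_id.
text_hits = full_text_search(datasets, "Berlin")

# The structured query relies on a relationship that had to be
# identified beforehand (customers.customer_id = orders.customer_id).
berlin_total = total_by_city(customers, orders, "Berlin")
```

In this sketch the full-text search returns only the matching customer row, while the total order amount for that city can be computed only once the link between the two datasets is known.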