Producing useful results from the growing amount of data generated by individuals and organizations is becoming increasingly resource intensive. Business organizations in particular can generate petabytes of data that is automatically gathered and stored in the course of usual business operations, and could benefit greatly from mining such data to extract useful insights.
A typical approach to gaining insight from data includes querying a database storing the data to get a specific result. For example, a user may generate a query (e.g., an SQL query), which is sent to a database management system (DBMS) that executes the query on one or more tables stored in the database. This is a relatively simple case; however, with organizations relying on a multitude of vendors for managing their data, each with its own technology for storing data, retrieving useful insights from data is becoming increasingly complex. It is also not uncommon for queries to take several minutes, or even hours, to complete when applied to vast amounts of stored data.
The advantages of speeding up the process of data mining are clear, and some solutions attempt to accelerate access to the databases. For example, one solution includes indexing data stored in databases. Another solution includes caching results of frequent queries. Yet another solution includes selectively retrieving results from the database, so that the query can be served immediately.
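By way of illustration only, the indexing and caching approaches mentioned above may be sketched as follows. This is a minimal sketch using Python's standard `sqlite3` and `functools` modules with an in-memory database; the table `sales`, the index `idx_sales_region`, the function `total_for_region`, and the sample rows are hypothetical and are not drawn from any particular existing solution.

```python
import sqlite3
from functools import lru_cache

# Hypothetical in-memory database standing in for a production DBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("east", 10.0), ("west", 20.0), ("east", 5.0)],
)

# Indexing: lets the DBMS locate rows by region without scanning the table.
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")


# Caching: results of frequent queries are kept in memory, so a repeated
# call with the same argument is served without touching the database.
@lru_cache(maxsize=128)
def total_for_region(region: str) -> float:
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?",
        (region,),
    ).fetchone()
    return row[0]


first = total_for_region("east")   # executes the SQL query
second = total_for_region("east")  # served from the cache
```

In this sketch, both the index and the cache depend on the query pattern being known in advance: the index covers only the `region` column, and the cache helps only for arguments that have been seen before and does not reflect rows inserted after the result was cached.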
However, while these database optimization and acceleration solutions are useful in analyzing databases of a certain size or known data sets, they can fall short of providing useful information when applied to large databases and unknown data sets, which may include data that an indexing or caching algorithm has not been programmed to process.
It would therefore be advantageous to provide a solution that would overcome the challenges noted above.