The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Analytics applications generally access large datasets to perform analytic operations. When a user wishes to perform an operation on a dataset, the user identifies where the dataset is stored, and the analytics application sends a query to the server computer system storing the dataset. The server computer system storing the dataset executes the query against the dataset and returns the requested information to the analytics application.
Depending on the type of query, executing the query directly against the dataset can be highly inefficient. For example, if a user's query requests information on only a small subset of rows of a database, executing the query directly against the database requires the server computer system to check each row to determine whether the row satisfies the query. Additionally, if the database is subject to row-based access controls, the data a user is allowed to access may be extremely sparse, so the database must execute the query and then discard the rows that the user is not allowed to access.
To increase the efficiency of the system, a server computer system may use an index of the database. When a query contains a filtering condition or is subject to row-based access controls, the server computer system can identify the requested rows through the index and then use the resulting row identifiers to access those rows in the database. While using an index is more efficient than directly searching the database for each query, performing a query against the index followed by a query against the database may still be inefficient. That inefficiency increases when the database is stored using one application, such as APACHE PARQUET, while the index is created and stored by another application, such as APACHE LUCENE.
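The difference between scanning every row and going through an index can be illustrated with a minimal Python sketch. It assumes an in-memory table whose list position serves as the row identifier, and it models row-based access control as a hypothetical `allowed` set of row identifiers. The names `full_scan`, `build_index`, and `index_then_fetch` are illustrative only and are not APIs of any database, APACHE PARQUET, or APACHE LUCENE.

```python
from collections import defaultdict
from typing import Any

# Hypothetical row representation: column name -> value.
Row = dict[str, Any]


def full_scan(table: list[Row], column: str, value: Any, allowed: set[int]) -> list[Row]:
    """Check every row against the filter, then discard rows the user may not access."""
    matching_ids = [row_id for row_id, row in enumerate(table) if row[column] == value]
    return [table[row_id] for row_id in matching_ids if row_id in allowed]


def build_index(table: list[Row], column: str) -> dict[Any, list[int]]:
    """Map each value of `column` to the identifiers of the rows holding that value."""
    index: defaultdict[Any, list[int]] = defaultdict(list)
    for row_id, row in enumerate(table):
        index[row[column]].append(row_id)
    return dict(index)


def index_then_fetch(
    table: list[Row], index: dict[Any, list[int]], value: Any, allowed: set[int]
) -> list[Row]:
    """First query the index for row identifiers, then fetch only those rows from the table."""
    row_ids = [row_id for row_id in index.get(value, []) if row_id in allowed]
    return [table[row_id] for row_id in row_ids]


if __name__ == "__main__":
    table = [
        {"region": "east", "amount": 10},
        {"region": "west", "amount": 20},
        {"region": "east", "amount": 30},
    ]
    allowed = {0, 1}  # hypothetical row-level access control: row 2 is hidden
    region_index = build_index(table, "region")
    print(full_scan(table, "region", "east", allowed))
    print(index_then_fetch(table, region_index, "east", allowed))
```

Both paths return the same rows. The scan evaluates every row before applying the access filter, while the index path touches only the rows listed under the requested value, at the cost of a second lookup against the table.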
One solution is to use the index to recreate rows of the database when the index is searched. In some scenarios, such as when a query returns only a small number of rows, recreating those rows from the index may be faster than retrieving them from the database. In other situations, such as when all values in a single column are needed, recreating rows from the index is less efficient.
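Recreating rows from an index can be sketched in Python under a deliberately simple assumption: the hypothetical index maps each column to its values, and each value to the identifiers of the rows that hold it. The helpers `build_inverted_index` and `rebuild_rows` are illustrative names; production index libraries such as APACHE LUCENE keep additional per-document structures (stored fields, doc values) for this purpose, which this naive sketch omits.

```python
from typing import Any

# Hypothetical inverted index layout: column -> value -> row identifiers with that value.
InvertedIndex = dict[str, dict[Any, list[int]]]


def build_inverted_index(table: list[dict[str, Any]]) -> InvertedIndex:
    """Index every column of every row by value, using list position as the row identifier."""
    index: InvertedIndex = {}
    for row_id, row in enumerate(table):
        for column, value in row.items():
            index.setdefault(column, {}).setdefault(value, []).append(row_id)
    return index


def rebuild_rows(
    index: InvertedIndex, row_ids: list[int], columns: list[str]
) -> list[dict[str, Any]]:
    """Recreate the requested columns of the given rows from posting lists only.

    The base table is never read: for each requested column, every posting list is
    walked and a value is copied into a row whenever its list contains that row's id.
    """
    wanted = set(row_ids)
    rebuilt: dict[int, dict[str, Any]] = {row_id: {} for row_id in wanted}
    for column in columns:
        for value, ids in index[column].items():
            for row_id in ids:
                if row_id in wanted:
                    rebuilt[row_id][column] = value
    return [rebuilt[row_id] for row_id in row_ids]


if __name__ == "__main__":
    table = [
        {"region": "east", "amount": 10},
        {"region": "west", "amount": 20},
        {"region": "east", "amount": 30},
    ]
    index = build_inverted_index(table)
    matches = index["region"]["east"]  # filter resolved entirely within the index
    print(rebuild_rows(index, matches, ["region", "amount"]))
```

In this form, requesting every value of a column means walking every posting list for that column and reassembling each row individually, whereas a columnar file format such as APACHE PARQUET already stores the values of a column together.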
Generally, the user is responsible for identifying the target of a search query. This means that a user must know where the dataset is stored before an analysis request can be sent to the server computer system. Additionally, the user has no way of indicating to the server computer system whether it should use the index, bypass the index, or rebuild rows from the index.
Thus, there is a need for a system that dynamically selects a backing store for responding to a query based on a semantic analysis of the query.