A distributed database is a database in which storage devices are not all attached to a common processing unit, such as a central processing unit. Instead, multiple computers are used to implement a distributed database management system. The multiple computers may be located in the same physical location, or they may be dispersed over a network of disaggregated interconnected computers. There is typically a master node and a set of slave or worker nodes that store partitions of the distributed database. Each partition may include a set of source files
An analytical view is a subset of data from a table or multiple tables. The analytical view may be computed by applying joins, unions, applying filters or other Structured Query Language (SQL) operations to the table or tables. The analytical view typically comprises dimensions and aggregates, although either dimensions or aggregates may be absent. The analytical view may comprise a dimension (e.g., a column name) and an aggregate (e.g., sum, min, max, median, mean) that is defined prior to the receipt of a query and is maintained as a data unit separate from the table. An attribute can be a dimension or an aggregate. When data is grouped along an attribute, it becomes a dimension. When data is aggregated on an attribute, it becomes an aggregate. For example, in the case of the request for ‘sum(amt) by product_id’, product_id and amt are both attributes in the table. Product_id is used as a dimension and amt is used as an aggregate. The analytical view exposes a dimension ‘product_id’ and an aggregate sum(amt)′.
Database systems use analytical views to expedite query processing. Analytical views typically materialize (e.g., cache) data resulting from computations frequently needed by queries. When a database system can prove that, semantically, it is correct to answer the query using the data in an analytical view, the system uses the pre-aggregated data from the analytical view to save processor and input/output bandwidth. This results in expedited processing of the query. However, as new data is loaded into the table, the analytical views can become stale and queries serviced from them would be incorrect. Some analytical views can be refreshed incrementally, but not all analytical views are incrementally refreshable. A full re-computation of all analytical views can be an expensive operation so it is desirable to leverage analytical views that are not stale.
In some scenarios, data is loaded into base tables continuously or at a high frequency (e.g., every hour) using an Extract, Transform and Load process. This results in analytical view data getting stale frequently and not being usable to answer queries.
Accordingly, there is a need to establish techniques to process queries utilizing both analytical views and a database content.