1. Field
This field generally relates to data source query generation.
2. Related Art
Various types of data sources exist, including relational databases, NoSQL databases, and big data sources. Data sources may organize data into tables, where each table may have rows and columns or they may appear to have rows and columns. Each row may represent a record. Each column may have entries for each record of the same data type. A data type may be, for example, a text string, number, or date.
Two example tables—a revenue table and a profit table—are reproduced below. In this example, each table has four columns, three of which are the same: date, product, and customer. Each table also has a column not shared by the other. In particular, for each record, the revenue table has a revenue entry, and the profit table has a profit entry.
Below is an example revenue table, which may be referred to below as Revenue_Fact:
DateProductCustomerRevenueJan. 1, 2014ChairJay10Jan. 1, 2014ChairAlex50Jan. 1, 2014DeskAlex100Jan. 25, 2014LampJay10Jan. 2, 2014DeskJane100Jan. 2, 2014LampRoger5Jan. 10, 2014ChairRoger25
Below is an example profit table, which may be referred to below as Profit_Fact:
DateProductCustomerProfitJan. 1, 2014ChairJay5Jan. 1, 2014ChairAlex40Jan. 1, 2014DeskAlex75Jan. 25, 2014LampJay5Jan. 2, 2014DeskJane75Jan. 2, 2014LampRoger1Jan. 10, 2014ChairRoger10
Data analysis tools may be used to retrieve data. To retrieve data, data analysis tools may filter data by entities, such as dimension entities and measure entities. Dimensions may refer to entities by which you want to group data. For example, in the above tables, Date, Customer, and Product may be dimension entities because those are fields by which data can be grouped. For example, to determine the total revenue for each product in the Revenue_Fact table, the records must be grouped on product. Because all the records in the table have one of three products (chair, lamp, and desk), the result may have three rows, each listing the total revenue for one of the three products.
Measure entities may refer to formulas calculated by applying an aggregation function to a column in a database table. An aggregation function examines multiple rows of a table, perhaps in a group, as an input to determine a value. For example, a measure entity to determine total revenue may have the formula sum(revenue). If the records are grouped by product, the result may be the total revenue provided by each product. In addition to sum, other example aggregation functions include, but are not limited to, average, count, maximum, minimum, median, and mode.
One widely adopted way to retrieve data from databases is to use structured query language (SQL). SQL can be quite sophisticated and complex. To help users manage the complexity, many data analysis tools automatically generate SQL. Returning to the example above, to determine the total revenue provided by each product, the following SQL query may be used:
Select                Product,        sum(revenue)        
From                Revenue_Fact        
Group By                Product        
SQL provides two primary ways of filtering out records: the WHERE clause and the HAVING clause. In the example above, a filter may be used to identify which products have a total profit exceeding 10. In SQL, the filter may be represented using a HAVING clause, as set out below:
Select                Product        
From                Profit_Fact        
Group By                Product        
Having                Sum(profit)>10        
For some questions, multiple queries may be needed of the database. These may be referred to as multi-pass queries. But often these multi-pass queries are inefficient or inaccurate. Methods and systems are needed to improve multi-pass query processing.