The present invention relates to querying data in a database and more specifically to a method of generating data in a table for comparison between data entries.
When performing queries on a database, especially in the field of trend analysis, data from multiple tables is frequently joined together in what is referred to as an outer join. There are three types of outer join: a left outer join, a right outer join and a full outer join.
A left outerjoin between two relations is an outerjoin where only the left relation is listed in its entirety. The right relation is listed only where it matches the left in the joined fields. For example, the left join of the relations shown in FIG. 1A is shown in FIG. 1B. There is no record for Jim Black.
A right outer join between two relations is an outer join where only the right relation is listed in its entirety. The left relation is listed only where it matches the right relation in the joined fields. The right join of the relations shown in FIG. 1A is shown in FIG. 1C. Jane Yates does not appear in this table.
A full outer join between two relations is a union of a left outer join and a right outer join, and includes all the tuples (records in a relation table) from both relations regardless of whether both relations have data in the dimension being joined. The full outer join of the two relations shown in FIG. 1A is shown in FIG. 1D. All the names are present in this table.
A cross product of two or more sets of data provides a record for every combination of all the records in each of the sets of data. The number of records in a cross product is therefore the product of the number of records in each data set, and can get very large.
Data is stored in a database in a set of dimension tables and fact tables. Dimension tables contain records representing a set of entities with similar characteristics. Each record has one or more values representing different attributes of the entity in question. One of these attributes, or a combination of the attributes, uniquely identify each record in the dimension, and is referred to as the key.
Fact data tables pro-vide all the actual data which has been accumulated. One or more of the columns in a fact table will be dimension columns, and each such column will usually represent a single dimension. Entries in a dimension column (or columns) associated with a specific dimension will represent the key value in that dimension the fact entry is associated with. The fact tables also contain columns with the actual data associated with the fact entry. For example, in the fact data table of FIG. 2, the three dimensions are year, quarter and product. The single measure in this table is Sales.
Frequently, it is required to compare data for entries in one or more dimensions, against equivalent data in other dimensions. For example, it might be desired to compare two sets of sales records stored in a database over two or more periods, although the data for the two periods could be stored in any number of fact tables which will need to be combined. FIG. 2 shows an example of such a set of data. The two dimensions on which the join will take place are months and products. It is often impractical to store a record in a fact table for every combination of each dimension, regardless of whether or not there is corresponding data for such entries, as this would take up a huge amount of space. Accordingly, databases usually only store fact data for dimension entries where data actually exists. In this example, entries are only stored for product sales if any of the product were actually sold in the respective month. Data entries therefore do not exist in the fact data if a particular product was not yet on sale during a certain period, or if it had been discontinued.
If the two sets of time periods to be compared are split into separate tables and a fall outer join is performed on the two sets of tables matching on the product and time dimensions, the joined table will contain entries for all date and product records in either of the tables, as shown in FIG. 3. A product will only be represented in the output table for months in which the product was sold in one of the two years. However, it is often desirable to have an entry for each product for every month so that data can be more easily compared.
The same can be true of any dimension, not just a period dimension. Often, when an entry exists for one entity in a dimension in a fact table, corresponding entries for other entities in the dimension might be desirable for comparison, regardless of whether there is corresponding data for those entities.
Furthermore, it is often desirable to analyze data on a different computer on a network from the computer on which the data is stored, and therefore it is critical that a minimal quantity of data is transmitted from the database to create the table. A method of performing a join is needed such that minimal amounts of data are transmitted across the network.
The present invention provides a method of forcing equivalent empty entries for all requested entities in certain preselected dimensions of fact data into an expanded fact table if there is a data entry for one of those selected entities so that the data entries can be easily compared across all the selected entities.
The present invention further provides a method of extracting certain data from predetermined records