The present invention relates generally to the field of query optimization, and more particularly to cardinality estimation of join between a fact table and an overloaded dimension table.
In relational database queries, cardinality estimation is the problem of estimating the number of tuples returned by a query. Cardinality estimation is a fundamentally important task in data management, used in query optimization, progress estimation, and resource provisioning.
In computing, a star schema, also referred to as snowflake schema, is a style of data mart schema and is a commonly used schema for a data warehousing environment. The star schema consists of one or more fact tables referencing any number of dimension tables. The star schema is an important special case of the snowflake schema, and is more effective for handling simpler queries.
In data warehousing, a fact table includes measurements, metrics, or facts of a business process. The fact table is located at the center of a star schema or a snowflake schema surrounded by dimension tables. Where multiple fact tables are used, these are arranged as a fact constellation schema. A fact table typically has two types of columns: those that contain facts and those that are a foreign key to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys. Fact tables contain the content of the data warehouse and store different types of measures that fall into three categories: additive, non-additive, and semi additive measures. Additive measures are the most flexible and useful facts that can be summed across any of the dimensions associated with a fact table. An example of fully additive measures are sales, or purchases, from a store where hourly sales can be added to determine the sales for a day, a week, a month, a quarter, or a year. Semi-additive measures can be summed across some dimensions, but not all. For example, checking account balance or savings account balance are two common semi-additive facts. A savings account balance can be determined using a transactions file. However, balancing amounts from October, November, and December across the time dimension may not be practical. An example of non-additive measures are ratios. An approach for non-additive facts is, where possible, to store the fully additive components of the non-additive measure and sum fully additive components into the final answer set, then calculate the final non-additive fact.
In data warehousing, a dimension table is one of the set of companion tables to a fact table. Dimension tables usually have a relatively small number of records compared to fact tables, but each record may have a very large number of attributes to describe the fact data. Contrary to fact tables, dimension tables contain descriptive attributes (or fields) that are typically textual fields (or discrete numbers that behave like text).