A data warehouse is a repository of an organization's electronically stored data, designed to facilitate reporting and analysis. It is also a nonvolatile data repository that houses large amounts of historical data. Data warehousing and associated processing mechanisms, such as Online Analytical Processing (OLAP), Relational OLAP (ROLAP), Multidimensional OLAP (MOLAP), and Hybrid OLAP (HOLAP), are common technologies used to support business decisions and data analysis. There are two leading approaches used to store data in a data warehouse—the dimensional approach and the normalized approach.
In a dimensional approach, transaction data are partitioned into either “facts”, which are generally numeric transaction data such as net sales, quantity sold, gross sales, etc, or “dimensions”, which are the reference information that gives context to the facts. In the normalized approach, the data in the data warehouse are stored following, to a degree, data normalization rules. Tables are grouped together by subject areas that reflect general data categories, such as data on customers, products, finance, etc. Dimensional approaches can involve normalizing data to a degree.
Dimensional data warehouses and data marts sometimes use tables logically arranged in a star schema or snowflake schema. The snowflake schema is represented by centralized fact tables that are connected to multiple dimension tables or dimensions are normalized into multiple related tables or sub-dimension tables. Each dimensional table represents a data dimension of the warehouse. All the data for a data dimension can be stored in the associated dimension table, or can be stored in one or more master data tables associated with the dimension table. A typical objective when using the snowflake schema is to remove low cardinality attributes from a dimension table and place these attributes in a secondary dimension table.
In a snowflake schema, the master data tables can all be directly connected to an associated dimension table, or can be arranged more elaborately with multiple levels of master data tables arranged connected in parent and child relationships, and where child tables may have multiple parent tables, thus resulting in a complex snowflake like arrangement. The term “Master Data” means the enterprise-spanning set of data arising or processed within all of the enterprise's primary business entities and functions. For example, in supply chain applications, master data includes product identifiers and details, suppliers, components, inventory, costs, and so on.
A Systems, Applications and Products in Data Processing (SAP), business intelligence warehouse, available from SAP America, Inc., is a data warehouse system that fully utilizes snowflake schema, enabling users to analyze data from operative SAP applications, as well as other business applications. In this context, OLAP technology enables multi-dimensional analysis from various business perspectives.
When a query, i.e., a request for reporting data, is sent to a dimensional data warehouse, the query being based on data stored in a master data table, the master data table is searched to obtain the master data table identifier for the data. The master data table identifier is often referred to as a surrogate identifier (SID). Once the surrogate identifier for the data has been obtained, data to fulfill the query can be obtained from tables in the data warehouse. This data may be located, for example, in a fact table, one or more dimension tables, and one or more master data tables.
When reporting data is requested from a data warehouse, the system searches master data first, then gets the dimension identifiers from dimension tables that are linked and then gets the data from a fact table. The dimension tables thus provide a platform to store numeric identifiers that link data between a fact table and master data tables. Due to the large volume of dimension tables, searching and obtaining appropriate data links between a fact table and a master data table can be a relatively time consuming process. Though the sheer volume can be very high, the proper organization of data can be helpful to ensure timely retrieval of valuable information.