A data warehouse is a repository of a company's historical data, and contains the data for decision support systems. Data warehouses are designed to allow a data analyst to perform complex queries and analysis on the information without slowing down the day-to-day operational systems. In other words, data warehouses are optimized for reporting and analysis, which is why they are often referred to as online analytical processing (OLAP) databases. In order to achieve this end goal of complex analysis of historical data with reasonable response times, data in data warehouses is typically stored in a dimension-based model. Data is typically organized so that data elements that relate to the same object in the real world are linked together. Data warehouses typically follow a de-normalized approach to structuring data, where the fast access to summary information is more important than going to great extents to avoid duplication of data across multiple tables. Since data warehouses are typically read-only for historical analysis, there is usually less concern about data duplication, and the same information may be contained in multiple tables where that information might be relevant.
On the other hand, online transaction processing (OLTP) databases are typically used to handle the day-to-day operations of a business, where data is being regularly added and modified. OLTP systems are typically designed for speed of modification so that the response time is good for processing each transaction. In OLTP databases, the database structure typically follows a normalized approach, which follows certain rules for avoiding duplication of data and other general design practices that help ensure optimum performance for data that is updated regularly. For example, in an OLTP database, the customer name and address may only be present in a customer detail table to avoid duplication of data, and to ensure that only one copy of that record is maintained. That same information may be contained in multiple tables in a data warehouse. In another example, one table in a data warehouse may provide sales details, with the customer name and address duplicated for each sales record. That data warehouse may also have a customer table that contains the name and address details. While this duplication of data would violate good design rules for an OLTP database, it can be appropriate practice for a data warehouse since speed of answering specific business questions is the primary concern, and the data is not being updated (which would be very difficult to achieve when the same data is contained in multiple places).
In order to generate a data warehouse, a transformation process is typically used to transform the data in the OLTP database into the format used by the data warehouse. This transformation process is often referred to as an extract, transform, and load (ETL) process that involves extracting data from outside sources, transforming it to fit business needs, and ultimately loading it into the end target, i.e. the data warehouse. This ETL transformation process typically involves the transformation of the normalized database tables in the OLTP database into data structures with various dimensions in the data warehouse. Since the structure of an OLTP database is very different from the structure of a data warehouse, the results of the transformation from OLTP data tables to data warehouse tables should be analyzed to ensure the desired end result was actually achieved. In the very simple example introduced earlier, verification would be needed, among other things, to ensure that the customer detail information was propagated correctly from just the single table in the relational database to the multiple tables in the data warehouse. This verification process can become very tedious and prone to errors, especially for data warehouses that have any sort of complexity, which most of them do.