Data warehouses are underpinned by sets of data, referred to as Named Materialized States, or NMSs.
Data Warehouse software moves (or copies, or transfers) data between the NMSs, typically transforming the source (or “loaded”) NMSs ultimately into a set of “target” NMSs for the end-users of the data warehouse to query. The data may pass through many intermediate NMSs en route from sources to targets.
Data Warehouse software may be written directly (“by hand”), or may be produced indirectly by the use of a software development tool (sometimes also known as a “workbench” or “integrated development environment”).
Each individual component of the software (referred to as a “module”) typically takes data from one or more NMSs and transfers it to its own, intermediate, target NMS (or NMSs) using Structured Query Language (SQL) or procedural logic (such as Oracle's PL/SQL). Each such transfer is referred to as a “task”.
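As an illustration only, a single task might be sketched as follows. The sketch assumes that each NMS is a table in an in-memory SQLite database; the function run_task and the table names src_orders and int_customer_totals are hypothetical and are not taken from any particular warehouse product.

```python
import sqlite3


def run_task(conn, target_nms, select_sql):
    """Populate one target NMS (modelled here as a table) from a SELECT over source NMSs."""
    # Rebuild the target from scratch so the task can safely be re-run.
    conn.execute(f"DROP TABLE IF EXISTS {target_nms}")
    conn.execute(f"CREATE TABLE {target_nms} AS {select_sql}")
    conn.commit()


conn = sqlite3.connect(":memory:")

# Hypothetical source ("loaded") NMS.
conn.execute("CREATE TABLE src_orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO src_orders VALUES (?, ?, ?)",
    [(1, "acme", 10.0), (2, "acme", 5.0), (3, "globex", 7.5)],
)

# The task: transform the source NMS into an intermediate NMS using SQL.
run_task(
    conn,
    "int_customer_totals",
    "SELECT customer, SUM(amount) AS total FROM src_orders GROUP BY customer",
)

rows = conn.execute(
    "SELECT customer, total FROM int_customer_totals ORDER BY customer"
).fetchall()
print(rows)
```

Rebuilding the target NMS on every run is only one possible design; a real module might instead append to or merge into its target.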
Because each task depends on data produced by other tasks, warehouse software is vulnerable to unintended “side effects”: a change made to one warehouse software module for a specific purpose can have a dramatic and unexpected effect on the behaviour of some other, seemingly unrelated module.
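The way such side effects propagate can be sketched by following the data dependencies between tasks. The example below is a minimal illustration, assuming each task is described by the set of NMSs it reads and the single NMS it writes; the task names, NMS names, and the function downstream_tasks are all hypothetical.

```python
# Hypothetical tasks: name -> (source NMSs read, target NMS written).
tasks = {
    "clean_orders": ({"src_orders"}, "stg_orders"),
    "customer_totals": ({"stg_orders"}, "int_customer_totals"),
    "shipping_report": ({"stg_orders"}, "tgt_shipping"),
    "finance_report": ({"int_customer_totals"}, "tgt_finance"),
    "product_dim": ({"src_products"}, "tgt_products"),
}


def downstream_tasks(tasks, changed):
    """Return every task that directly or indirectly reads data written by `changed`."""
    # Index: NMS name -> names of the tasks that read it.
    readers = {}
    for name, (sources, _target) in tasks.items():
        for nms in sources:
            readers.setdefault(nms, set()).add(name)

    affected = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        target = tasks[current][1]
        for reader in readers.get(target, ()):
            if reader not in affected:
                affected.add(reader)
                frontier.append(reader)
    return affected


print(sorted(downstream_tasks(tasks, "clean_orders")))
```

In this sketch a change to clean_orders reaches finance_report even though finance_report never reads stg_orders directly, which is the kind of seemingly unrelated effect described above.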
The tasks of the software must be executed in a clearly defined order to ensure that the “target” NMSs get populated with the correct data. This ordered set of tasks is referred to as a “schedule”.
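One way to derive such an order is a topological sort of the tasks over their data dependencies. The following is a sketch under stated assumptions, not a description of how any given warehouse tool builds its schedule: each task is assumed to write exactly one NMS, and the task names, NMS names, and the function build_schedule are hypothetical. It uses graphlib.TopologicalSorter from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: name -> (source NMSs read, target NMS written).
# Deliberately listed out of execution order.
tasks = {
    "report": ({"int_customer_totals"}, "tgt_report"),
    "load_orders": (set(), "src_orders"),
    "customer_totals": ({"src_orders"}, "int_customer_totals"),
}


def build_schedule(tasks):
    """Order tasks so that each one runs after the tasks that produce its source NMSs."""
    # Index: NMS name -> name of the task that writes it.
    producer = {target: name for name, (_sources, target) in tasks.items()}

    # For each task, the set of tasks that must run before it.
    graph = {
        name: {producer[nms] for nms in sources if nms in producer}
        for name, (sources, _target) in tasks.items()
    }

    # static_order() yields predecessors first and raises CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())


schedule = build_schedule(tasks)
print(schedule)
```

Source NMSs that no task produces (for example, externally loaded data) are simply treated as already available.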
Data warehouses typically contain large amounts of data. Executing the data warehouse software against large volumes of data, as meaningful testing requires, can be very time-consuming. Similarly, running the tests themselves against large volumes of data can also be very time-consuming.