Data in a data warehouse is not stagnant. Database administrators (DBAs) start out with terabytes of data from many sources. New data comes in every day. The new data may be of differing data types and may come from the same source or from new sources. Over time, new associations in the data become clear while old associations become obsolete. Existing data needs to be recast into larger containers or different type containers or with a different ordering. The recasting reflects the data's current usage, which changes over time. The database schema needs to change over time to meet users' needs for high-performance, on-demand data; the term for this is schema evolution.
Schema Evolution may require transforming every data row, changing column ordering, data types, container sizes, clustering keys, physical partitioning, and/or row formatting, all while maintaining data consistency, integrity and availability. Metadata (information about the data including how it is defined, organized, and stored, as well as information about relationships with other objects in the database) also must be changed, as must information replicated to improve performance such as indexes and materialized views. This schema evolution needs to be done while the data warehouse is also being queried and updated.
Current database application programs offer some functionality that allows the database structures to change over time. The current functionality is restrictive on the types of changes allowed, may require multiple operations or commands, and may need to take the data off-line to implement the changes to the database structures. There may be untransformed data after the schema evolution operation completes which has query performance impact.