Different computerized systems use data in different ways. The way in which data is used informs how the data is stored and maintained. To illustrate this widely recognized principle, the domains of data warehousing, operational reporting, data archiving, and data feeds will be briefly discussed.
A data warehouse is a database used for generating reports and performing data analysis. To facilitate these functions, data is often transformed and organized into star schemas within the data warehouse. The data warehouse is populated via ETL (Extract, Transform, Load) operations, which require that the ETL system maintain, in addition to the current state of the data warehouse, information about the last incremental data extractions obtained from the source tables. ETL operations propagate incremental changes made at the source tables into the star schemas of the data warehouse. ETL operations may transform the data prior to loading the data into the data warehouse. Examples of such transformations include data cleansing, data standardization, surrogate key generation, surrogate key replacement, unit of measure conversion, and currency conversion. Business intelligence (BI) applications use data gathered from a data warehouse or from a subset of the warehouse called a data mart.
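The incremental behavior described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each source row carries a monotonically increasing `version` number and a natural key `id`; the class name `IncrementalEtl` and all field names are hypothetical and do not come from any particular ETL product.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalEtl:
    """Toy ETL job that remembers its last extraction point between runs."""

    # Information about the last incremental extraction (assumed version counter).
    last_extracted_version: int = 0
    # Natural key from the source -> surrogate key used in the warehouse.
    surrogate_keys: dict = field(default_factory=dict)
    # Surrogate key -> dimension row (stands in for a star-schema dimension table).
    dimension: dict = field(default_factory=dict)

    def run(self, source_rows):
        # Extract: pick up only rows changed since the previous run.
        changed = [r for r in source_rows
                   if r["version"] > self.last_extracted_version]
        for row in changed:
            # Transform: standardize the name and assign or reuse a surrogate key.
            name = row["name"].strip().upper()
            sk = self.surrogate_keys.setdefault(
                row["id"], len(self.surrogate_keys) + 1)
            # Load: insert or overwrite the dimension row.
            self.dimension[sk] = {"customer_sk": sk, "name": name}
        if changed:
            # Advance the extraction point so the next run skips these rows.
            self.last_extracted_version = max(r["version"] for r in changed)
        return len(changed)


source = [
    {"id": "C-17", "name": " acme ", "version": 1},
    {"id": "C-42", "name": "Globex", "version": 2},
]
etl = IncrementalEtl()
first_run = etl.run(source)    # both rows are new
second_run = etl.run(source)   # nothing changed since the last extraction
```

The point of the sketch is that the job holds two kinds of state: the warehouse contents and the extraction bookkeeping. A real system would persist both and would typically track an extraction point per source table.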
Operational reporting refers to reporting about operational details of current activity. For operational reporting, queries need to be performed against current data, in contrast to analytical reporting, where historical data suffices and there is no requirement to query the latest data in real time. Therefore, operational reporting queries are performed against live systems to ensure that the data is the most current available. In operational reporting, the freshness of the data and a quick response time are valued more than storing large amounts of data over a long period of time.
Data archiving is the process of moving data that is no longer actively used to a separate data storage device for long-term retention. While data archiving requires saving both the current version of the data and any historical versions of the data, it does not require the data to be stored in any particular format, such as a star schema. Speed of access is not a primary concern in data archiving, as data retained on a long-term basis in the data archive will not be accessed frequently. In contrast to data protection and backup products, however, the data in archiving products needs to be searchable and queryable by end users and eDiscovery applications.
Data feed systems allow users to receive updated data from data sources as the data changes at the data source. Data feed systems can supply data in the same format as the data source or in different formats (e.g., a star schema) that provide added value over the source format. Historical data feeds supply, in addition to the current state of the data at the data source, the historical state of the data at a previous point in time.
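The difference between a current-state feed and a historical feed can be sketched as follows. This is a toy model under stated assumptions: changes for a given key are assumed to arrive in increasing timestamp order, and the class `HistoricalFeed` and its methods are hypothetical names chosen for illustration.

```python
from bisect import bisect_right


class HistoricalFeed:
    """Toy feed that keeps every recorded change so past states can be read back."""

    def __init__(self):
        self._timestamps = {}  # key -> list of change timestamps (ascending)
        self._values = {}      # key -> list of values aligned with the timestamps

    def record(self, key, timestamp, value):
        # Assumes changes for a key are recorded in increasing timestamp order.
        self._timestamps.setdefault(key, []).append(timestamp)
        self._values.setdefault(key, []).append(value)

    def current(self, key):
        # Current state: the most recently recorded value.
        return self._values[key][-1]

    def as_of(self, key, timestamp):
        # Historical state: the last value recorded at or before `timestamp`,
        # or None if the key had no value yet at that time.
        i = bisect_right(self._timestamps.get(key, []), timestamp)
        return self._values[key][i - 1] if i else None


feed = HistoricalFeed()
feed.record("price", 10, 100)
feed.record("price", 20, 120)
latest = feed.current("price")
earlier = feed.as_of("price", 15)
```

A feed that supplies only the current state needs to keep just the latest value per key. Supporting `as_of` requires retaining the full change history, which is the extra storage cost a historical feed takes on.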
Users of these types of computerized systems will often have different preferences for what types of data the systems should track and how the systems should work. Not surprisingly, these types of computerized systems are often highly customized upon installation. For example, different companies using a data warehousing application may wish to customize portions of the application, such as the default dashboards and reports, to better reflect their needs.
For a particular company to make such customizations to the features of the default version of the application, the dimensional model used by the application to store data may need to be changed to support the customizations. The customizations made to the dimensional model used by the data warehouse application for that company will, in turn, require the company to customize the ETL pipeline so that it populates data in the customized warehouse dimensional model. It is also possible that a company has customized one or more data sources from the standard image and desires that those customizations be reflected in the dashboards and reports provided by the data warehouse application. This would again require the source image for the data warehouse application to be customized along with the ETL pipeline, the dimensional model, and the reports and dashboards. Once a company has a customized data management system, the company would also expect the customizations to be preserved across release updates of the underlying software.
As all software evolves over time, release updates are unavoidable, and they are generally performed on the base image of the previous release of the software. Because the company's customized image may differ in many ways from the base image, preserving the company's customizations while upgrading to the new release is a challenge.