Different computerized systems use data in different ways. The way in which data is used informs how the data is stored and maintained. To illustrate this widely recognized principle, the domains of data warehousing, operational reporting, data archiving, and data feeds will be briefly discussed.
A data warehouse is a database used for generating reports and data analysis. To facilitate reporting and data analysis functions, data is often transformed and organized in star schemas within a data warehouse. The data warehouse is populated via ETL (Extract, Transform, Load) operations, which require that the ETL system maintain, in addition to the current state of the data warehouse, information about the last incremental data extractions obtained from the source tables. ETL operations propagate incremental changes made at the source tables into the star schemas of the data warehouse. ETL operations may transform the data prior to loading the data into the data warehouse. Examples of such transformations include data cleansing, data standardization, surrogate key generation, surrogate key replacement, unit of measure conversion, and currency conversion. Business intelligence (BI) applications use data gathered from a data warehouse or from a subset of the warehouse called a data mart.
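The incremental ETL flow described above can be illustrated with a minimal Python sketch. It is not drawn from any particular ETL product. The names `Warehouse`, `run_etl`, `CENTS_PER_UNIT`, and the row fields `version`, `customer`, `amount`, and `unit` are hypothetical, and the sketch assumes each source row carries a monotonically increasing change version. The stored `last_extracted_version` plays the role of the information about the last incremental extraction, and the transform step shows data standardization, unit of measure conversion, and surrogate key generation.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """Toy warehouse: one dimension, one fact table, and ETL bookkeeping."""
    customer_dim: dict = field(default_factory=dict)  # natural key -> surrogate key
    sales_fact: list = field(default_factory=list)    # fact rows of the star schema
    last_extracted_version: int = 0                   # state of the last extraction

    def surrogate_key(self, natural_key):
        # Surrogate key generation: assign the next integer on first sight.
        if natural_key not in self.customer_dim:
            self.customer_dim[natural_key] = len(self.customer_dim) + 1
        return self.customer_dim[natural_key]


# Hypothetical unit-of-measure table: every amount is stored in cents.
CENTS_PER_UNIT = {"USD": 100, "cents": 1}


def run_etl(source_rows, warehouse):
    """Propagate rows changed since the last extraction; return how many were loaded."""
    # Extract: only rows newer than the stored extraction state.
    changed = [r for r in source_rows
               if r["version"] > warehouse.last_extracted_version]
    for row in changed:
        # Transform: standardize the name and convert the unit of measure.
        customer = row["customer"].strip().upper()
        amount_cents = row["amount"] * CENTS_PER_UNIT[row["unit"]]
        # Load: replace the natural key with its surrogate key.
        warehouse.sales_fact.append({
            "customer_sk": warehouse.surrogate_key(customer),
            "amount_cents": amount_cents,
        })
    if changed:
        warehouse.last_extracted_version = max(r["version"] for r in changed)
    return len(changed)
```

Calling `run_etl` a second time against an unchanged source loads nothing, because no row is newer than the stored extraction state. This is why the ETL system must persist that state alongside the warehouse itself.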
Operational reporting refers to reporting about operational details of current activity. For operational reporting, queries need to be performed against current data, in contrast to analytical reporting, where historical data suffices and there is no requirement to query the latest data in real time. Therefore, operational reporting queries are performed against live systems to ensure that the data is the most current available. In operational reporting, the freshness of the data and a quick response time are valued more than storing large amounts of data over a long period of time.
Data archiving is the process of moving data that is no longer actively used to a separate data storage device for long-term retention. While data archiving requires saving both the current version of data and any historical versions of the data, it does not require the data to be stored in any particular format, such as a star schema. Speed of access is not a primary concern in data archiving, as data retained on a long-term basis in the data archive will not be accessed frequently. However, in contrast to data protection and backup products, the data in archiving products needs to be searchable and queryable by end users and eDiscovery applications.
Data feed systems allow users to receive updated data from data sources as the data changes at the data source. Data feed systems can supply data in the same format as the data source or in different formats (e.g., a star schema) that provide added value over the source format. Historical data feeds supply, in addition to the current state of the data at the data source, the historical state of the data at a previous point in time.
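The distinction between the current state and the historical state of fed data can be sketched as follows. This is an illustrative Python example only. The class `HistoricalFeed` and its methods `record`, `current`, and `as_of` are hypothetical names, and the sketch assumes that every change arrives with a comparable timestamp.

```python
import bisect


class HistoricalFeed:
    """Keeps every recorded value per key so past states can be looked up."""

    def __init__(self):
        self._timestamps = {}  # key -> sorted list of change timestamps
        self._values = {}      # key -> values aligned with the timestamps

    def record(self, key, timestamp, value):
        # Insert the change at its sorted position so lookups can use bisect.
        ts = self._timestamps.setdefault(key, [])
        vals = self._values.setdefault(key, [])
        i = bisect.bisect_right(ts, timestamp)
        ts.insert(i, timestamp)
        vals.insert(i, value)

    def current(self, key):
        # Current state: the value with the latest timestamp.
        return self._values[key][-1]

    def as_of(self, key, timestamp):
        # Historical state: the latest value recorded at or before `timestamp`,
        # or None if the key had no value yet at that time.
        ts = self._timestamps.get(key, [])
        i = bisect.bisect_right(ts, timestamp)
        return self._values[key][i - 1] if i else None
```

A feed that supplies only the current state would need just the last value per key. Retaining the full timestamped history is what makes the point-in-time lookup in `as_of` possible.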
Given the sharp differences in how data warehousing, operational reporting, data archiving, and data feeds are used, each of these approaches is, in practice, performed using a separate persistent data store that is designed to support the requirements of its intended use.