A data warehouse, in the broadest sense, is a database that contains large stores of current and historical data. In some cases, the data may be integrated from multiple data sources (e.g., marketing databases, sales databases, user databases, and other transactional databases used to maintain the most recent data). Typically, the data is organized as it is stored within the data warehouse. For example, in some cases, the data may be stored as a series of snapshots. In other cases, the data may be aggregated over a specific time interval (e.g., three months, six months, or longer) and/or into specific subject areas.
When data is integrated from multiple data sources, the data warehouse can provide consistent codes, descriptions, fields, and flagging. For example, if the multiple data sources have different identification mechanisms for a product, the data warehouse may provide a uniform identification mechanism for the product. The data stored in the data warehouse can also be analyzed with tools such as online analytical processing (OLAP) and data mining tools. The results of these analyses can be used for a variety of business purposes, such as generating various analytics and creating reports.
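The uniform identification mechanism described above can be illustrated with a minimal sketch. It is not taken from any particular data warehouse product. The source names, local identifiers, warehouse-wide keys, and the helper functions `to_warehouse_id`, `integrate`, and `total_by_product` are all hypothetical and chosen only for illustration.

```python
# Hypothetical mapping from (source system, source-specific product id)
# to a single warehouse-wide product id. All values are illustrative.
ID_MAP = {
    ("marketing", "SKU-001"): "P1",
    ("sales", "1001"): "P1",
    ("sales", "1002"): "P2",
}


def to_warehouse_id(source, local_id):
    """Return the uniform product id for a source-specific id."""
    return ID_MAP[(source, local_id)]


def integrate(records):
    """Rewrite (source, local_id, amount) records to use the uniform id."""
    return [
        {"product_id": to_warehouse_id(source, local_id),
         "source": source,
         "amount": amount}
        for source, local_id, amount in records
    ]


def total_by_product(rows):
    """Sum amounts per uniform product id across all sources."""
    totals = {}
    for row in rows:
        pid = row["product_id"]
        totals[pid] = totals.get(pid, 0) + row["amount"]
    return totals
```

Under these assumptions, records for the same product that arrive from the marketing and sales sources under different identifiers resolve to one key, so a later analysis step can total them together without knowing how each source names the product.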
Over time, data warehouses can start running out of space. One solution is to split the data warehouse into smaller warehouses. These smaller warehouses could be geographically distributed. In some cases, each of the smaller data warehouses may be designed for a specific group of users (e.g., a team) or may host information regarding a particular subject. However, some groups of users or particular subjects may need access to the same data. Simply copying the same data to each of the smaller warehouses is inefficient. Another solution to the capacity problem is to delete data from the data warehouse. However, this data might be valuable in future data mining or analysis activities. As a result, more efficient techniques are needed for managing data within these data warehouses.