The present invention relates generally to data management in data warehouses and, more specifically, to optimizing single-row operations within a data warehouse.
Data warehouses are central repositories of integrated data from a plurality of disparate sources. Data warehouses store current and historical data and are used for creating analytical reports for users throughout an enterprise. Data in data warehouse systems is stored in multiple physical locations called extents. Data warehouse queries typically read and process a large amount of data (known in the art as massive data), and those operations search through the entire set of data (all extents) in order to output a final response. The sequence in which extents are read by the data warehouse is inconsequential in that case, because all of the data is read.
When single-row operations (operations requiring only one row to be found) are periodically processed, records are read and operated on in a small group of extents. Single-row operations are processed for usage-specific and/or customer-specific needs and typically relate to periodic data verification, housekeeping, and/or auditing of records created and/or updated in a specific time period. When single-row operations are processed, the sequence in which extents are read by the data warehouse becomes a factor in the performance of data retrieval, because the search can stop as soon as the required row is found.
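The contrast between the two access patterns can be illustrated with a minimal sketch. It is not taken from the invention itself: the in-memory lists standing in for extents (the physical storage units described above), the `day` field, and the function names `full_scan` and `single_row_lookup` are all hypothetical and chosen only for illustration. The sketch counts how many extents are read under two different read orders.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, int]
Extent = List[Row]  # hypothetical stand-in for one physical extent


def full_scan(extents: List[Extent],
              predicate: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    """Analytical-style query: every extent is read regardless of order."""
    matches: List[Row] = []
    extents_read = 0
    for extent in extents:
        extents_read += 1
        matches.extend(row for row in extent if predicate(row))
    return matches, extents_read


def single_row_lookup(extents: List[Extent],
                      predicate: Callable[[Row], bool]) -> Tuple[Optional[Row], int]:
    """Single-row operation: stop reading as soon as one matching row is found."""
    extents_read = 0
    for extent in extents:
        extents_read += 1
        for row in extent:
            if predicate(row):
                return row, extents_read
    return None, extents_read


# Hypothetical data: four extents of three rows each; "day" is the creation period.
extents = [[{"id": e * 10 + i, "day": e} for i in range(3)] for e in range(4)]


def is_recent(row: Row) -> bool:
    # Assumed audit condition: the row was created in the latest period.
    return row["day"] == 3


newest_first = list(reversed(extents))

# A full scan reads all extents in either order.
_, full_cost_oldest_first = full_scan(extents, is_recent)
_, full_cost_newest_first = full_scan(newest_first, is_recent)

# A single-row lookup reads fewer extents when the likely extent comes first.
row_a, cost_oldest_first = single_row_lookup(extents, is_recent)
row_b, cost_newest_first = single_row_lookup(newest_first, is_recent)
```

In this toy data set the full scan reads all four extents in both orders, whereas the single-row lookup reads four extents when the oldest extent is read first and only one when the newest extent is read first. Actual costs in a data warehouse depend on how rows are distributed across extents, which the sketch does not model.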