Enterprise-wide data warehouses are increasingly being adopted as the main source and underlying infrastructure for business intelligence (BI) solutions. As a result, the data warehouse frameworks being utilized must be configured to handle high data throughput.
With conventional data warehousing scenarios, well-defined time windows are used to extract data from source systems and to store it in flat tables (e.g., DataStore objects, etc.) or in multi-dimensional data targets (e.g., InfoCubes, etc.). The following factors can influence the amount of time required to make data available for reporting: (i) the time for propagating data to data targets (data loading time); and (ii) the time needed for administrative tasks (e.g., dropping and re-creating indexes, activation of data in DataStore objects, rebuilding of aggregate data, etc.). The challenge, in particular for mass data, is to complete both the loading and the administrative steps within the defined time window.
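The relationship between the two factors and the time window can be illustrated with a minimal Python sketch. The function name `fits_in_window` and all durations are hypothetical and chosen only for illustration; the sketch assumes the loading step and the administrative tasks run one after another.

```python
from datetime import timedelta


def fits_in_window(load_time, admin_times, window):
    """Return True if loading plus all administrative tasks finish within the window.

    Assumes the steps run sequentially, so their durations simply add up.
    """
    total = load_time + sum(admin_times, timedelta())
    return total <= window


# Hypothetical durations for one load run (not taken from any real system).
load = timedelta(hours=2, minutes=30)      # (i) data loading time
admin = [
    timedelta(minutes=20),                 # (ii) dropping and re-creating indexes
    timedelta(minutes=45),                 # (ii) activating data in DataStore objects
    timedelta(minutes=40),                 # (ii) rebuilding aggregate data
]
window = timedelta(hours=4)

print(fits_in_window(load, admin, window))  # prints False: 4 h 15 min exceeds 4 h
```

With these assumed values the run overshoots the window by 15 minutes, which is the situation the paragraph above describes for mass data.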
Another aspect that can affect data availability is the degree of data coupling. In some implementations, data generated by differing source systems must be processed in a sequential manner (as opposed to concurrent processing). For example, if data “d1” generated by source system “s1” is closely coupled to data “d2” generated by source system “s2”, the following restrictions might apply: (i) if “d1” and “d2” are generated in different time zones, reporting (e.g., in a DataStore object, etc.) cannot be performed until both loading processes are finished, because otherwise the query result can include inconsistencies; and (ii) if “d2” is loaded into a data target (e.g., a DataStore object) after “d1” and the upload for “d1” failed, “d2” is also not available for reporting, even though “d1” caused the uploading issue.
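Both restrictions amount to one rule: closely coupled data becomes available for reporting only when every load in the coupled group has finished. The following Python sketch illustrates that rule under stated assumptions; the function `available_for_reporting`, the status strings, and the grouping structure are hypothetical and do not correspond to any actual data warehouse API.

```python
def available_for_reporting(load_status, coupled_groups):
    """Return the set of data sets that may be used for reporting.

    load_status:    dict mapping a data set name to "finished", "running", or "failed"
    coupled_groups: list of sets; the data sets in a set are closely coupled
    A group is released only if every one of its loads has finished.
    """
    available = set()
    for group in coupled_groups:
        if all(load_status.get(name) == "finished" for name in group):
            available |= group
    return available


groups = [{"d1", "d2"}]

# Restriction (i): "d2" is still loading, so "d1" cannot be reported on yet.
still_loading = available_for_reporting({"d1": "finished", "d2": "running"}, groups)

# Restriction (ii): "d1" failed, so "d2" is blocked although its own load succeeded.
d1_failed = available_for_reporting({"d1": "failed", "d2": "finished"}, groups)

# Both loads finished: the coupled data is released together.
both_done = available_for_reporting({"d1": "finished", "d2": "finished"}, groups)
```

In the first two cases the result is an empty set, and only in the third are both “d1” and “d2” released.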