A data warehouse is a subject-oriented, integrated, nonvolatile, and time-variant data set, designed to facilitate decision-making and data analysis for companies and organizations.
The normal operation of production systems relies on support from data warehouses. As used herein, backflow of data refers to loading data from a computed result table of a data warehouse into a corresponding table of a production system database. As production systems become more complex, their databases are becoming increasingly overloaded. To alleviate this burden, in many existing production system databases, a large table that was originally stored in a single database is divided, according to certain rules, into multiple small tables distributed across multiple independent databases on multiple inexpensive hosting systems. This technique lowers the hardware requirements of, and the load on, the production system databases. However, because the data storage mode of the production system databases changes from one table to many, the backflow of data from a data warehouse to a production system database has to change correspondingly. Originally, the backflow of data propagates from one table of the data warehouse to one table of the production system database. Once the large table in the production system database is divided into multiple small tables, the backflow of data instead propagates from one table of the data warehouse to multiple sub-tables of the production system database.
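By way of illustration only, the one-to-many mapping described above may be sketched as follows. The modulo rule, the key field `user_id`, the table name `user_score`, and the count of four sub-tables are all assumptions chosen for the example; the text does not specify which division rules are used.

```python
from collections import defaultdict

NUM_SUB_TABLES = 4  # hypothetical; the text mentions up to thousands of sub-tables


def sub_table_name(base, key, num_sub_tables=NUM_SUB_TABLES):
    """Map a row key to a sub-table name using an assumed modulo rule."""
    return f"{base}_{key % num_sub_tables:04d}"


def route_rows(base, rows, key_field="user_id", num_sub_tables=NUM_SUB_TABLES):
    """Group the rows of one warehouse result table by destination sub-table."""
    routed = defaultdict(list)
    for row in rows:
        target = sub_table_name(base, row[key_field], num_sub_tables)
        routed[target].append(row)
    return dict(routed)


# One warehouse result table (illustrative data only).
result_table = [
    {"user_id": 1, "score": 10},
    {"user_id": 5, "score": 7},
    {"user_id": 2, "score": 3},
]

# The same rows, now addressed to multiple production sub-tables.
routed = route_rows("user_score", result_table)
```

In this sketch a single source table fans out to as many destinations as the division rule produces, which is the change in backflow shape that the paragraph describes.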
For example, when a table of the data warehouse corresponds to a large number of sub-tables of the production system database (e.g., when the large table is divided into thousands of sub-tables), an existing system implements data backflow by creating, in the data warehouse, a corresponding sub-table for each sub-table of the production system database, and then updating each sub-table of the production system database from the corresponding sub-table of the data warehouse. This technique significantly increases the number of tables in the data warehouse, and therefore the number of tables to be maintained and the difficulty of maintaining them. Moreover, distributing the data of a single table to multiple sub-tables within the data warehouse is complicated, which increases computing and backflow time and creates a backflow bottleneck. In particular, if the prolonged backflow runs into a peak load hour of the production system database, the reliability of the production system may be degraded.
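The existing approach described above may be sketched, under stated assumptions, with an in-memory SQLite database standing in for both systems. The `dw_` and `prod_` prefixes, the `user_id`/`score` columns, and the modulo division rule are hypothetical and serve only to make the table growth visible; they are not taken from any particular system.

```python
import sqlite3


def backflow_via_mirrored_sub_tables(conn, base, num_sub_tables):
    """Sketch of the existing approach: one warehouse sub-table per production sub-table."""
    cur = conn.cursor()
    for i in range(num_sub_tables):
        dw_sub = f"dw_{base}_{i:04d}"
        prod_sub = f"prod_{base}_{i:04d}"
        # Step 1: split the single warehouse table into a mirrored sub-table.
        cur.execute(f"CREATE TABLE {dw_sub} (user_id INTEGER, score INTEGER)")
        cur.execute(
            f"INSERT INTO {dw_sub} SELECT user_id, score FROM dw_{base} "
            "WHERE user_id % ? = ?",
            (num_sub_tables, i),
        )
        # Step 2: refresh the production sub-table from its warehouse mirror.
        cur.execute(f"CREATE TABLE IF NOT EXISTS {prod_sub} (user_id INTEGER, score INTEGER)")
        cur.execute(f"DELETE FROM {prod_sub}")
        cur.execute(f"INSERT INTO {prod_sub} SELECT user_id, score FROM {dw_sub}")
    conn.commit()
    # Count the warehouse-side tables that now have to be maintained.
    cur.execute("SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name LIKE 'dw%'")
    return cur.fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_user_score (user_id INTEGER, score INTEGER)")
conn.executemany("INSERT INTO dw_user_score VALUES (?, ?)", [(1, 10), (5, 7), (2, 3)])

warehouse_tables = backflow_via_mirrored_sub_tables(conn, "user_score", 4)
```

With N production sub-tables, this sketch leaves N + 1 tables on the warehouse side and performs an extra split step per sub-table before any data reaches production, which illustrates the maintenance and timing costs noted above.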