In data warehouse growth scenarios involving the addition of new database partitions to an existing database, it is necessary to redistribute the data in the database to achieve an even balance of data among all the database partitions (i.e., both existing partitions and newly added partitions). Such redistribution scenarios typically involve the movement of a significant amount of data and require downtime while the system is offline.
Relational tables are typically distributed over one or more of the database partitions, and each partition resides on one physical machine in a cluster of physical machines. The location of rows of data in a given database table partitioned in this way is determined by a distribution function that maps row data to a database partition number. In such a database system, it may occasionally be desirable, or even necessary, to modify this distribution function. One common reason for doing so is that the current database manager capacity is inconsistent with current or future business requirements, so physical machines need to be added to or removed from the database cluster. Another common reason is that the existing distribution of data across database partitions becomes non-uniform or inconsistent with the processing power of the physical machines on which the database partitions reside. Whenever the data distribution function is modified, it is necessary to redistribute existing table data among the database partitions according to the new distribution function. Performance and usability are critical aspects of any data redistribution operation, since prolonged downtime of the database is typically not acceptable.
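The role of the distribution function can be illustrated with a minimal sketch. It assumes, purely for illustration, a hash of the distribution key that indexes into a fixed-size distribution map whose entries hold partition numbers. The names `MAP_SIZE`, `build_map`, `partition_for`, and `rows_to_move`, the use of CRC-32 as the hash, and the round-robin map assignment are all hypothetical choices and are not taken from any particular database manager.

```python
import zlib

MAP_SIZE = 4096  # hypothetical number of entries in the distribution map


def build_map(partitions):
    """Assign map entries to partition numbers round-robin (an assumed policy)."""
    return [partitions[i % len(partitions)] for i in range(MAP_SIZE)]


def partition_for(key, dist_map):
    """Distribution function: hash the key, then look up the partition number."""
    bucket = zlib.crc32(str(key).encode("utf-8")) % len(dist_map)
    return dist_map[bucket]


def rows_to_move(keys, old_map, new_map):
    """Return {key: (old_partition, new_partition)} for rows whose location changes."""
    moves = {}
    for key in keys:
        src = partition_for(key, old_map)
        dst = partition_for(key, new_map)
        if src != dst:
            moves[key] = (src, dst)
    return moves


# Growing the cluster from three partitions to four changes the map,
# so some existing rows must be sent to a different partition.
old_map = build_map([0, 1, 2])
new_map = build_map([0, 1, 2, 3])
keys = range(10_000)
moves = rows_to_move(keys, old_map, new_map)
print(f"{len(moves)} of {len(keys)} rows change partition")
```

In this sketch the map is rebuilt round-robin, which relocates a large share of the rows. A map that reassigns only some of its entries to the new partition would move fewer rows, which is one reason the choice of the new distribution function affects the cost of redistribution.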
When database manager capacity does not meet present or projected future needs, a business can expand its capacity by adding more physical machines. Adding separate single-processor or multiple-processor physical machines can increase both data-storage space and processing power. The memory and storage system resources on each machine are typically not shared with the other machines. Although adding machines might result in communication and task-coordination issues, this choice provides the advantage of balancing data and user access across more than one system in a shared-nothing architecture. When new machines are added to a shared-nothing architecture, existing data needs to be redistributed. This operation is called data redistribution. Data redistribution is especially common among data warehouse customers, because the amount of data accumulates over time. In addition, as business mergers and acquisitions become more common, the need for more capacity also increases.
One problem with redistribution operations is that when an interrupted redistribution operation is continued, each receiving partition needs to physically “undo” the redistribution work that was partially completed at the time of the failure. This can be time-consuming and may require a substantial amount of system resources, which can impact performance.
Accordingly, what is needed is a method and system for improving undo operations after an interruption of a data redistribution process. The present invention addresses such a need.