This invention relates generally to data messaging, retrieval, and storage in a data processing system. More particularly, it relates to determining a set of data structures from which data may be distributed between nodes in a parallel database system.
Databases have become the subject of significant recent interest, not only because of the increasing volume of data being stored and retrieved by computerized databases but also by virtue of the data relationships which can be established during the storage and retrieval processes.
In the last decade, database system developers have turned their attention toward parallel processing platforms, because a parallel processing system's cost/performance ratio is often superior to that of conventional mainframes. Set-oriented database systems, like relational database systems, are particularly well-suited to parallel processing since the database can be spread across the multiple computers or "nodes" in the system, and requests against the database can thus be executed in parallel. A generic parallel database system is characterized by a cluster of powerful, inexpensive microprocessor-based computers, each of which includes one or more disk storage devices with high performance and capacity. The nodes are interconnected using a shared communication medium. The cluster uses standard "off the shelf" microprocessor and workstation hardware products to take advantage of the high performance, lower cost, and higher reliability found in commodity components. When the database size or workload grows near the capacity of the system, more nodes can be added to extend that capacity.
In such a system, the database is distributed across the nodes; each node stores a fraction of the database. Likewise, the workload is distributed across the nodes: requests are sent to the nodes that contain the desired data and are executed there. Consequently, data placement determines how well the workload is balanced across the nodes, and how well the system performs as a whole. In many cases, the best performance can be obtained by spreading the workload as evenly as possible across all of the nodes. However, even in an initially balanced system, the type and frequency of requests will change over time, and data will be added to and deleted from the database, causing the workload to shift. Eventually, the system will become imbalanced across the nodes. Thus, the data will occasionally have to be redistributed to rebalance the load. Also, as nodes are added to or removed from the system, the data will have to be redistributed across the new number of nodes.
In a Parallel Database (PDB) System, data records are partitioned into data structures hereinafter referred to as "buckets". All the data records belonging to a bucket should always be placed on a single node. When new nodes are added to the PDB system, "buckets" of data must be moved from the existing nodes to the new nodes. A logical link is established with a predefined number of communication buffers for sending data records from the node on which they currently reside to the new node. As most relational database systems do not support a physical bucket in their storage organization, a table scan is required to select the to-be-moved records into communication buffers for redistribution. Because the table scan operation requires a table lock, it is logical to lock the same table on every PDB node to obtain exclusive access to that table, for data integrity and data placement consistency. Thus, every node will process the tables in the same sequence for data redistribution. However, the table locking makes performance one of the primary concerns for the operation of adding a new node. The faster the locks can be released, the smaller the impact on the other ongoing transactions in the PDB system.
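The scan-and-buffer step described above can be illustrated with a minimal sketch. It is not taken from the invention; the hash scheme (CRC-32 modulo a bucket count), the names `bucket_of` and `scan_into_buffers`, the bucket count, and the representation of a table as a sequence of (key, payload) pairs are all assumptions made for illustration. Locking and the actual network transfer are omitted.

```python
import zlib
from typing import Dict, Iterable, Iterator, List, Tuple

NUM_BUCKETS = 8  # hypothetical bucket count, chosen only for this sketch

Record = Tuple[str, int]  # (key, payload); an assumed record layout


def bucket_of(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a record key to a bucket using a stable hash (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def scan_into_buffers(
    table: Iterable[Record],
    moving: Dict[int, int],
    buffer_size: int,
) -> Iterator[Tuple[int, List[Record]]]:
    """Scan the table once and yield (target_node, buffer) pairs.

    `moving` maps each to-be-moved bucket to its destination node.
    Records whose bucket is not in `moving` stay where they are.
    """
    pending: Dict[int, List[Record]] = {}
    for key, payload in table:
        target = moving.get(bucket_of(key))
        if target is None:
            continue  # this record's bucket is not being moved
        buf = pending.setdefault(target, [])
        buf.append((key, payload))
        if len(buf) == buffer_size:
            yield target, buf      # buffer is full: hand it to the link
            pending[target] = []   # start a fresh buffer for this target
    # Flush the partially filled buffers left over at the end of the scan.
    for target, buf in pending.items():
        if buf:
            yield target, buf
```

Because no physical bucket exists in storage, the sketch has to visit every record to decide whether it moves, which is why the scan (and the lock held during it) dominates the cost of adding a node.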
The redistribution of data within a parallel database is traditionally done in a quiescent mode or a dynamic mode. In a quiescent mode, all functions other than data redistribution are halted until the data redistribution is complete. In an on-line or dynamic mode, data redistribution takes place concurrently with other PDB tasks.
In addition to these two modes, a commonly assigned, copending application, Ser. No. 08/116,089, entitled "Selecting Buckets for Redistributing Data Between Nodes in a Parallel Database in the Incremental Mode", S. G. Li, pending, which is hereby incorporated by reference, introduces a new incremental mode. In the incremental mode, a set of quiescent data redistribution time slices alternates with time dedicated to other PDB tasks.
A quiescent mode operation that adds or removes nodes in the PDB system blocks all other operations using the PDB while it runs. When adding new nodes, it is necessary to redistribute data for both load balancing and data availability. Instead of reloading the entire database, another approach to redistributing data is to move a portion of the data from the existing nodes to the new nodes. In the PDB, the unit of data movement is the bucket. As load balancing is the primary goal of data redistribution, it should be considered when choosing the appropriate buckets to move. Because there are so many buckets in a node, traditional mathematical programming methods for choosing the buckets cannot be relied upon to find a feasible solution efficiently.
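To make the selection problem concrete, the following sketch shows one simple greedy heuristic for choosing buckets with load balance in mind. It is offered only as an illustration of the problem and is not the technique of this invention. The function name `select_buckets`, the per-bucket workload weights, and the use of the average load per node as the balancing target are assumptions.

```python
from typing import Dict, List, Tuple

Move = Tuple[int, str, str]  # (bucket id, source node, destination node)


def select_buckets(
    node_buckets: Dict[str, Dict[int, float]],
    new_nodes: List[str],
) -> List[Move]:
    """Greedily pick buckets to move from existing nodes to new nodes.

    `node_buckets` maps each existing node to {bucket id: workload weight}.
    A bucket is moved only if its source node stays at or above the
    target load afterwards; this is an assumed rule, not the invention's.
    """
    if not new_nodes:
        raise ValueError("at least one new node is required")

    total = sum(sum(b.values()) for b in node_buckets.values())
    # Target: the average load per node once the new nodes have joined.
    target = total / (len(node_buckets) + len(new_nodes))

    new_load = {n: 0.0 for n in new_nodes}
    moves: List[Move] = []
    for node, buckets in node_buckets.items():
        load = sum(buckets.values())
        # Consider the heaviest buckets first.
        ranked = sorted(buckets.items(), key=lambda kv: kv[1], reverse=True)
        for bucket, weight in ranked:
            if load - weight < target:
                continue  # would leave this node under-loaded; try a lighter one
            dest = min(new_load, key=new_load.get)  # least-loaded new node
            moves.append((bucket, node, dest))
            load -= weight
            new_load[dest] += weight
    return moves
```

A single pass of this kind runs in time proportional to sorting each node's buckets, but it offers no guarantee of an optimal balance; it only shows why bucket weights and a per-node target enter into the choice.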
This invention describes a technique that can efficiently select the buckets to move to the new nodes with load balancing considered.