Modern computer systems often involve multiple individual processors, or nodes, that are interconnected via a communication network. Large amounts of information are often stored and processed in such systems. In addition to processing equipment, each node typically has digital storage devices (e.g., magnetic disks) for storing the information. The information is often arranged as a database that occupies the available storage space at the various nodes in the system.
The techniques employed for arranging the storage of, and access to, a database in a computer system with multiple nodes depend on the requirements of the specific system. However, certain requirements are common to most systems. All data in the database should be available for access from any node in the system. The storage overhead and processing overhead must be kept to a minimum to allow the system to operate efficiently, and the storage/access strategy must generally be immune to problems (e.g., overload) occurring at any one node.
Two general techniques of database storage, or partitioning, are employed in modern systems. The first, data sharing, generally involves either providing physical access to all disks from each node in the system or storing a copy of the complete database, with internal partitions, at each node in the system. Each node then has access to all partitions in the database, and is immune from problems at any one node. However, to maintain coherency of the database, global locking or change lists are necessary to ensure that no two nodes make inconsistent changes to the same portion of the database. An example of a data sharing architecture is described in U.S. Pat. No. 4,853,843 entitled "SYSTEM FOR MERGING VIRTUAL PARTITIONS OF A DISTRIBUTED DATABASE," issued on Aug. 1, 1989 and assigned to Tektronix, Inc. Described therein are nodes which contain separate instances of an initial database (i.e., virtual partitions) along with change lists which tabulate changes made to each partition. The change lists are then used to merge the virtual partitions. The processing overhead of the locks and/or lists, as well as the redundant storage of data, are undesirable characteristics of this technique.
The second technique of data storage involves physically partitioning the data and distributing the resultant partitions to owner nodes in the system, each of which becomes responsible for transactions involving its own corresponding partition.
This shared nothing architecture requires additional communication overhead to offer access to all of the data to all nodes. A requesting node must issue transaction requests to the owner node. The owner node then either: 1) performs the requested transaction related to its corresponding partition (i.e., function shipping) or 2) transfers the data itself to the requesting node (i.e., I/O shipping). An example of the shared nothing technique is described in U.S. Pat. No. 4,925,311 entitled "DYNAMICALLY PARTITIONABLE PARALLEL PROCESSORS," issued on May 15, 1990 and assigned to Teradata Corporation, in which each access module processor is assigned a substantial body of data stored in large individual memories and is constantly queuing incoming and outgoing messages from the network.
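The difference between function shipping and I/O shipping can be illustrated with a minimal sketch. The classes `OwnerNode` and `RequestingNode`, their method names, and the `transactions_run` counter are hypothetical names introduced only for illustration; they are not taken from the cited patents, and network messages are modeled here as plain method calls.

```python
from dataclasses import dataclass, field


@dataclass
class OwnerNode:
    """Hypothetical owner node holding one physical partition (key -> value)."""
    name: str
    partition: dict = field(default_factory=dict)
    transactions_run: int = 0  # transactions executed on the owner's processor

    def function_ship(self, key, update):
        """Function shipping: the owner performs the transaction itself."""
        self.transactions_run += 1
        self.partition[key] = update(self.partition[key])
        return self.partition[key]

    def io_ship_read(self, key):
        """I/O shipping: the owner only transfers the data to the requester."""
        return self.partition[key]

    def io_ship_write(self, key, value):
        """I/O shipping: the owner stores a result computed by the requester."""
        self.partition[key] = value


@dataclass
class RequestingNode:
    """Hypothetical node that needs data owned by another node."""
    name: str
    transactions_run: int = 0  # transactions executed on the requester's processor

    def run_remotely(self, owner, key, update):
        # The request travels to the owner; the owner does the processing.
        return owner.function_ship(key, update)

    def run_locally(self, owner, key, update):
        # The data travels to the requester; the requester does the processing.
        value = owner.io_ship_read(key)
        self.transactions_run += 1
        new_value = update(value)
        owner.io_ship_write(key, new_value)
        return new_value


owner = OwnerNode("owner", {"balance": 100})
requester = RequestingNode("requester")

after_function_shipping = requester.run_remotely(owner, "balance", lambda v: v + 10)
after_io_shipping = requester.run_locally(owner, "balance", lambda v: v - 30)
```

In this sketch the first transaction is counted against the owner node and the second against the requesting node, which is the distinction that matters when the owner's processing load is the concern. A real system would also need concurrency control and commit processing, which are omitted here.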
A problem with the shared nothing approach is the potential for processing overload at any one node and the resultant inability of that node to accept or process transactions relating to its partition. This condition can occur if, for example, the partition experiences unusually high transaction activity due to an unexpected event. Therefore, a need exists to balance the processing load among nodes in a shared nothing system should any one node become overloaded.
Load balancing may be done during the initial physical partitioning of the database. For example, predictions of the demand for certain physical partitions may be made, followed by an appropriate distribution of partitions to the nodes. Other load balancing techniques involve partitioning processing assets (e.g., processors and disks) among competing user programs, or optimizing the transaction sequence. An example of load balancing is described in U.S. Pat. No. 4,843,541 entitled "LOGICAL RESOURCE PARTITIONING OF DATA PROCESSING SYSTEM," issued on Jun. 27, 1989 and assigned to International Business Machines Corporation, in which partitioning of resources into a plurality of logical partitions, defined by an administrator, is performed. The assets in a logical partition are then assigned to guest programs.
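As one possible illustration of distributing partitions according to predicted demand, the sketch below uses a simple greedy heuristic: partitions are taken in order of decreasing predicted demand and each is given to the node with the lowest accumulated load. The heuristic, the function name `distribute_partitions`, and the demand figures are assumptions chosen for illustration; the text above does not specify how the distribution is computed.

```python
import heapq


def distribute_partitions(predicted_demand, node_names):
    """Greedily assign partitions to nodes by predicted demand (illustrative only).

    predicted_demand maps a partition name to its predicted transaction demand.
    Returns (assignment, loads): partitions per node and total demand per node.
    """
    # Min-heap of (accumulated load, node name); the least loaded node is on top.
    heap = [(0, name) for name in node_names]
    heapq.heapify(heap)
    assignment = {name: [] for name in node_names}

    # Place the most demanding partitions first; ties are broken by name.
    ordered = sorted(predicted_demand.items(), key=lambda item: (-item[1], item[0]))
    for partition, demand in ordered:
        load, name = heapq.heappop(heap)
        assignment[name].append(partition)
        heapq.heappush(heap, (load + demand, name))

    loads = {
        name: sum(predicted_demand[p] for p in parts)
        for name, parts in assignment.items()
    }
    return assignment, loads


# Hypothetical demand predictions, in arbitrary units.
predicted_demand = {"P1": 50, "P2": 30, "P3": 20, "P4": 20, "P5": 10}
assignment, loads = distribute_partitions(predicted_demand, ["node_a", "node_b"])
```

A static assignment of this kind is only as good as the predictions behind it; it does nothing for a node whose partition later experiences unexpectedly high activity.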
Examples of optimizing transactions (or queries) are found in U.S. Pat. No. 4,769,772 entitled "AUTOMATED QUERY OPTIMIZATION METHOD USING BOTH GLOBAL AND PARALLEL LOCAL OPTIMIZATIONS FOR MATERIALIZATION ACCESS PLANNING FOR DISTRIBUTED DATABASES" and U.S. Pat. No. 4,925,311 (discussed above). Both techniques, however, rely on an active owner node having processing resources available for transactions related to its corresponding partition.
Absent from all of the above-mentioned patents is an automated method and apparatus for load balancing in a shared nothing system that, when the owner node becomes overloaded, nevertheless provides timely completion of transactions related to the corresponding physical partition. The method should effectively reduce the transaction costs on the overloaded node, thereby minimizing the node's processing load. In the shared nothing approach, these costs generally include the pathlength of the transaction (function) itself, the I/O shipping or function shipping communication overhead, and additional factors including the two-phase commit processing necessary for each database access, and concurrency control (in the case of I/O shipping). The approach should also minimize the overall processing load of the system.