FIG. 1 illustrates a shared-nothing network 100 known in the art. The shared-nothing network or architecture 100 includes a master node 102 and a set of shared-nothing nodes 104_A through 104_N. Each shared-nothing node 104 has its own private memory, disks and input/output devices that operate independently of any other node in the architecture 100. Each node is self-sufficient, sharing nothing across the network. Therefore, there are no points of contention across the system and no sharing of system resources. The advantage of this architecture is that it is highly scalable.
Database systems store data in tables distributed across shared-nothing nodes. Data is stored by assigning each datum (e.g., record or row) to one of the nodes. Data is typically assigned to nodes according to one of two principles. One approach is hash distribution, which uses a hash function to map data to nodes. Another approach is to assign data to nodes in a round-robin or random manner.
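The two assignment principles described above can be contrasted with a minimal sketch. The function names, the choice of SHA-256, and the reduction of the hash modulo the node count are assumptions made for illustration only; the text does not specify a particular hash function or mapping.

```python
import hashlib


def hash_node(key: str, num_nodes: int) -> int:
    """Map a key to a node index with a stable hash (assumed: SHA-256 mod N)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def round_robin_nodes(num_rows: int, num_nodes: int) -> list[int]:
    """Assign rows to nodes in arrival order, ignoring row content."""
    return [i % num_nodes for i in range(num_rows)]


# Hypothetical rows, identified here only by the value used for assignment.
rows = ["alice", "bob", "alice", "carol", "bob", "alice"]

hashed = [hash_node(r, 4) for r in rows]          # equal values share a node
round_robin = round_robin_nodes(len(rows), 4)     # equal values may be split up
```

In this sketch, every occurrence of "alice" receives the same node index under hash assignment, whereas round-robin assignment places the three occurrences on nodes 0, 2 and 1.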
The part of the datum for which the hash is computed is referred to as a distribution key. The distribution key can be a compound key, i.e., consisting of several columns of a row. Hash distribution results in uniform data distribution and the co-location of records with the same distribution key (i.e., records with the same distribution key are assigned to the same node). Co-location of data is frequently exploited in join operations where data from different database tables are joined. Join operations are usually the most costly operations in a query workload. By selecting frequently used join columns as distribution keys, joins can be performed on a per-node basis without having to redistribute the data among nodes between processing steps. All rows of a table are distributed using the same distribution key. Individual tables generally differ in choice of distribution key.
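The co-location property can be illustrated with a short sketch in which two tables are distributed on the same compound distribution key and then joined node by node. The table contents, the column names, the helper functions and the SHA-256-modulo-N mapping are all hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 3  # assumed cluster size for the example


def node_for(key: tuple, num_nodes: int = NUM_NODES) -> int:
    """Hash a (possibly compound) distribution key to a node index."""
    digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def distribute(rows, key_columns):
    """Group rows by the node selected from their distribution key."""
    nodes = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        nodes[node_for(key)].append(row)
    return nodes


customers = [
    {"customer_id": 1, "region": "EU", "name": "Ann"},
    {"customer_id": 2, "region": "US", "name": "Bob"},
    {"customer_id": 3, "region": "EU", "name": "Cy"},
]
orders = [
    {"customer_id": 1, "region": "EU", "total": 30},
    {"customer_id": 1, "region": "EU", "total": 12},
    {"customer_id": 2, "region": "US", "total": 7},
    {"customer_id": 3, "region": "EU", "total": 5},
]

# Both tables use the join columns as a compound distribution key.
key_cols = ("customer_id", "region")
customers_by_node = distribute(customers, key_cols)
orders_by_node = distribute(orders, key_cols)


def local_join(node: int):
    """Join the two tables using only the rows stored on one node."""
    return [
        {**c, **o}
        for c in customers_by_node.get(node, [])
        for o in orders_by_node.get(node, [])
        if all(c[k] == o[k] for k in key_cols)
    ]


# The union of the per-node joins equals the join over the full tables.
joined = [row for node in range(NUM_NODES) for row in local_join(node)]
```

Because matching rows of both tables hash to the same node, no row has to be moved between nodes to compute the join in this sketch. If the tables were distributed on different keys, that guarantee would not hold.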
Nodes are added to a shared-nothing system to accommodate more data or additional query workloads. When new nodes are added to a system, data needs to be redistributed. Data redistribution commonly entails the examination and positional reassignment of each individual datum. Reassessing each row of a large data store can take a significant amount of time, e.g., reassigning tens or hundreds of terabytes of data may take several days. Consequently, it is common practice to schedule downtime of several days when a node is added to a shared-nothing system. It would be desirable to minimize the downtime associated with the expansion of a shared-nothing data store.
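The cost of redistribution can be made concrete with a small sketch. It assumes, purely for illustration, that a row's node is a stable hash of its distribution key taken modulo the number of nodes; the text does not state which mapping a given system uses. Under that assumption, growing a system from four to five nodes requires every row to be re-examined, and roughly four out of five rows are expected to change nodes.

```python
import hashlib


def node_for(key: int, num_nodes: int) -> int:
    """Assumed mapping: stable hash of the key, reduced modulo the node count."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def rows_to_move(keys, old_nodes: int, new_nodes: int) -> list[int]:
    """Examine every key and return those whose node assignment changes."""
    return [
        k for k in keys
        if node_for(k, old_nodes) != node_for(k, new_nodes)
    ]


keys = range(10_000)                 # hypothetical distribution key values
moved = rows_to_move(keys, 4, 5)     # expand from 4 nodes to 5
fraction_moved = len(moved) / len(keys)
```

A row keeps its node in this sketch only when its hash has the same remainder modulo 4 and modulo 5, which holds for 4 of every 20 consecutive hash values. This is why a naive expansion touches most of the stored data.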