In distributed computing, there are two architectures for accessing resources from a plurality of nodes: shared-everything and shared-nothing. Each of these architectures has its advantages and disadvantages. In the shared-everything architecture, resources are shared among all nodes in a system and are therefore accessible by every node. To achieve this resource sharing, exclusive control is exercised to prevent conflicts in accessing resources, such as databases, which reduces processing efficiency. The load of the exclusive control becomes more prominent as the scale of the system and the number of nodes increase.
In the shared-nothing architecture, on the other hand, nodes do not share any resources, and each node has sole access to its own resources. Since database resources are not shared among the nodes, no conflicts occur among the nodes. Therefore, linear performance improvement is expected by adding nodes.
In the shared-nothing architecture, a database is divided into a plurality of partitions, for example. In addition, the data records included in the data tables forming the database are grouped according to type, and each group is allocated to one of the partitions. The data of each group is stored in the corresponding partition. Each partition is accessible by only one of the nodes. This prevents conflicts in data access.
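The allocation described above may be sketched as follows. This is a minimal illustration only, not a method taken from any particular system: the record types, the partition names (P0 to P2), the node names, and the helper functions route and store are all hypothetical. A node may own more than one partition, but each partition has exactly one owning node.

```python
from collections import defaultdict

# Hypothetical layout (assumption): record type -> partition,
# and partition -> the single node allowed to access it.
GROUP_TO_PARTITION = {"customer": "P0", "order": "P1", "inventory": "P2"}
PARTITION_OWNER = {"P0": "node-a", "P1": "node-b", "P2": "node-a"}


def route(record):
    """Return (partition, owning node) for a record, based on its type."""
    partition = GROUP_TO_PARTITION[record["type"]]
    return partition, PARTITION_OWNER[partition]


def store(records):
    """Place each record in the partition assigned to its group."""
    partitions = defaultdict(list)
    for record in records:
        partition, _owner = route(record)
        partitions[partition].append(record)
    return dict(partitions)


# Illustrative records (assumption), grouped by their "type" field.
records = [
    {"type": "customer", "id": 1},
    {"type": "order", "id": 10},
    {"type": "order", "id": 11},
    {"type": "inventory", "id": 100},
]
stored = store(records)
```

Because every access to a record is routed to the one node that owns its partition, no two nodes contend for the same data.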
While a system keeps operating, the amount of managed data may increase and the system may run short of resources. One way to address this is scale-out, in which new nodes are added. Scale-out involves dividing the data tables in accordance with the division of the database into partitions and reassigning the partitions to the nodes.
One technique for dividing data uses a hash function, for example. This technique uses one or a plurality of data elements in a database as partitioning keys, applies the hash function to the partitioning keys, and divides the data into a plurality of buckets.
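Hash-based division of this kind may be sketched as follows. The sketch is illustrative and rests on several assumptions: the field names (customer_id, region, amount), the choice of SHA-256 as the hash function, and the helper names bucket_for and partition_records are hypothetical and are not taken from the cited publication.

```python
import hashlib


def bucket_for(key_fields, num_buckets):
    """Map a tuple of partitioning-key values to a bucket index."""
    # Join the key values with a separator unlikely to occur in the data,
    # so that a key made of several data elements hashes as one unit.
    encoded = "\x1f".join(str(f) for f in key_fields).encode("utf-8")
    digest = hashlib.sha256(encoded).digest()
    # Reduce the first 8 bytes of the digest to a bucket index.
    return int.from_bytes(digest[:8], "big") % num_buckets


def partition_records(records, key_names, num_buckets):
    """Divide records into buckets by hashing the named key fields."""
    buckets = [[] for _ in range(num_buckets)]
    for record in records:
        key = tuple(record[name] for name in key_names)
        buckets[bucket_for(key, num_buckets)].append(record)
    return buckets


# Illustrative records (assumption); customer_id is the partitioning key.
records = [
    {"customer_id": 101, "region": "east", "amount": 40},
    {"customer_id": 102, "region": "west", "amount": 15},
    {"customer_id": 101, "region": "east", "amount": 75},
    {"customer_id": 103, "region": "east", "amount": 20},
]
buckets = partition_records(records, ["customer_id"], num_buckets=4)
```

Records with the same partitioning key always fall into the same bucket, and the bucket is determined by the key alone, independently of how often the record is accessed.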
Please see, for example, Japanese Laid-open Patent Publication No. 2001-142752.
However, the shared-nothing architecture has a drawback in that it is difficult to determine which data to store in each of a plurality of partitions such that the difference in access load among nodes is small. For example, allocating data to partitions by a fixed rule, using a hash function or the like, does not take data access patterns into account. Accesses may therefore be concentrated on a partition holding frequently accessed data, causing an access load imbalance among nodes. If this happens, nodes with higher loads become slow in processing, which reduces the operating efficiency of the system.
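The imbalance described above may be illustrated with a small sketch. All values here are assumptions chosen for illustration: the modulo rule stands in for a hash function, and the access log, in which the record with key 5 is accessed far more often than the others, is invented.

```python
from collections import Counter

NUM_PARTITIONS = 4


def partition_of(key):
    """Rule-based allocation; a simple modulo stands in for a hash function."""
    return key % NUM_PARTITIONS


# Sixteen records, spread evenly over the partitions by the rule.
keys = list(range(16))
record_count = Counter(partition_of(k) for k in keys)

# Hypothetical access log (assumption): key 5 is accessed 90 times,
# and every key, including key 5, is accessed once more.
access_log = [5] * 90 + keys
access_count = Counter(partition_of(k) for k in access_log)

for p in range(NUM_PARTITIONS):
    print(f"partition {p}: {record_count[p]} records, {access_count[p]} accesses")
```

Each partition holds the same number of records, yet the partition holding key 5 receives nearly all of the accesses, so the node that owns it carries a much higher load than the others.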
Such an access load imbalance among nodes could be reduced if a skilled engineer spent considerable time interviewing the customer about its business operations and, on that basis, dividing a data table into partitions and assigning the partitions to the nodes. However, it is difficult even for such a skilled engineer to always make an appropriate decision, and the actual data access status may differ greatly from what was expected. If this happens, an access load imbalance may occur among the nodes.
Further, patterns of access to resources in business operations are likely to vary over time. Even if the access load is evenly balanced when a system starts to operate, an access load imbalance may occur among the nodes while the system keeps operating.