1. Field of the Invention
The present invention relates to a coordination server that is connected to database servers, each having a database that stores data, so as to form a distributed database, and that allocates data to the databases. The invention also relates to a data allocating method and a computer program product therefor.
2. Description of the Related Art
Distributed databases made up of multiple databases have been developed to deal with enormous amounts of data. In such a distributed database, the data needs to be divided and allocated among the different databases. Key range partitioning and hash partitioning are well-known examples of data allocating methods (see JP-A H6-139119 and H6-314299 (KOKAI), for example). In both key range partitioning and hash partitioning, a single column value or a combination of multiple column values of a table may be adopted as the partitioning key.
In key range partitioning, the key values to be used for the partitioning are predetermined, and each item of data is stored in the database to which its key value is assigned. Because the data is divided among separate storage locations, a data search can be executed in parallel and its throughput can be increased. For example, when dealing with a large amount of sales data, the data is stored in different databases (disks) by using the “month” of the date of the data as a key, so that the throughput of the parallel processing can be improved.
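The sales-data example above can be sketched as follows. This is only a minimal illustration under stated assumptions: the three database names, the month ranges assigned to them, and the function names `database_for` and `allocate` are hypothetical and do not come from the cited references.

```python
from collections import defaultdict
from datetime import date

# Hypothetical predetermined key ranges: (first month, last month, database).
KEY_RANGES = [(1, 4, "db0"), (5, 8, "db1"), (9, 12, "db2")]


def database_for(sale_date):
    """Return the database whose month range contains the month of sale_date."""
    for low, high, name in KEY_RANGES:
        if low <= sale_date.month <= high:
            return name
    raise ValueError(f"no key range covers month {sale_date.month}")


def allocate(rows):
    """Group rows by the database selected from the month of each row's date."""
    placement = defaultdict(list)
    for row in rows:
        placement[database_for(row["date"])].append(row)
    return dict(placement)


if __name__ == "__main__":
    sales = [
        {"date": date(2024, 1, 15), "amount": 120},
        {"date": date(2024, 6, 1), "amount": 80},
        {"date": date(2024, 12, 31), "amount": 45},
    ]
    for name, rows in sorted(allocate(sales).items()):
        print(name, [r["amount"] for r in rows])
```

Because each database holds a disjoint range of months, a search over the whole year could be issued to all three databases at once, which is the source of the parallel throughput described above.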
When the data is partitioned in this manner, the load may become concentrated on a certain database, but inefficient operation can be avoided because a data search with range criteria defined on the target column does not access any irrelevant database. Furthermore, a search that includes a natural join on the target column requires no joining across different databases, and thus the performance can be greatly improved.
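The way a range search avoids irrelevant databases can be illustrated with a short sketch. The key ranges, database names, and the function name `databases_for_range` are assumptions made for illustration only.

```python
# Hypothetical predetermined key ranges: (first month, last month, database).
KEY_RANGES = [(1, 4, "db0"), (5, 8, "db1"), (9, 12, "db2")]


def databases_for_range(first_month, last_month):
    """Return only the databases whose key range overlaps the searched range."""
    return [
        name
        for low, high, name in KEY_RANGES
        if low <= last_month and first_month <= high
    ]


if __name__ == "__main__":
    # A search limited to February and March touches a single database.
    print(databases_for_range(2, 3))
    # A search from April to June spans two key ranges.
    print(databases_for_range(4, 6))
```

A narrow search range therefore reaches few databases, which avoids wasted accesses but also means that a popular range directs its entire load to the same database.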
In a distributed database, however, if the data partitioning is out of balance, a heavy load may be concentrated on a specific database during a search, which lowers the effectiveness of the partitioning. Because the tendency of the data entered into the databases changes over time, the data sizes of the databases often become unbalanced, and it is difficult to avoid such an imbalance with predetermined data partitioning rules. For this reason, improved partitioning methods have been suggested, such as a method in which the hashes are changed so that the key ranges change dynamically. With those methods, however, the data has to be relocated whenever the key ranges or hashes are changed, which in turn increases the processing load.
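The relocation cost mentioned above can be made concrete with a small sketch. It assumes a simple modulo hash over integer keys, which is a stand-in chosen for illustration and not the method of any cited reference; the function names `hash_partition` and `keys_to_relocate` are likewise hypothetical.

```python
def hash_partition(key, num_databases):
    """Assumed hash: map an integer key to a database index by modulo."""
    return key % num_databases


def keys_to_relocate(keys, old_count, new_count):
    """Return the keys whose database changes when the hash is changed."""
    return [
        k
        for k in keys
        if hash_partition(k, old_count) != hash_partition(k, new_count)
    ]


if __name__ == "__main__":
    keys = list(range(12))
    # Change the hash from 3 databases to 4 in order to rebalance.
    moved = keys_to_relocate(keys, 3, 4)
    print(f"{len(moved)} of {len(keys)} rows must be relocated")
```

Under this assumed hash, 9 of the 12 keys map to a different database after the change, so most of the stored data would have to be moved before the new partitioning could take effect.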