The present invention relates to a shared-nothing database management system, which includes database servers that do not share the data to be processed, and more particularly to a database configuration management program that automatically changes the system configuration in accordance with the load on the system.
<Description of Background, and Definition of Terms>
A shared-nothing database management system is one of the architectures used to build a large-scale database system made up of a plurality of database servers.
In the shared-nothing database management system, a processor, a memory and a storage, which are the main components of a computer, are assigned to each database server, so that the database servers share no system components other than the network. The data that makes up the database is likewise distributed among the database servers and stored in the storage assigned to each. Thus, each database server is responsible for a mutually exclusive (disjoint) subset of the database and processes only the data in its own subset.
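The partitioning described above can be sketched as follows. The source does not say how rows are assigned to servers, so hash partitioning on a key, the `crc32` hash and the helper names `server_for` and `partition` are all illustrative assumptions; the sketch only shows that every row lands on exactly one server and that the subsets together cover the database.

```python
import zlib
from collections import defaultdict


def server_for(key: str, num_servers: int) -> int:
    """Map a partitioning key to exactly one server (hypothetical hash scheme)."""
    # crc32 is deterministic across runs, unlike Python's salted built-in hash()
    return zlib.crc32(key.encode("utf-8")) % num_servers


def partition(keys, num_servers):
    """Split keys into one subset per server; no key is stored on two servers."""
    subsets = defaultdict(set)
    for key in keys:
        subsets[server_for(key, num_servers)].add(key)
    return dict(subsets)


keys = [f"customer-{i}" for i in range(1000)]
subsets = partition(keys, 4)
# The subsets are pairwise disjoint and together cover the whole database.
assert sum(len(s) for s in subsets.values()) == len(keys)
assert set().union(*subsets.values()) == set(keys)
```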
Because of these features, the shared-nothing database management system has the advantages of eliminating the need for exclusive control of shared resources and of scaling well as the number of database servers increases.
However, if the amounts of data handled by the respective database servers become unbalanced, for example because of a change in the system configuration, a database server holding more data takes longer to execute its share of a query, so that queries as a whole are not processed efficiently. For this reason, the shared-nothing database management system has the disadvantage that, whenever the number of database servers is changed, the allocation of data to the servers must also be changed in order to keep the amounts of data handled by the respective database servers balanced.
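A small worked example of why imbalance hurts: if every server scans its own subset in parallel, a query finishes only when the most heavily loaded server finishes. The identical per-server scan rate (`rows_per_second`) and the row counts below are hypothetical figures chosen for illustration.

```python
def query_time(rows_per_server, rows_per_second=1_000_000):
    """Elapsed time of a parallel scan, in seconds.

    Every server scans its own subset concurrently, so the query ends when the
    most-loaded server ends. rows_per_second is a hypothetical, identical
    per-server scan rate."""
    return max(rows / rows_per_second for rows in rows_per_server)


balanced = [2_500_000] * 4                              # 10M rows spread evenly
skewed = [4_000_000, 2_000_000, 2_000_000, 2_000_000]   # same 10M rows, one heavy server
print(query_time(balanced))  # 2.5
print(query_time(skewed))    # 4.0
```

The total amount of data is the same in both cases; only its distribution differs, yet the skewed layout takes 60% longer.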
[Resources of Database Server]
To add a new database server to a shared-nothing database management system, a new server machine is generally added and a database server is run on it. The new server machine increases the resources available for running database servers and thereby improves performance. In the present invention, the resource associated with improved processing performance is referred to as a “CPU resource,” and the resource associated with improved storage input/output performance is referred to as a “storage I/O resource.”
[Data Relocation]
As described above, data must be rebalanced among the database servers whenever a database server is added to or removed from the shared-nothing database management system. In the present invention, this operation is hereinafter referred to as “relocation of data.”
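As an illustration of relocation, the sketch below computes a transfer plan that brings per-server row counts into balance after a server is added. It works on row counts only, and the greedy pairing of over-full with under-full servers and the name `relocation_plan` are assumptions made for this example, not a method stated in the source.

```python
def relocation_plan(rows_per_server):
    """Return (source, target, rows) transfers that balance row counts.

    Hypothetical helper: each over-full server hands its excess to
    under-full servers, so no row is moved more than once."""
    n = len(rows_per_server)
    base, extra = divmod(sum(rows_per_server), n)
    # The first `extra` servers keep one more row so that targets sum to the total.
    targets = [base + (1 if i < extra else 0) for i in range(n)]
    pairs = list(enumerate(zip(rows_per_server, targets)))
    surplus = [(i, c - t) for i, (c, t) in pairs if c > t]
    deficit = [(i, t - c) for i, (c, t) in pairs if c < t]
    plan = []
    si = di = 0
    while si < len(surplus) and di < len(deficit):
        src, have = surplus[si]
        dst, need = deficit[di]
        moved = min(have, need)
        plan.append((src, dst, moved))
        surplus[si] = (src, have - moved)
        deficit[di] = (dst, need - moved)
        if have == moved:
            si += 1
        if need == moved:
            di += 1
    return plan


# Adding a fifth, empty server to four servers holding 1,000 rows each:
plan = relocation_plan([1000, 1000, 1000, 1000, 0])
print(plan)  # [(0, 4, 200), (1, 4, 200), (2, 4, 200), (3, 4, 200)]
```

Even this minimal plan moves 800 of the 4,000 rows, which indicates why relocation is costly when servers are added or removed frequently.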
[Virtualization of Storage]
Storage virtualization is one means of improving the usability of storages on a network. When a single server uses a plurality of storages residing on a network, presenting them to the server as if they were a single storage allows their operational management to be unified and its cost reduced.
In the present invention, the location at which storage virtualization is implemented is referred to as the “(storage) virtualization layer.” The storage virtualization layer generally resides in a storage, in storage management middleware, or in a file system.
Because the operating system manages storages in units of volumes, a virtualized storage is referred to in the present invention as a “logical volume,” and each storage that forms part of a logical volume is referred to as a “unit volume.”
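The relationship between a logical volume and its unit volumes can be sketched as an address translation. The source does not specify how the virtualization layer lays out data, so simple concatenation of unit volumes, block-granular addressing and the class name `LogicalVolume` are assumptions for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


class LogicalVolume:
    """Hypothetical linear (concatenated) logical volume built from unit volumes.

    Each unit volume contributes a contiguous range of logical blocks, so the
    server sees one address space while the data lives on several storages."""

    def __init__(self, unit_sizes):
        self.unit_sizes = list(unit_sizes)
        # starts[i] is the first logical block stored on unit volume i
        self.starts = [0] + list(accumulate(self.unit_sizes))[:-1]
        self.size = sum(self.unit_sizes)

    def locate(self, logical_block):
        """Translate a logical block number to (unit volume index, block offset)."""
        if not 0 <= logical_block < self.size:
            raise IndexError("logical block out of range")
        unit = bisect_right(self.starts, logical_block) - 1
        return unit, logical_block - self.starts[unit]


lv = LogicalVolume([100, 50, 200])   # three unit volumes, sizes in blocks
print(lv.locate(0))     # (0, 0)
print(lv.locate(120))   # (1, 20)
print(lv.locate(349))   # (2, 199)
```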
<Description of Conventional Approach>
A conventional approach uses the ALTER NODEGROUP statement described in the online manual “Manager's Handbook,” Chapter 30, bundled with DB2 Version 7.2. As illustrated in FIG. 2A, to add a new database server with this approach, a data region is added to the storage allocated to the new database server (21), the new database server is added to the shared-nothing database management system comprising a plurality of database servers (22), and data is then relocated among the database servers to balance the amounts of data (23). Conversely, to remove a database server, data is first relocated to empty the data region allocated to the database server to be removed (24), and the database server is then removed (25), as illustrated in FIG. 2B.
JP-A-11-282734 describes a parallel database management system that may be regarded as another conventional approach. This method changes the correspondence between a database and the database processing units without interrupting online service, thereby permitting a particular database processing unit to directly access a database managed by a plurality of database processing units. This method is also characterized in that the database is divided into a fixed number of subsets.
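The ordering of steps (21) to (25) can be modeled with a toy in-memory cluster. Everything here is hypothetical: the class `ToyCluster`, its methods, and the rebalancing rule (move one row at a time from the fullest to the emptiest server) are illustrative stand-ins and not the DB2 ALTER NODEGROUP interface. The point of the sketch is that data is moved in both directions as part of the configuration change itself.

```python
class ToyCluster:
    """Toy model of the conventional add/remove sequence (hypothetical names)."""

    def __init__(self, servers, rows):
        self.members = list(servers)
        self.regions = {name: [] for name in self.members}  # data region per server
        self.moved = 0                                      # rows relocated so far
        for i, row in enumerate(rows):                      # initial round-robin load
            self.regions[self.members[i % len(self.members)]].append(row)

    def _balance(self):
        # Move one row at a time from the fullest to the emptiest server.
        while True:
            big = max(self.members, key=lambda m: len(self.regions[m]))
            small = min(self.members, key=lambda m: len(self.regions[m]))
            if len(self.regions[big]) - len(self.regions[small]) <= 1:
                return
            self.regions[small].append(self.regions[big].pop())
            self.moved += 1

    def add_server(self, name):
        self.regions[name] = []   # (21) add a data region on the new server's storage
        self.members.append(name) # (22) add the server to the system
        self._balance()           # (23) relocate data to balance the servers

    def remove_server(self, name):
        targets = [m for m in self.members if m != name]
        for row in self.regions[name]:  # (24) relocate data to empty the region
            dst = min(targets, key=lambda m: len(self.regions[m]))
            self.regions[dst].append(row)
            self.moved += 1
        self.members.remove(name)       # (25) only then remove the server
        del self.regions[name]


c = ToyCluster(["db1", "db2"], rows=range(12))
c.add_server("db3")
print({m: len(c.regions[m]) for m in c.members})  # {'db1': 4, 'db2': 4, 'db3': 4}
c.remove_server("db1")
print({m: len(c.regions[m]) for m in c.members})  # {'db2': 6, 'db3': 6}
```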
The foregoing conventional approaches do not consider frequent addition and removal of database servers in a shared-nothing database management system: they cannot add or remove a database server without first performing time-consuming data relocation. Moreover, data relocation is a heavy-load process that degrades the performance of table accesses by users and applications. Thus, when such an approach is applied to a system that changes the number of servers in accordance with the load, adding a server does not promptly improve processing performance and may even degrade it. Likewise, because data is relocated only after the removal of a server has been decided, there is a delay before the server can actually be removed.
In the method described in JP-A-11-282734, the load distribution is corrected by reconfiguring the correspondence between the database and the database processing units; however, because the database is divided into a fixed number of subsets, it is difficult to balance the load among the database processing units. The load can be balanced by dividing the database into many small fragments in advance, but a large number of database subsets is then created, which increases the cost of operational management tasks such as vacuuming and backup.
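The granularity trade-off can be quantified with a short calculation. Assuming, for illustration only, that the database is split into equal-sized fragments, that each fragment is assigned whole to one server, and that load is proportional to data, the busiest server must hold at least ⌈F/N⌉ of F fragments. The function name `best_max_share` is hypothetical.

```python
from fractions import Fraction


def best_max_share(num_fragments, num_servers):
    """Smallest achievable share of the database on the busiest server.

    Assumes equal-sized fragments assigned whole to servers and load
    proportional to data (illustrative assumptions)."""
    busiest = (num_fragments + num_servers - 1) // num_servers  # ceil(F / N)
    return Fraction(busiest, num_fragments)


for fragments in (4, 8, 64):
    print(fragments, best_max_share(fragments, 3))
# 4 1/2
# 8 3/8
# 64 11/32
```

With three servers the ideal share is 1/3; only the finer division into 64 fragments comes close to it, at the price of 64 subsets to vacuum and back up.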