This invention relates to database management technology for storing data in a plurality of storage areas. Generally, in a database system having a database, the data held in the database changes from moment to moment. As new data is added to the database over time, the data volume may exceed the capacity of the storage area prepared for the database.
For this reason, in the conventional method for handling a large-scale database, the database is divided into several partitions, which are stored in separate storage areas. There are three methods for partitioning a database for storage in a plurality of database storage areas: key-range partitioning, hash partitioning and equi-partitioning. Depending on the partitioning method used, one partition may be correlated with one storage area, or two or more partitions may be correlated with one storage area. When new data is added over time, two approaches to database expansion are available. The first approach, available when the key-range partitioning method is used, is to add a new key range or divide an existing key range. This approach may not require addition of a new database storage area. The other approach is to add a new storage area or expand the existing database storage area without changing the partitioning. This approach increases the database storage area in response to the increase in data volume. In contrast, if the hash partitioning method is used, it is possible to flexibly handle data volume increases with reduced overhead.
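As an illustration of the partitioning methods named above, the following is a minimal sketch of key-range and hash partitioning, together with a table that correlates partitions with storage areas. The bounds, the area names and the function names are all hypothetical, CRC-32 is used only as a convenient stand-in for a hash function, and equi-partitioning is omitted.

```python
import bisect
import zlib

# Hypothetical exclusive upper bounds of three key ranges. Keys at or above
# the last bound fall into a fourth, open-ended key range.
RANGE_BOUNDS = [1000, 2000, 3000]

# Hypothetical correlation of partitions with storage areas. Partitions 0 and 1
# share one storage area; partitions 2 and 3 each have their own.
PARTITION_TO_AREA = {0: "area_a", 1: "area_a", 2: "area_b", 3: "area_c"}


def key_range_partition(key: int) -> int:
    """Return the index of the key range that contains `key`."""
    return bisect.bisect_right(RANGE_BOUNDS, key)


def hash_partition(key: int, num_partitions: int) -> int:
    """Return a partition index derived from a hash of `key`."""
    digest = zlib.crc32(str(key).encode("ascii"))
    return digest % num_partitions


def storage_area_for(key: int) -> str:
    """Return the storage area correlated with the key range of `key`."""
    return PARTITION_TO_AREA[key_range_partition(key)]
```

With key-range partitioning, neighboring keys land in the same partition, so a key range can later be divided or a new one added. With hash partitioning, the partition depends on the hash value and on the number of partitions.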
However, a database system that stores newly added data in the initially provided database storage areas may also delete old data, which means that the data volume in the storage areas does not always increase. In this case, the database is reorganized so that old data is deleted and the freed area can be used to store new data.
On the other hand, consider a database system which stores data cumulatively without deleting old data. Because preparing storage areas sufficient for all expected future increases in data volume incurs resource management cost, such a system is in practice provided with database storage areas just large enough to store the data volume expected in the near future; if the data volume becomes too large to store, a new storage area is added.
Addition of a database storage area for a database necessitates redefinition of the database. The simplest way is to make a backup copy of the contents of the database and, after the database has been redefined for the storage area addition, reload the backup copy. For a large-scale database, this procedure requires much time to make the backup copy and considerable backup media cost, and the reloading process is also very time-consuming.
The first publicly known solution to this problem is U.S. Pat. No. 4,412,285. It discloses a technique for pre-partitioning a table into buckets before partitioning it by hashing, and for correlating the buckets with virtual processors for their management.
As the second publicly known solution, Japanese Patent Prepublication No. 139119/94 discloses a technique which reorganizes, according to the frequency of access, the data in a key-range partitioned database on a parallel database system composed of a plurality of processors.
As the third publicly known solution, Japanese Patent Prepublication No. 314299/94 discloses a technique which hierarchically partitions a database on a parallel database system composed of a plurality of processors.
Techniques have also been disclosed which allow data to be stored in a newly added storage area without the need for data rearrangement after the number of database partitions has been changed. One applies to a partitioned database on a parallel database system composed of a plurality of processors (Japanese Patent Prepublication No. 141394/95, the fourth publicly known solution), and the other to a database partitioned using a hash function (Japanese Patent Prepublication No. 293006/97, the fifth publicly known solution). In these techniques, however, though data rearrangement is unnecessary, the memories for all partitions are checked at the time of data search.
The above methods based on prior art have a problem with large-scale databases. When the unused area in an initially given database storage area becomes insufficient due to addition of data to a table in the database, the data must be rearranged in order to increase the database scale.
In databases which use the abovementioned hash partitioning method, the result of hashing depends on the number of partitions or the number of given database storage areas. If a new storage area is added, the data stored so far must therefore be hashed again according to the updated number of partitions and stored again. This is a very time-consuming and costly process, making it impossible to operate the database system efficiently.
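The dependence of the hashing result on the number of partitions can be illustrated with a small sketch. It assumes the simplest possible hash, the key modulo the number of partitions, which is chosen here only for illustration; the function names and the key set are hypothetical.

```python
def partition_of(key: int, num_partitions: int) -> int:
    """Assumed hash: the integer key modulo the number of partitions."""
    return key % num_partitions


def count_relocated(keys, old_count: int, new_count: int) -> int:
    """Count the keys whose partition changes when the partition count changes."""
    return sum(
        1
        for key in keys
        if partition_of(key, old_count) != partition_of(key, new_count)
    )


# Hypothetical data set: 12,000 consecutive integer keys.
keys = range(12000)

# Going from 4 to 5 partitions relocates 9,600 of the 12,000 keys (80%),
# even though only one storage area was added.
relocated = count_relocated(keys, old_count=4, new_count=5)
```

Under this assumption only the keys whose remainders modulo 4 and modulo 5 happen to coincide stay in place, so most of the stored data must be hashed and stored again.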
Even if, instead of storing all data again as a result of re-hashing, data is only moved from each existing database storage area to another database storage area, all data in each storage area must still be loaded, including data that re-hashing leaves in the same storage area. This makes it difficult to reduce data loading time and cost.
In databases which use key-range partitioning, it is possible to add key ranges for maximum and minimum data or to divide or merge existing key ranges, but data rearrangement for a newly added storage area does not take the volume of data into consideration.
The object of this invention is to provide a database management method or equipment that optimally stores data in a plurality of database storage areas.
To achieve the above object, the invention uses means to correlate a plurality of key ranges with a plurality of data storage areas in the memory, so that when data is to be stored in the database, said data is stored in the data storage area correlated with the key range containing said data. If an additional storage area is needed, a given volume of data is moved from the plurality of data storage areas to the newly added storage area, and the key ranges correlated with said moved data are correlated with the newly added storage area.
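The arrangement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class `KeyRangeStore`, the round-robin initial correlation, the in-memory row lists and the `volume_per_area` parameter are all hypothetical, and whole key ranges are used as the unit of movement.

```python
import bisect
from collections import defaultdict


class KeyRangeStore:
    """Hypothetical model: key ranges correlated with data storage areas."""

    def __init__(self, bounds, areas):
        self.bounds = list(bounds)  # sorted exclusive upper bounds
        self.areas = list(areas)
        num_ranges = len(self.bounds) + 1
        # Assumption: key ranges are first correlated with areas round-robin.
        self.range_to_area = {
            r: self.areas[r % len(self.areas)] for r in range(num_ranges)
        }
        self.rows = defaultdict(list)  # key range index -> stored keys

    def range_of(self, key):
        return bisect.bisect_right(self.bounds, key)

    def area_of(self, key):
        return self.range_to_area[self.range_of(key)]

    def insert(self, key):
        """Store `key` in the area correlated with its key range."""
        self.rows[self.range_of(key)].append(key)
        return self.area_of(key)

    def volume(self, area):
        return sum(
            len(self.rows[r]) for r, a in self.range_to_area.items() if a == area
        )

    def add_area(self, new_area, volume_per_area):
        """Move about `volume_per_area` rows from each existing area to
        `new_area` by re-correlating whole key ranges with the new area."""
        moved_ranges = []
        for area in self.areas:
            moved = 0
            owned = [r for r, a in self.range_to_area.items() if a == area]
            for r in owned:
                if moved >= volume_per_area:
                    break
                self.range_to_area[r] = new_area
                moved += len(self.rows[r])
                moved_ranges.append(r)
        self.areas.append(new_area)
        return moved_ranges


# Hypothetical usage: eight key ranges of width 100 over two areas.
store = KeyRangeStore(bounds=range(100, 800, 100), areas=["A", "B"])
for key in range(800):
    store.insert(key)
moved = store.add_area("C", volume_per_area=100)
```

In this sketch only the key ranges selected for movement change their correlation; every other key range stays with its original storage area, and a search by key still consults exactly one area.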