This disclosure relates generally to the field of computer databases. More particularly, but not by way of limitation, it relates to a technique for balancing data across partitions of a table space when loading data into the table space.
Computer databases have been an important part of enterprises for decades. Two major types of computer databases are hierarchical databases and relational databases. Hierarchical databases, which were developed prior to relational databases, are arranged into tree-like hierarchical structures representing logical relationships. Relational databases organize data into tables of rows and columns. One common relational database management system (DBMS) is the DB2® database system marketed by International Business Machines Corporation (IBM). (DB2 is a registered trademark of IBM.)
The actual data for a relational database is kept in a storage location generally referred to as a tablespace. A tablespace is the physical portion of the database used to allocate storage for managed segments of the database, which are the database objects such as tables and indices. Tablespaces may be physically stored in files in a filesystem defined by an operating system, or may be stored directly over devices.
Some DBMSs, including DB2 systems, allow tablespaces to be partitioned into areas or partitions for performance or other management reasons. Typically, partitioned tablespaces only contain a single table, while non-partitioned tablespaces may contain multiple tables. Customers may partition tablespaces according to values in columns corresponding to database keys. The customer generally specifies key value limits for each partition at tablespace definition time, sometimes known as LIMITKEYs. These LIMITKEYs are typically specified as the highest key value that can be stored in a partition. As data is inserted or loaded into the tables, data with key values are stored in a partition chosen by the relationship of the key values with the collection of LIMITKEYs defined for the partitions of the tablespace. The LIMITKEY value for the last partition is ignored in some implementations, avoiding the possibility that some table data could not be inserted into any partition if the LIMITKEY value for the last partition is less than the highest actual key value in the data.
If the customer chooses key value limits that do not match the data that is to be stored in the tables of the tablespace, then database performance problems may arise. For example, the tablespace may have much more data in one partition than others, because of an imbalance of key values in the data. When such problems arise, such as after a large number of updates to the database, the customer may need to reorganize the database to resolve the performance problems by balancing the partitions.
This problem is particularly an issue in situations where the customer is performing a mass data load into a database, such as to replace an existing set of rows. In such a replacement situation, the existing data in the tablespace is removed, and the new data is inserted as a mass insert. In such a situation, the LIMITKEYs defined for the tablespace partitions, although they may have been appropriate for the previous data, may no longer match the replacement data. However, customers have had no way to repartition the tablespace as a part of the loading process, leading some customers to follow a load process with an immediate reorganization of the database, which increases the time during which the tablespace is typically unavailable for use.