1. Field of the Invention
The invention relates to a technique, specifically a method, apparatus, and article of manufacture that implements the method, to determine an amount of space to allocate for a dataset as the dataset grows. This technique is particularly, though not exclusively, suited for use within a database management system.
2. Description of the Related Art
Database management systems allow large volumes of data to be stored and accessed efficiently and conveniently in a computer system. In a relational database management system, data is stored in database tables which effectively organize the data into rows and columns. In the database management system, a database engine responds to user commands to store and access the data. In the computer system, database objects, like tables and indexes, are contained in datasets. A dataset is also referred to as a file. When records are added to the database table, the database management system writes those records to the dataset associated with the specified table.
The dataset is typically stored on one or more hard disk drives. The amount of space available on the disk drives is limited and is managed by an operating system. An extent is an amount of space allocated on a logical volume for storing part of a dataset. A logical volume can be a single disk drive, a portion of a single disk drive, or a portion of multiple disk drives.
A dataset may have one or more extents. In some operating systems, each dataset is associated with a primary extent. As a dataset grows, additional extents, referred to as secondary extents, may be allocated to provide more space for the dataset. The operating system limits the total number of secondary extents that may be allocated to a dataset. In a conventional operating system, every secondary extent has the same size.
An extent comprises pages for storing the dataset. The page size can be four kilobytes, eight kilobytes, sixteen kilobytes, or thirty-two kilobytes. Logical volumes are mapped to physical disk drives and store data in units such as cylinders. A cylinder has a predefined amount of storage space, which is specific to the disk drive model. The extent size may be specified in bytes, kilobytes, megabytes, pages, or cylinders.
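By way of illustration only, the following Python sketch shows how an extent size expressed in any of the units named above might be normalized to bytes. The function name, the unit strings, and the assumption that one kilobyte equals 1024 bytes are hypothetical and are not taken from any particular operating system or database management system. Because the capacity of a cylinder depends on the disk drive model, it is supplied by the caller rather than assumed.

```python
KIB = 1024  # assumption: one kilobyte is 1024 bytes
VALID_PAGE_SIZES_KB = (4, 8, 16, 32)  # page sizes named in the text


def extent_size_in_bytes(amount, unit, page_size_kb=4, cylinder_bytes=None):
    """Convert an extent size given in `unit` to bytes (hypothetical helper)."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    if unit == "bytes":
        return amount
    if unit == "kilobytes":
        return amount * KIB
    if unit == "megabytes":
        return amount * KIB * KIB
    if unit == "pages":
        if page_size_kb not in VALID_PAGE_SIZES_KB:
            raise ValueError("page size must be 4, 8, 16, or 32 kilobytes")
        return amount * page_size_kb * KIB
    if unit == "cylinders":
        # Cylinder capacity is specific to the disk drive model,
        # so the caller must provide it.
        if cylinder_bytes is None:
            raise ValueError("cylinder_bytes is required for cylinder units")
        return amount * cylinder_bytes
    raise ValueError(f"unknown unit: {unit}")
```

For example, ten pages of four kilobytes each correspond to 40,960 bytes under these assumptions.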
In FIG. 1, an exemplary dataset 20 has multiple extents in accordance with the prior art. The dataset 20 is stored in a primary extent 22 of size p and n secondary extents, S1 to Sn, 24 to 26, respectively, each of size s. The database has parameters that allow a user to specify the size p of the primary extent and the size s of the secondary extents.
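The arrangement of FIG. 1 implies that the largest size the dataset can reach is p plus n times s when n is the maximum number of secondary extents. A minimal sketch of that arithmetic follows; the function and parameter names are hypothetical, and all sizes are assumed to be expressed in the same unit.

```python
def max_dataset_capacity(primary_size, secondary_size, max_secondary_extents):
    """Largest dataset size reachable with one primary extent of size p and
    at most n secondary extents, all of the same size s (prior-art layout)."""
    if min(primary_size, secondary_size, max_secondary_extents) < 0:
        raise ValueError("sizes and counts must be non-negative")
    # capacity = p + n * s
    return primary_size + max_secondary_extents * secondary_size
```

With illustrative values p = 100, s = 10, and n = 254, the capacity is 2,640 units. These numbers are chosen only for the example and do not represent the limits of any particular system.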
When a dataset is created, the maximum size of that dataset is implicitly determined, and a primary extent is allocated initially to store that dataset. However, the size of the primary extent is typically smaller than the maximum size of the dataset. Secondary extents are allocated on demand to store the dataset as the dataset grows. The number of secondary extents that can be allocated for a dataset, also referred to as a maximum number of secondary extents, is limited.
In one database management system, a system administrator may specify primary and secondary extent sizes when creating tablespaces or indexes, or may accept default sizes. When the secondary extent size is small, the maximum number of extents is typically reached before the dataset reaches its maximum possible size. The dataset is then prevented from growing, and no additional data or records may be added to that dataset. Hence, an operation to add data cannot be completed and an application failure occurs, which may result in an application outage. To increase the maximum amount of space that can be used for the dataset, the system administrator, through the facilities of the database management system, defines a new dataset with a larger primary extent size and/or a secondary extent size sufficiently large to store the maximum size of the dataset, copies the data from the old dataset to the new dataset, and renames the new dataset with the name of the old dataset. Creating the new dataset and copying the data takes time and increases the length of the application outage.
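The failure condition described above can be sketched as a simple check: count the fixed-size secondary extents needed to reach a target size and compare that count with the limit. The function names and the sample values are hypothetical, and all sizes are assumed to share one unit.

```python
def secondary_extents_needed(target_size, primary_size, secondary_size):
    """Number of fixed-size secondary extents needed to hold target_size."""
    if secondary_size <= 0:
        raise ValueError("secondary_size must be positive")
    overflow = max(0, target_size - primary_size)
    # Integer ceiling division: a partially filled extent still counts.
    return -(-overflow // secondary_size)


def can_reach(target_size, primary_size, secondary_size, max_secondary_extents):
    """True if the dataset can grow to target_size before the extent limit."""
    needed = secondary_extents_needed(target_size, primary_size, secondary_size)
    return needed <= max_secondary_extents
```

For instance, with a primary extent of 100 units, secondary extents of 10 units, and an assumed limit of 254 secondary extents, a dataset intended to grow to 10,000 units would need 990 secondary extents and so could not reach that size. Raising the secondary extent size to 40 units would require only 248 secondary extents, which fits within the assumed limit.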
Since the system-defined default size for the secondary extents is typically very small, the system administrator usually provides an explicit secondary extent size that is larger than the default secondary extent size to help avoid exhausting the available extents. However, increasing the size of secondary extents may result in wasted space, especially for small datasets. Furthermore, it is not known in advance whether a dataset will reach its ultimate size, so much of the allocated space may never be used.
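The trade-off noted above, in which larger secondary extents leave more allocated space unused, can be illustrated with the following sketch. It assumes extents are allocated on demand, one at a time, in a single fixed size; the function names and sample values are hypothetical.

```python
def allocated_space(data_size, primary_size, secondary_size):
    """Total space allocated once data_size units have been stored, with one
    primary extent and fixed-size secondary extents allocated on demand."""
    if secondary_size <= 0:
        raise ValueError("secondary_size must be positive")
    overflow = max(0, data_size - primary_size)
    extents = -(-overflow // secondary_size)  # ceiling division
    return primary_size + extents * secondary_size


def unused_space(data_size, primary_size, secondary_size):
    """Allocated space that holds no data."""
    return allocated_space(data_size, primary_size, secondary_size) - data_size
```

As an example, a small dataset of 5 units with a primary extent of 1 unit leaves 96 units unused when the secondary extent size is 100 units, but leaves nothing unused when the secondary extent size is 1 unit, at the cost of consuming four secondary extents instead of one.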
Therefore, there is a need for a technique to improve the allocation of secondary extents. This technique should efficiently allocate space for small datasets. This technique should also reduce the likelihood of using the maximum number of extents prior to reaching the maximum size of the dataset.