The present invention relates generally to a database, and more specifically, to managing overflow access records in a database.
Databases, in particular relational databases, underlie many of today's business and technical systems. They are convenient tools for storing and accessing data in a direct manner using keys and indexes. A relational database system typically includes a series of database tables, each organized into pages which store data records. A data record may have a fixed or variable length. Typical database commands include add a data record, read or query a data record, update a data record and delete a data record.
Indexes may be used in order to improve (e.g. decrease) access time to records in a database. Index creation is a well-known technique in the field of relational databases. There may be many indexes for a single database table because a record may include many fields or attributes, and fast access to the record may be required under many different access paths. Scanning the complete content of a database table entry by entry (i.e., record by record) may be very time consuming. Indexes, either as primary keys which are unique or, as secondary keys which may have double entries, are regularly used for faster access to defined data records. When a data record is updated, the indexes are also updated, which can also be a very time consuming process depending on the technique used. If a data record has a variable length because one or more attributes of the table have a variable length, the updated data record may no longer fit onto a page of the table. Typically, pages have a defined fixed length in a database system. Four kilobytes (kBs), eight kBs, sixteen kBs, and thirty-two kBs are common page sizes.
Several approaches have been taken in order to deal with records not fitting onto a page in a database system. U.S. Pat. No. 7,555,491 discloses techniques to repair overflow rows in a relational database table. In one implementation, the physical memory pages of a source table are interrogated to identify overflow rows. In another implementation, overflow rows generated during database update operations are identified from non-table sources (e.g., log files). In general, overflow rows are repaired by unloading one or more rows identified as overflow rows, locking the source table from outside access, disabling or deleting any associations with the unloaded rows, deleting the identified overflow/unloaded rows from the source table; reloading the previously unloaded rows back into the source table, restoring any associations previously disabled/deleted, and unlocking the tables. Rows may be processed in groups that are small enough not to significantly impact users accessing either the source or associated tables. U.S. Pat. No. 7,555,491 discloses removing overflow rows after they are created. The tables need to be locked; and thus, these techniques may require taking the database system offline such that a normal online operation may no longer be possible.
U.S. Pat. No. 7,177,885 discloses a method of reorganizing a table space in a database. It includes reading a row of data from the database, analyzing the row of data read out from the database, determining whether to eliminate or retain the row of data based on at least one predetermined rule, reloading the row of data into the database when it is determined that the row of data complies with the predetermined rule, eliminating the row of data when it is determined that the row of data does not comply with the predetermined rule, and rebuilding an index related to the database to include keys that correspond to the reloaded row of data.
U.S. Patent Publication No. 2009/0049199 discloses an apparatus and system for reorganizing a set of database partitions based on a ranked set of database partitions. The apparatus includes an evaluation module to determine a fragmentation indicator for each database partition within a set of database partitions. A ranking module ranks the set of database partitions according to a rank criterion. The rank criterion may include, at a minimum, the fragmentation indicator. A reorganization module reorganizes a subset of the database partitions in the ranked set of database partitions. The subset is selected, at least in part, by a rank order for the ranked set of database partitions. The subset may be selected such that ranked database partitions, satisfying a dynamic reorganization threshold, are included in the subset.
Another commonly used approach to solving the issue of updates in a database system is to use percentage free space (PCTFREE) (see details see below). If PCTFREE is set too large, too much disk space may be wasted and/or the buffer pool hit ratio may decrease because more pages may need to be read to fulfill a select request because fewer records per page are transferred. If PCTFREE is not set at all or is too small, a lot of overflow access records may be created on update operations or on insert operations in tables with clustered index structures and/or a page lock may lock too much information, thus reducing concurrency. Moreover, reorganization may be required to reconcile overflow access records to improve performance. Offline reorganizations may often be unacceptable and online reorganizations may cause significant input/output, thus decreasing performance of regular workloads. If PCTFREE is set incorrectly (too small, too large) and needs to be changed, a reorganization of the database system may be required. For databases with large numbers of tables, administrators may not have time to manually determine the right PCTFREE setting so PCTFREE may either be set wrong or not at all (meaning to zero). Furthermore, administrators may forget to adjust PCTFREE if record compression is enabled, potentially causing significantly more overflow access records.