The invention relates generally to relational database systems and, more particularly but not by way of limitation, to a means for identifying and removing overflow rows from a relational database table.
Business environments are becoming progressively more complex for companies of all sizes. Much of this complexity arises from the growing amount of information it takes to conduct business and the many users and uses of this information. In this environment, a corporation's data sources may become its number one asset. The rapidly growing areas of e-business, data warehouses and enterprise resource management require data be delivered quickly and efficiently. These applications typically use relational databases as their data source, with the databases forming the foundation of the corporation's computing architecture. Since these databases act as the corporate data server, they can quickly turn into a single point of failure crippling and entire organization should they become disabled or function poorly.
Database failure can be measured in two ways: unplanned outages and poor response time. While it is easy to comprehend that unplanned outages have a direct impact on revenue for most corporations, business research firms have concluded that poor response time can result in the same type of revenue loss, particularly for e-business applications. One of the primary reasons a database responds poorly to queries is fragmentation. Fragmentation occurs as a natural by-product of database updates (e.g., through the structured query language “UPDATE” statement) and is generally embodied in “overflow” rows. An overflow row is a row (“record”) of data that is small enough to fit on a single page of memory but which is, in fact, stored across two or more pages. Similarly, index records/rows may also become fragmented.
Because an overflow row's data is unnecessarily distributed across multiple pages of memory, a simple request to retrieve a single overflow row can require the generation of multiple unnecessary physical input-output requests. Thus, overflow rows impose a performance penalty on any access to a fragmented row. It will be recognized by those of ordinary skill in the art, this penalty is especially acute in the case of table scans and index range scans.
By way of example, consider the DB2® database environment. (DB2 is a registered trademark of International Business Machines Corporation of Armonk, N.Y.) Within a DB2® environment, database tables are created within a tablespace where a tablespace is comprised of one or more containers. All data within a DB2® database is stored in pages (a specified amount of physical storage). Page size is determined at tablespace creation and may be specified in 4 K, 8 K, 16 K and 32 K sizes. Pages are grouped into allocation units called extents. During database insert operations, DB2® will write to a container until its specified extent capacity is reached, at which point a new extent worth of storage is allocated in the next container where write activities are continued. Over time, the normal process of memory allocation in response to update operations can generate significant numbers of overflow rows and impact the database's performance. To resolve the performance degradation resulting form overflow rows, the prior art reorganizes the affected table.
Referring to FIG. 1, reorganization process 100 in accordance with the prior art locks the table being reorganized (the “source” table) so that users can not access it during the reorganization process (block 105). Next, the source table's data is unloaded (block 110). The unloaded data is then reordered and overflow rows are eliminated as it is loaded back into a newly created table having the same structure as the source table (block 115) and indexes associated with the source table are rebuilt to reflect the structure and content of the new table (block 120). Once the new table is loaded and its indexes are rebuilt, the new table replaces the original table (block 125), where after access is permitted to the reorganized table (block 130).
One significant drawback with prior art schemes such as that outlined in FIG. 1, is that they can result in data being offline for an extended period of time. The larger the source table, the longer this outage. Thus, maintaining a table in accordance with prior art techniques to improve its performance can cause an outage—the very act these maintenance operations are designed to avoid. Accordingly, it would be beneficial to provide a mechanism to detect and remove overflow rows from a relational database table without significantly impacting the table's availability to users. It would be a further advance to provide a mechanism to dynamically remove overflow rows from a relational database table without significantly impacting the table's availability to users.