1. Field of the Invention
This invention relates in general to computer-implemented database systems, and, in particular, to inplace reorganization of a table space.
2. Description of Related Art
Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) which uses relational techniques for storing and retrieving data. Relational databases are organized into tables which consist of rows and columns of data. The rows are formally called tuples or records. A database will typically have many tables and each table will typically have multiple tuples and multiple columns. Tables are assigned to table spaces. A table space is associated with direct access storage devices (DASD), and, thus, tables are stored on DASD, such as magnetic or optical disk drives for semi-permanent storage.
A table space can be a system managed space (e.g., an operating system file system) or a database managed space. Each table space is physically divided into equal units called data pages or pages. Each page, which typically contains 4K bytes, holds one or more rows of a table and is the unit of input/output (I/O). The rows of a table are physically stored as records on a page. A record is always fully contained within a page and is limited by page size. As users move towards working with image data and other large data objects, storing data in conventional records becomes difficult.
An index is an ordered set of references to the records or rows in a database file or table. The index is used to access each record in the file using a key (i.e., one of the fields of the record or attributes of the row). However, building an index for a large file can take a considerable amount of elapsed time. The process involves scanning all records in the file, extracting a key value and record identifier (rid) value from each of the records, sorting all of the key/rid values, and then building the index from the sorted key/rid values. Typically, the scanning, sorting, and index build steps are performed serially, which can be time consuming in the case of a large database file. When a RDBMS stores LOBs, an index is typically used to access the LOBs efficiently.
Traditionally, a RDBMS stored simple data, such as numeric and text data. In a traditional RDBMS, the underlying storage management has been optimized for simple data. More specifically, the size of a record is limited by the size of a page, which is a fixed number (e.g., 4K) defined by a computer developer. This restriction in turn poses a limitation on the length of columns of a table. To alleviate such a restriction, most computer developers today support a new built-in data type for storing large objects (LOBs). Large objects, such as image data, typically take up a great deal of storage space.
If there is a clustering index defined, the DBMS will attempt to insert the record in the same order as the clustering keys. Maintaining data records in the clustering key order enables more efficient data retrieval when the clustering index is used to retrieve a set of records within a key range.
As data is added to and deleted from tables in a table space, the data may not be well organized. For example, there may be many small gaps of free space (i.e., space into which records may be inserted), which are not useful when large amounts of free space are needed. To resolve this, conventional systems enable a table space to be reorganized so that the data in the table space is better organized. For example, data may be reorganized sequentially. Some conventional systems perform reorganization of data by unloading every row out of the table space and then loading each row back into the table space so that the data is organized better. This is a non-inplace reorganization, as it moves data out of the table space, into another space, and the reloads the data back into the table space.
Unloading and reloading a table space has several problems. First, at least twice as much DASD space as is required for the table space is required to perform a non-inplace reorganization. Generally, LOB table spaces are used to store LOB values. Thus, LOB table spaces are very large and management of the work space is a usability problem.
Second, reading and writing every row of the table space to reclaim less than 100% of the free space is an unnecessary cost. Free space refers to the space that may be used for new records to be stored. Reclaiming free space is a reference to shifting data so that the free space is at the end of the table space. For example, if there are 500 cylinders of physical space for a table space, and if 400 cylinders contain data, but there is 100 cylinders of free space dispersed throughout the 500 cylinders, reclaiming would move the data so that the last 100 cylinders contained all of the free space.
Therefore, there is a need in the art for an improved technique for reorganization of a table space.