1. Field of the Invention
This invention relates in general to computer-implemented database systems, and, in particular, to a technique for efficient reorganization of a LOB table space that avoids processing well clustered LOBs.
2. Description of Related Art
Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) that uses relational techniques for storing and retrieving data. Relational databases are organized into tables, which consist of rows and columns of data. The rows are formally called tuples or records. A database will typically have many tables, and each table will typically have multiple tuples and multiple columns. Tables are assigned to table spaces. A table space is associated with direct access storage devices (DASD), and, thus, tables are stored on DASD, such as magnetic or optical disk drives, for semi-permanent storage.
A table space can be a system-managed space (e.g., an operating system file system) or a database-managed space. Each table space is physically divided into equal units called pages. Each page, which may contain 4 KB, holds one or more rows of a table and is the unit of input/output (I/O). The rows of a table are physically stored as records on a page.
Traditionally, an RDBMS stored simple data, such as numeric and text data, and its underlying storage management has been optimized for such simple data. More specifically, because a record is always fully contained within a page, the size of a record is limited by the size of a page, which is a fixed number (e.g., 4 KB) defined by the database system developer. This restriction in turn limits the length of the columns of a table. Large objects, such as image data, typically take up a great deal of storage space, so as users move toward working with image data and other large data objects, storing such data in conventional records becomes difficult. To alleviate this restriction, most database system developers today support a built-in data type for storing large objects (LOBs).
An index is an ordered set of references to the records or rows in a database file or table. The index is used to access each record in the file by a key (i.e., one of the fields of the record or attributes of the row). Building an index for a large file, however, can take a considerable amount of elapsed time. The process involves scanning all records in the file, extracting a key value and a record identifier (rid) value from each record, sorting all of the key/rid values, and then building the index from the sorted key/rid values. Typically, the scanning, sorting, and index build steps are performed serially, which can be time consuming for a large database file. When an RDBMS stores LOBs, an index is typically used to access the LOBs efficiently.
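The scan, extract, sort, and build steps described above can be illustrated with a minimal in-memory sketch. It assumes a hypothetical record layout in which each record is a dictionary of column values and a record's position in the list stands in for its rid; `build_index` and `lookup` are illustrative names, not part of any real RDBMS interface, and the "index" is simply the sorted list of key/rid pairs searched by binary search.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# Hypothetical record layout: each record is a dict of column values, and its
# position in the list stands in for the record identifier (rid).
Record = Dict[str, object]
Index = List[Tuple[object, int]]


def build_index(records: List[Record], key_column: str) -> Index:
    # Step 1: scan every record and extract a (key, rid) pair from it.
    key_rids = [(rec[key_column], rid) for rid, rec in enumerate(records)]
    # Step 2: sort all key/rid pairs (equal keys are ordered by rid).
    key_rids.sort()
    # Step 3: build the index; in this sketch the sorted list is the index.
    return key_rids


def lookup(index: Index, key: object) -> List[int]:
    # Binary-search for the first entry with this key, then collect its rids.
    pos = bisect_left(index, (key,))
    rids = []
    while pos < len(index) and index[pos][0] == key:
        rids.append(index[pos][1])
        pos += 1
    return rids
```

For example, indexing `[{"id": 30}, {"id": 10}, {"id": 20}, {"id": 10}]` on `"id"` yields `[(10, 1), (10, 3), (20, 2), (30, 0)]`, and `lookup` on key `10` returns rids `[1, 3]`. The three steps run one after another here, mirroring the serial processing the text describes.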
As data is added to and deleted from the tables in a table space, the data may become disorganized. To resolve this, conventional systems enable a table space to be reorganized so that its data is better organized; for example, the data may be reorganized into sequential order. Some conventional systems reorganize data by unloading it from the table space and then loading it back into the table space in better order. Current reorganization strategies do not consider the organization state of a table space or an index space and generally reorganize the entire table space and index space. Reorganizing the entire table space and index space when only a small percentage of the data is disorganized incurs unnecessary cost and increases the elapsed time of reorganization. This is especially true when the data consists of LOBs.
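The conventional unload-and-reload reorganization described above can be sketched as follows, to show why it is costly: every page is read whether or not its rows are already in order. This is a minimal sketch under stated assumptions: a table space is modeled as a list of pages, each page holds rows of `(clustering_key, payload)`, and `PAGE_CAPACITY` and `full_reorg` are hypothetical names chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical model: a table space is a list of pages, and each page holds
# up to PAGE_CAPACITY rows of (clustering_key, payload).
Row = Tuple[int, str]
Page = List[Row]
PAGE_CAPACITY = 3  # assumption for illustration; real pages hold far more rows


def full_reorg(table_space: List[Page]) -> Tuple[List[Page], int]:
    # Unload: read every row from every page, even pages already in order.
    pages_read = 0
    unloaded: List[Row] = []
    for page in table_space:
        pages_read += 1
        unloaded.extend(page)
    # Sort the unloaded rows into clustering-key sequence.
    unloaded.sort(key=lambda row: row[0])
    # Reload: pack the sorted rows back into densely filled pages.
    reloaded = [
        unloaded[i:i + PAGE_CAPACITY]
        for i in range(0, len(unloaded), PAGE_CAPACITY)
    ]
    return reloaded, pages_read
```

With pages `[(1, "a"), (2, "b"), (3, "c")]`, `[(4, "d"), (5, "e")]`, and `[(9, "i"), (7, "g")]`, only the last page is out of order, yet `full_reorg` still reads all three pages before producing pages keyed `[1, 2, 3]`, `[4, 5, 7]`, and `[9]`.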
Therefore, there is a need in the art for an improved technique for efficient reorganization of a LOB table space that avoids processing well clustered LOBs.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, apparatus, and article of manufacture for a computer-implemented reorganization system.
In accordance with the present invention, a table space is reorganized in a database stored on a data storage device connected to a computer. When a LOB is inserted or updated into a portion of the table space, a space map is marked to indicate whether the LOB was well inserted. When the table space is reorganized and the space map indicates that a LOB was well inserted, reorganization of the portion of the table space in which that LOB was well inserted is avoided.
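The marking and skipping steps summarized above might be modeled as in the following minimal sketch. It rests on assumptions not stated in the text: the space map keeps one "well inserted" flag per LOB rather than per portion, a LOB counts as well inserted when its pages are contiguous, and a badly inserted LOB is reorganized by moving it to fresh contiguous pages past the highest page in use. `LobTableSpace`, `insert_or_update`, and `reorganize` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LobTableSpace:
    # Hypothetical model: each LOB occupies a list of page numbers, and the
    # space map holds one "well inserted" flag per LOB (a simplification of a
    # per-portion space map).
    lob_pages: Dict[str, List[int]] = field(default_factory=dict)
    space_map: Dict[str, bool] = field(default_factory=dict)

    def insert_or_update(self, lob_id: str, pages: List[int]) -> None:
        # Assumption: a LOB is "well inserted" when its pages are contiguous.
        well_inserted = all(b == a + 1 for a, b in zip(pages, pages[1:]))
        self.lob_pages[lob_id] = list(pages)
        # Mark the space map now, so reorganization can consult it later.
        self.space_map[lob_id] = well_inserted

    def reorganize(self) -> List[str]:
        # Simplification: badly inserted LOBs move to fresh contiguous pages
        # past the highest page currently in use.
        used = [p for pages in self.lob_pages.values() for p in pages]
        next_free = max(used, default=-1) + 1
        reorganized = []
        for lob_id, well_inserted in self.space_map.items():
            if well_inserted:
                continue  # skip: this portion is already well clustered
            count = len(self.lob_pages[lob_id])
            self.lob_pages[lob_id] = list(range(next_free, next_free + count))
            next_free += count
            self.space_map[lob_id] = True
            reorganized.append(lob_id)
        return reorganized
```

For example, after inserting `"photo"` on pages `[0, 1, 2]` and `"scan"` on pages `[3, 7, 5]`, `reorganize()` returns `["scan"]` and leaves `"photo"` untouched, because only `"scan"` was marked as not well inserted. A second call returns `[]`, since every entry is then marked well inserted.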