A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent Office patent files or records, but otherwise reserves all copyrights whatsoever.
The present invention relates to database management systems and more particularly to a method and system for searching for free space in a table of a relational database management system, where the table has a clustering index defined on at least one or more of its columns.
A database management system (DBMS) includes the combination of an appropriate computer, direct access storage devices (DASD) or disk drives, and database management software. A relational database management system is a DBMS that uses relational techniques for storing and retrieving information. The relational database management system or RDBMS comprises computerized information storage and retrieval systems in which data is stored on disk drives or DASDs for semi-permanent storage. The data is stored in the form of tables which comprise rows and columns. Each row or tuple has one or more columns.
A typical database management system includes both database files and index files. The database files store data in the rows and columns of tables stored on data pages. In such a table, the rows may correspond to individual records while the columns of the table represent attributes of the records. For example, in a customer information table of a database management system, each row might represent a different customer, while each column represents different attributes of the customers such as the name of each customer, the amount owed by each customer and the cash receipts received from each customer.
Instead of providing for direct sorting and searching of the records in the tables, the database management system relies on the index files, which contains information or pointers about the location of the records in the tables stored in the database files. The index can be searched and sorted (scanned) much more rapidly than can the database files. An index is scanned through transactions in which criteria are stipulated for selecting records from a table. These criteria include keys, which are the attributes by which the database finds the desired record or records using the index.
An index for a table is said to be xe2x80x9cclusteredxe2x80x9d when the rows of the underlying table are physically stored so that each data page contains all rows, and only those rows, of a key interval of the index. Referring to FIG. 22, there is illustrated an index on unclustered data. FIG. 23, shows a B-Tree index for the same data; however, in FIG. 23 the data has been clustered. The differences between clustered and unclustered indexes can be seen by focusing on the key value xe2x80x9cNxe2x80x9d. In both the B-Tree indexes of FIG. 22 and 23, the leaf nodes of the B-Tree indexes point to four records in the underlying tables; these records are identified by the record identifiers of four keys. The record identifiers are rid1, rid2, rid3 and rid4. In the unclustered B-Tree index of FIG. 22, the key interval  less than N,rid1 greater than  to  less than N,rid2 greater than  is stored on two different data pagesxe2x80x94specifically data pages 1 and 2. Similarly, the key interval  less than N,rid3 greater than  to  less than N,rid4 greater than  are stored on data pages 2 and 3. This contravenes the definition of a clustered index that the rows of the underlying table be physically stored so that each data page contains all rows, and only those rows of a key interval of the index. Turning to the B-Tree index and underlying data pages of FIG. 23, the same data is shown clustered. Specifically, the key interval  less than N,rid1 greater than  to  less than N,rid2 greater than  points to rows that are stored on data page 2, and the key interval  less than N,rid3 greater than  to  less than N,rid4 greater than  points to data that is stored on data page 3. Note that the records need not be adjacently located on the same data page. While the records underlying  less than N,rid3 greater than  and  less than N,rid4 greater than  both lie on data page 3, they are separated by a record having an attribute value equal to xe2x80x9cPxe2x80x9d.
Even if an index is clustered to begin with, the index and table may become disordered as a result of rows being added or deleted from the table. Specifically, if a new data row is to be added to a table, then this data would preferably be added on the data page for the key interval of the index in which the data row falls. However, there may be no or insufficient free space on this data page to accommodate the new record, and the data row may, as a result, have to be stored on another data page on which sufficient free space is available, or appended to the end of the table. Accordingly, even if an index is clustered to begin with, it may become unclustered, at least to some extent, as additional information is added. The extent to which a clustered index has become unclustered is measured by the cluster factor; the cluster factor is measured on a scale of 0 to 1, where 1 means the index is fully clustered.
A clustering index is an index whose cluster factor is, as much as possible, maintained or improved dynamically as data is inserted into the associated table. The cluster factor is maintained or improved by inserting new rows as physically close to the rows for which the key values of this index are in the same range, as is possible. Having a high cluster factorxe2x80x94a cluster factor close to 1xe2x80x94increases data retrieval efficiency when such retrieval involves accessing sequential ranges of that index""s values.
When inserting a row into a table in a relational database management system (RDBMS), a search for free space must be done to find space for the row being inserted. Free space maps are used in the art to keep track of the free space that is available in a table. A free space map is a map or directory of the pages in a table, together with an approximation of the amount of the free space on each of these pages. This enables the pages that might have enough space for a record to be quickly looked for, without requiring the page to actually be fixed.
A free space map spans a plurality of free space map pages when it is too large to accommodate on a single page. A free space map page is one of the plurality of free space map pages containing a portion of the map. Each free space map page includes an array of free space entries for a series of pages that form part of the table. By fixing one free space map page, it is possible to scan through space information from many pages in the actual data files, in order to find a page that potentially has enough space for a record to be inserted. Each entry in the free space map page indicates the approximate amount of free space available in the page index in the array. When a row is to be inserted, the free space map is searched for a data page with sufficient free space to hold the record. The search usually starts at the beginning of the free space map, or from the position where space was last found in this free space map. In either case, the entire free space map is usually searched until a data page with sufficient free space is found or until the entire free space map has been searched. If a page with free space is found, the row is inserted on to the page. If no free space is found, the row is appended to the table.
A record, which is being inserted into a table having a clustering index defined on A one or more of its columns, should preferably be inserted on the target page in order to maintain or improve the cluster factor. The target page is the ideal page on which to insert a new data record into a table that has a clustering index, as this page contains the same or next key value in the table as that being inserted. However, this may not always be possible as the target page may have insufficient free space to accommodate the record. If there is insufficient space for the record on the target page, then the record must be inserted on another page on the table.
Accordingly, there remains a need for a method for efficiently searching for free space in a table of a DBMS or RDBMS that has a clustering index, which method is both efficient and facilitates clustering.
An object of one aspect of the present invention is to provide an improved database management system.
Preferred embodiments provide an improved method, system, and program for determining a location in a database to insert a new record in a database table when the new record includes a key value. A determination is made of a target page having a record in the table that includes a key value that equals the key value of the new record. The pages in storage are capable of storing records from the table. The new record is stored in the target page if the target page has sufficient free space to store the new record. A determination is made of a page closest to the target page that has sufficient free space to store the new record if the target page does not have sufficient space. The new record is stored in the determined closest page if the target page does not have sufficient free space.
In accordance with further aspects of the preferred embodiments, there is provided a method for searching a table for free space for inserting a new record into the table in a database management system. The table resides in a storage medium, and comprises a series of table pages capable of storing a plurality of records. The table has a clustering index defined in a column of the table. The new record has a new attribute for storing in the column when the new record is stored in the table. The method includes the step of searching for a target page in a series of target pages. The target page is selected to include a target record having a target attribute stored in the column of the table. The target attribute is selected to be equal to the new attribute if the column contains a record equal to the new record, otherwise, the target attribute is selected to be either a next higher attribute above the new attribute and stored in the column, or a next lower attribute below the new attribute and stored in the column. The method also comprises the step of searching the target page for sufficient free space to accommodate the new record. If sufficient free space is found on the target page, then the method inserts the new record on the target page. If sufficient free page is not found on the target page, then the method searches the target neighborhood of pages in a series of table pages. The target neighborhood of pages surround the target page. If sufficient free space is found on a page in the target neighborhood of pages, then the new record is inserted on to the page in the target neighborhood of pages. If sufficient free space is not found in the target neighborhood of pages, then the method searches for sufficient free space to accommodate the new record in the series of table pages outside the target neighborhood of pages. If sufficient free space is found outside the target neighborhood of pages, then the new record is inserted on a non-neighboring page outside the target neighborhood of pages. If sufficient free space is not found outside the target neighborhood of pages, then a new page having a new record is appended to the end of the table.
In accordance with a second aspect of the preferred embodiments, there is provided a method for searching a table for free space for inserting a new record into the table in a database management system. The table resides in a storage medium, and includes a series of table pages capable of storing a plurality of records. The table also has a clustering index defined in a column of the table. The method includes a step of searching a set of table pages in the series of table pages. If sufficient free space is found in the set of table pages, then the record is inserted in the set of table pages. If the set of table pages includes a single page having sufficient free space to accommodate the new record, then the new record is inserted on to the single page. If the set of table pages includes at least two pages having sufficient free space to accommodate the new record, then the new record is inserted on an emptiest page in the at least two pages. If sufficient free space is not found on the set of table pages, then the new page having the new record is appended to the end of the table.
In accordance with a third aspect of the preferred embodiments, there is provided a computer program product for use on a computer system wherein transactions are executed for inserting data into a table in a database system. The table resides in a storage medium, and comprises a series of table pages capable of storing a plurality of records. The table also has a clustering index defined on a column of the table. The new record has a new attribute for storing in the column of the table when the new record is stored in the table. The computer program product includes a recording means and means recorded on the medium for instructing the computer system to perform the following steps. The computer system searches for a target page in a series of target pages. The target page is selected to include a target record having a target attribute stored in the column of the table. The target attribute is selected to be equal to the new attribute if the column contains a record equal to the new record, otherwise, the target attribute is selected to be either a next higher attribute above the new attribute and stored in the column, or a next lower attribute below the new attribute and stored in the column. The computer system searches the target page for sufficient free space to accommodate the new record. If sufficient free space is found on the target page, then the computer system inserts the new record on the target page. If sufficient free page is not found on the target page, then the computer system searches the target neighborhood of pages in a series of table pages. The target neighborhood of pages surround the target page. If sufficient free space is found on a page in the target neighborhood of pages, then the new record is inserted on to the page in the target neighborhood of pages. If sufficient free space is not found in the target neighborhood of pages, then the computer system searches for sufficient free space to accommodate the new record in the series of table pages outside the target neighborhood of pages. If sufficient free space is found outside the target neighborhood of pages, then the new record is inserted on a non-neighboring page outside the target neighborhood of pages. If sufficient free space is not found outside the target neighborhood of pages, then a new page having a new record is appended to the end of the table.
In accordance with a fourth aspect of the preferred embodiments, there is provided a computer program product for use on a computer system, wherein transactions are executed for inserting data into a table in a database. The table resides in a storage medium and comprises a series of table pages capable of storing a plurality of records. The table has a clustering index defined on a column of the table. The computer table product comprises a recording medium, and means recorded on the medium for instructing the computer system to perform a number of steps. The computer system searches a set of table pages in the series of table pages. If sufficient free space is found in the set of table pages, then the record is inserted in the set of table pages. If the set of table pages includes a single page having sufficient free space to accommodate the new record, then the new record is inserted on to the single page. If the set of table pages includes at least two pages having sufficient free space to accommodate the new record, then the new record is inserted on an emptiest page in the at least two pages. If sufficient free space is not found on the set of table pages, then the new page having the new record is appended to the end of the table.
In accordance with a fifth aspect of the preferred embodiments, there is provided a data processing system for inserting data into a table in a database system. The table resides in a storage medium, and comprises a series of table pages capable of storing a plurality of records. The table also has a clustering index defined on a column of the table. The new record has a new attribute for storing in the column of the table when the new record is stored in the table. The computer system searches for a target page in a series of target pages. The target page is selected to include a target record having a target attribute stored in the column of the table. The target attribute is selected to be equal to the new attribute if the column contains a record equal to the new record, otherwise, the target attribute is selected to be either a next higher attribute above the new attribute and stored in the column, or a next lower attribute below the new attribute and stored in the column. The computer system searches the target page for sufficient free space to accommodate the new record. If sufficient free space is found on the target page, then the computer system inserts the new record on the target page. If sufficient free page is not found on the target page, then the computer system searches the target neighborhood of pages in a series of table pages. The target neighborhood of pages surround the target page. If sufficient free space is found on a page in the target neighborhood of pages, then the new record is inserted on to the page in the target neighborhood of pages. If sufficient free space is not found in the target neighborhood of pages, then the computer system searches for sufficient free space to accommodate the new record in the series of table pages outside the target neighborhood of pages. If sufficient free space is found outside the target neighborhood of pages, then the new record is inserted on a non-neighboring page outside the target neighborhood of pages. If sufficient free space is not found outside the target neighborhood of pages, then a new page having a new record is appended to the end of the table.
In accordance with a sixth aspect of the preferred embodiments, there is provided a data processing system for inserting data into a table in a database system. The table resides in a storage medium, and comprises a series of table pages capable of storing a plurality of records. The table also has a clustering index defined on a column of the table. The computer system searches a set of table pages in the series of table pages. If sufficient free space is found in the set of table pages, then the record is inserted in the set of table pages. If the set of table pages includes a single page having sufficient free space to accommodate the new record, then the new record is inserted on to the single page. If the set of table pages includes at least two pages having sufficient free space to accommodate the new record, then the new record is inserted on an emptiest page in the at least two pages. If sufficient free space is not found on the set of table pages, then the new page having the new record is appended to the end of the table.