1.0 Field of the Invention
This invention relates to database management systems; and in particular, this invention relates to index page compression in a database management system.
2.0 Description of the Related Art
Database management systems allow large volumes of data to be stored and accessed efficiently and conveniently in a computer system. In various database management systems, data is stored in database tables which organize the data into rows and columns. FIG. 1 illustrates a table 30 and index 31 of a database. In the table 30, a row 32, 34 has one or more columns 36, 38 and 40. In a relational database management system, tables may be associated with each other. The data in one column of a table may be used to refer to data in another table. The term “record” is also used to refer to a row of data in a table.
The index 31 can be used to quickly access the data. The index provides references to the rows in the table 30. Each row 32, 34 of the table 30 is associated with a row identifier (rid) 42, 44, respectively. A user typically defines a key which comprises one or more columns of the table, and an index is generated by sorting the rows in accordance with the value in the column(s) which form the key. Typically, a key comprises fewer than all the columns of the table. The sorted keys 46 with their associated rids 48 are stored in the index 31. In response to a query on a table having an index, the database management system accesses the index to find the record(s) which satisfy the query. In particular, the database management system accesses the index based on the key(s) which satisfy the query to retrieve the associated rid(s), which are then used to retrieve the desired data from the rows.
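The relationship between keys, rids and rows described above can be illustrated with a minimal Python sketch. The table contents, the column layout and the function names `build_index` and `lookup` are hypothetical and chosen only for illustration; a real database management system would typically use a tree-structured index rather than a flat sorted list.

```python
import bisect

# Hypothetical table: rid -> row, with columns (name, department, salary).
table = {
    101: ("Smith", "Sales", 50000),
    102: ("Adams", "Engineering", 72000),
    103: ("Jones", "Sales", 61000),
    104: ("Adams", "Marketing", 58000),
}

def build_index(table, key_columns):
    """Return (key, rid) entries sorted by key, where the key is the tuple
    of values in the chosen key columns."""
    entries = [
        (tuple(row[c] for c in key_columns), rid)
        for rid, row in table.items()
    ]
    entries.sort()
    return entries

def lookup(index, key):
    """Return the rids of all index entries whose key equals `key`."""
    # (key,) sorts before every (key, rid) pair, so bisect_left lands on
    # the first entry whose key is >= the requested key.
    i = bisect.bisect_left(index, (key,))
    rids = []
    while i < len(index) and index[i][0] == key:
        rids.append(index[i][1])
        i += 1
    return rids

# Index on the first column only: the key comprises fewer than all columns.
index = build_index(table, key_columns=[0])
rows = [table[rid] for rid in lookup(index, ("Adams",))]
```

Here the query for the key `("Adams",)` is answered from the index alone, and the returned rids are then used to fetch the matching rows from the table.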
The index is typically created as one or more index pages in volatile memory, such as semiconductor memory, and stored in persistent storage, such as a disk, from which it may later be retrieved. In the persistent storage, the index is stored in one or more physical pages, and each index page in the volatile memory corresponds to a physical page in persistent storage. The size of the physical page is typically predetermined and fixed. Storing an index on a disk may consume a large amount of space on the disk. Hence there is a need to reduce the amount of space used by an index on a disk. Therefore it would be desirable to use index compression in order to allow an index page to fit on a physical page which is smaller than the index page.
As keys and/or rids are added to any given index page, ensuring that the data on the index page can still be compressed to fit in the smaller, fixed-size physical page can incur significant overhead. In particular, performing a compressibility check every time index data is added to or updated on the index page can degrade performance. Therefore there is a need for a technique to avoid performing a compressibility check on every addition to or update of an index page.
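The cost of checking compressibility on every update can be pictured with the following Python sketch. It is an illustration only: the text above does not name a compression algorithm or page size, so `zlib` and the 4096-byte `PHYSICAL_PAGE_SIZE` are assumptions, and `fits_physical_page` and `add_entry_with_check` are hypothetical helper names.

```python
import zlib

# Assumed fixed size of a physical page on disk, in bytes.
PHYSICAL_PAGE_SIZE = 4096

def fits_physical_page(index_page, physical_page_size=PHYSICAL_PAGE_SIZE):
    """Compressibility check: compress the whole index page and compare the
    result against the fixed physical page size."""
    return len(zlib.compress(index_page)) <= physical_page_size

def add_entry_with_check(page, entry):
    """Naive insertion: re-run the compressibility check on every addition.

    `page` is a bytearray holding the in-memory index page and `entry` is
    the encoded key and rid. The entry is appended only if the resulting
    page would still compress to fit in the physical page.
    """
    candidate = bytes(page) + entry
    if fits_physical_page(candidate):
        page.extend(entry)
        return True
    return False
```

Because `add_entry_with_check` compresses the entire candidate page each time it is called, the work per insertion grows with the size of the page, which is the overhead the paragraph above identifies.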
In addition, because rows, and therefore keys, can be arbitrarily inserted into and deleted from an index, “holes” may occur in an index page, and a free space chain is used to keep track of the holes. The holes and the free space chain consume space that could otherwise be used to store additional index information. Therefore, there is also a need for a technique which eliminates holes on an index page.
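To make the notion of holes concrete, the following Python sketch models an index page as a list of slots in which a deleted entry leaves `None` behind. The slot layout and the names `free_space_chain` and `compact` are hypothetical; the text above does not specify how the free space chain is represented, and this is not presented as the technique for eliminating holes.

```python
def free_space_chain(slots):
    """Return the positions of the holes (None slots) in page order."""
    return [i for i, entry in enumerate(slots) if entry is None]

def compact(slots):
    """Return the live entries packed contiguously, preserving their order,
    so that no holes remain and no chain is needed to track them."""
    return [entry for entry in slots if entry is not None]

# A page after two deletions: slots 1 and 3 are holes.
page = [("Adams", 102), None, ("Jones", 103), None, ("Smith", 101)]
holes = free_space_chain(page)
packed = compact(page)
```

In this model the two holes occupy slots that the packed page frees up for additional keys and rids.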