Data warehouses are repositories of all or parts of the data that an organization's various business systems collect. Data warehouses can include databases which can be used to maintain a collection of records or other information such that they can be easily accessed, updated and managed. The contents of such databases are many times organized into tables such as is shown in FIG. 1A that organize data by columns and rows.
Accessing information from databases can involve sequential scans over database indexes to tables. Database indexes specify the logical order of table contents. FIG. 1B shows one type of commonly used index called a B+ tree. A B+ tree can be constituted of a plurality of levels of indexing information. FIG. 1B shows a B+ tree that has internal and leaf levels. As shown in FIG. 1B, each level of the B+ tree can correspond to a set of linked pages (ranges of values from the table) which store rows of data in sorted key order. A sequential scan of a range of key values in a B+ index involves reading through a set of logically contiguous pages of data.
At each level of the B+ tree, the key order defines the logical order of the pages through a linkage system. However, the physical order of the pages in a file on disk (the location or offset of the pages in the underlying memory files) may or may not match the logical order of the pages as defined by the B+ tree. For example, although pages 100 and 30 may be logically adjacent and linked to each other based on key order defined in a B+ tree, they may reside at physical locations in memory that are widely separated.
Fragmentation refers to the degree of matching that exists between the physical and the logical orders of adjacent pages. There are two factors that decide fragmentation: (1) the degree of physical and logical order matching that exists between pages, and (2) page contiguity. When the physical order and the logical order of adjacent pages match, or are close, fragmentation is considered to be low. Conversely, where the physical and the logical order of adjacent pages are significantly different, fragmentation is considered to be high. Importantly, fragmentation detrimentally affects sequential index scan performance.
Fragmentation applies to any level in a B+ tree, including both leaf and internal levels. Internal level fragmentation can be consequential to a common performance enhancement feature for index scans in commercial database products called the read-ahead IO optimization which usually involves the scanning of internal nodes that point to leaf data pages.
Data warehouse queries generally involve the reading of large amounts of data (e.g., large numbers of leaf pages). Because of this, the data that is involved in such queries is generally stored on disks. Consequently, reading such data involves IOs that retrieve the data from disks. Such IOs generally involve disk head movements. Such disk head movement accounts for a significant portion of the cost associated with disk IOs. Importantly, the less disk head movement that is involved in an IO, the faster the IO can be completed.
It should be appreciated that if fragmentation is low, the disk head movement involved in executing a query can be minimal. This is because the disk head tends to move in one direction as the number of head movements back and forth over gaps of data not needed in the query is minimal as illustrated in FIG. 1C. Accordingly, where fragmentation is low, disk IOs can be completed quickly. However, when fragmentation is high, the disk head tends to move frequently back and forth over gaps in the process of locating data needed for the query as is illustrated in FIG. 1D. Accordingly, where fragmentation is high disk IOs complete more slowly. Furthermore, IO cost can mainly be attributed to disk head seek time. The less severe the fragmentation level of an index, the less time required for disk head seeks. Accordingly, IO completion time is better, and costs are reduced, when fragmentation is low.
Other benefits of low fragmentation include larger sized IOs. Larger sized IOs involve the reading of larger amounts of data. Larger sized IOs can reduce the number of total reads issued since data that is sought can be retrieved using fewer reads. Such a reduction in the number of total reads issued can reduce kernel overhead as kernel involvement is reduced.
A conventional approach to minimizing fragmentation involves pre-allocating physically contiguous pages to accommodate data contained in an index (both leaf and internal) at index build time. Because the size of an index is readily ascertainable at index build time, a block of contiguous pages that can accommodate data contained in the index can be pre-allocated for the index. However, since subsequent inserts of data are allocated pages on demand, the pages allocated for subsequent inserts can be separated from the originally allocated block of contiguous pages.
Conventional allocation systems receive requests for allocations from different concurrent users related to different indexes. Accordingly, page allocations for data inserts to a particular index typically can be separated by intervening page allocations for data inserts to other indexes. Moreover, page allocations for inserts of data to a particular index made after the index has been built can be from physical locations in the file that are widely separated from that of other pages allocated for the same index. Because such allocations do not maintain the contiguity of pages established when the original block of pages is allocated, index fragmentation is increased.
Conventional approaches such as the one discussed above are unsatisfactory as the detrimental affects of fragmentation on sequential index scan performance not adequately addressed thereby remains significant. In the above discussed approach when data inserts made after the index has been built reaches one percent of the rows that constitute an index, the sequential index scan performance is reduced by 27% due to fragmentation. It should be appreciated that this result is based on a specific test and the degradation of performance can vary for different cases.