1. Technical Field
This invention generally relates to computer systems and more specifically relates to databases in computer systems.
2. Background Art
Since the dawn of the computer age, computers have evolved and become more and more powerful. In our present day, computers have become indispensable in many fields of human endeavor including engineering design, machine and process control, and information storage and access. One of the primary uses of computers is for information storage and retrieval.
Database systems have been developed that allow a computer to store a large amount of information in a way that allows a user to search for and retrieve specific information in the database. For example, an insurance company may have a database that includes all of its policy holders and their current account information, including payment history, premium amount, policy number, policy type, exclusions to coverage, etc. A database system allows the insurance company to retrieve the account information for a single policy holder among the thousands and perhaps millions of policy holders in its database.
Databases generally contain one or more indexes that make searching the database for information much more efficient than performing a full database search for every query. The performance of a database system depends on the performance of a paged memory system that swaps pages from disk to a buffer. If the order of keys in a particular index is close to the physical order of the keys in the database table, the performance of the memory paging system using this index will be improved because many accesses will likely be made to the page buffer without performing page swaps. A statistical measure of the correlation of a column in the database to the corresponding data in physical storage is known as “clustering factor”. The clustering factor indicates the degree to which the data is clustered (i.e., close together) in physical storage.
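The effect of key order on paging described above can be illustrated with a minimal sketch. It assumes a small page buffer with least-recently-used replacement, which is an illustrative choice and is not specified in this description; the function name count_page_swaps and the sample page lists are hypothetical.

```python
from collections import OrderedDict


def count_page_swaps(row_pages, buffer_pages):
    """Count page loads when rows are visited in index-key order.

    row_pages: page number holding each row, listed in index-key order.
    buffer_pages: capacity of the page buffer, in pages (assumed LRU).
    """
    buffer = OrderedDict()  # keys are resident pages, oldest first
    swaps = 0
    for page in row_pages:
        if page in buffer:
            # Page already resident: mark it most recently used.
            buffer.move_to_end(page)
        else:
            # Page must be brought in from disk.
            swaps += 1
            buffer[page] = True
            if len(buffer) > buffer_pages:
                # Evict the least recently used page.
                buffer.popitem(last=False)
    return swaps


# Eight rows stored two per page (pages 0 to 3).
clustered = [0, 0, 1, 1, 2, 2, 3, 3]  # index order matches physical order
scattered = [0, 2, 1, 3, 0, 2, 1, 3]  # index order jumps between pages

print(count_page_swaps(clustered, 2))  # 4
print(count_page_swaps(scattered, 2))  # 8
```

With a two-page buffer, the index whose key order follows the physical order loads each page once, while the scattered index loads a page on every access. The result also depends on the buffer size passed in, which is why a buffer of unknown or changing size complicates the computation.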
Clustering factor in the prior art is typically computed as a function of the size of a memory page and the size of the page buffer. Making this computation is relatively straightforward when the size of the page buffer is known. The size of a page buffer is generally known for virtual memory systems that specify a virtual size for the buffer. However, some computer platforms, such as the IBM iSeries 400, do not have a virtual memory system that provides a fixed-size page buffer, but instead have a single-level store. With a single-level store, the address space of the processor must be shared among the operating system and all applications. For this reason, it is impossible to set a fixed size for the page buffer; its size can vary and even change dynamically as system requirements change. Without an apparatus and method for determining clustering factor in a database that has a variable-sized page buffer, the clustering factor for indexes will be unavailable on some types of computer platforms, making it difficult to optimize database performance based on clustering factor.
According to the preferred embodiments, an apparatus and method perform block-level sampling on a database, process the data to generate one or more matrices, and process the one or more matrices to generate a clustering factor for a selected index. In addition, the apparatus and method of the preferred embodiments allow the distribution of the clustering factor to be determined across a range, thereby allowing the identification of ranges where the clustering factor is high and ranges where the clustering factor is low. The clustering factor distribution can then be used to predict the memory paging performance of a search that uses an existing or a potential index that corresponds to the sampled data, and can therefore be used to predict the performance of searching the database using an existing or potential index for a particular database query.
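The idea of a clustering factor distribution across a range can be pictured with a simplified sketch. It does not reproduce the block-level sampling or matrix processing of the preferred embodiments. Instead it assumes a stand-in measure, the fraction of adjacent index entries that fall on the same page, computed separately for each key range. The function name clustering_by_range and the sample data are hypothetical.

```python
def clustering_by_range(row_pages, num_ranges):
    """Return one clustering estimate per key range.

    row_pages: page number holding each row, listed in index-key order.
    num_ranges: number of roughly equal key ranges to split the index into.

    The estimate is the fraction of adjacent index entries that share a
    page (an assumed proxy, not the measure defined in this description).
    """
    n = len(row_pages)
    if n == 0:
        return []
    size = -(-n // num_ranges)  # ceiling division: entries per range
    result = []
    for start in range(0, n, size):
        chunk = row_pages[start:start + size]
        pairs = len(chunk) - 1
        if pairs <= 0:
            # A single entry is trivially clustered with itself.
            result.append(1.0)
            continue
        same = sum(1 for a, b in zip(chunk, chunk[1:]) if a == b)
        result.append(same / pairs)
    return result


# First half of the key range sits on one page; second half is scattered.
row_pages = [0, 0, 0, 0, 3, 1, 2, 0]
print(clustering_by_range(row_pages, 2))  # [1.0, 0.0]
```

In this example a query restricted to the first key range would touch a single page, while a query over the second range would touch a different page for every row, which is the kind of difference a per-range distribution is meant to expose.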
The foregoing and other features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings.