The present invention is related generally to computers and database software. More specifically, the present invention is related to software for sizing databases in a relational database management systems (RDBMSs). The present invention includes methods and systems for determining the required size for database storage including both data and B-tree indexes.
Relational databases came into common use in computers over twenty years ago. Despite improvements in database software and new methodologies, relational databases remain the mainstay of database management systems. Hardware vendors originally supported proprietary database management systems which ran primarily on machines manufactured by the hardware vendor. Software developers later developed database management systems that were more open and ran on computers made by several vendors. The database management systems were also ported to run under various operating systems. This gave the advantage of spreading the cost of development over more sites and also uncoupled the dependence between hardware vendors and software vendors. Third party support and training also became more common.
Database management systems also became separated into client side software and server side software. This meant that the server side software was decoupled from software having to do with the display, use, and formatting of the data received from the database. In particular, server side software often handled mostly queries of existing data along with updates of existing data and insertion of new data.
Modem electronic commerce such as commerce over the Internet or business-to-business electronic commerce has placed increased demands on many servers. This has also made frequent upgrades necessary. Company mergers and acquisitions frequently make it necessary to incorporate large amounts of data from unexpected sources. Customer expectations also make it necessary to upgrade hardware to keep up with the faster response times users expect even though system loads may be increasing as well.
When upgrading or replacing database servers it is necessary to have a good idea as to the size of the database that will be implemented on the new server. The data storage as well as storage of many different indexes will all increase the amount of data required. It may be necessary to come up with a good estimate of the required amount of mass storage in a short time period, as during bid evaluations, during sales presentations, or repeatedly during scenario building. The person supplying the input may have only a rough idea as to the size of the database mass storage requirements.
What would be desirable, therefore, are methods for calculating the data storage requirements for a relational database table and its indexes without requiring interatively calculating the number of index levels, the number of data pages, and the number of index pages.
The present invention includes methods for calculating the mass storage requirements for a relational data base table based on the number of rows, columns, column widths, page size, and the fractional utilization allowed for each page. The total data storage requirement can be calculated by multiplying the number of pages required by the bytes per page. The bytes per page is given by the database vendor and the number of pages required for data can be calculated by dividing the number of data records per table by the number of data records per data page. The number of data records per table can be given by the user and the number of data records possible per data page can be calculated by dividing the amount of data space available per data page by the record size in the table. The term record size refers to the mass storage space reserved for the record rather than the exact size required for each populated record. The amount of data space available per data page can be provided by the data base vendor and is often the maximum size of the page reduced by various factors. The record size can be calculated based on the column widths, number of columns, and for some RDBMS, variability in column widths.
The number of mass storage bytes required for storing the indexes is a function of the number of index pages and the available space per index page for storing index data. The type of calculations vary depending on whether the B-trees have physical order indexes or non-physical order indexes. The height of a B-tree is the number of index pages that are accessed before reading the first data page; an alternative definition includes accessing the first data page in calculating the height.
In the case of physical order indexes, the tree height is a function of the number of data pages, and the average number of index records per page of index records. The number of data pages can be determined as previously discussed. The average number of index records per page of index records can be calculated as a function of the space available per index page to store index records, and the index record size. The index record size refers to the mass storage space required for an index record. The space available per index page for index records can be calculated as a function of the index record page size reduced by factors such as fill factors and number of index records reserved.
In the case of non-physical order indexes, the tree height is a function of the number of leaf pages, and the average number of index records per non-leaf page of index records. Each record in a leaf page contains the index value of the data record and either a pointer to the record in mass storage or a physical order index value depending on the RDBMS. There are a many leaf records as there are data records. The average number of index records per non-leaf page of index records can be calculated as a function of the available space in a non-leaf index page to hold index records and the index record size. The number of leaf pages can be calculated as a function of the number of data records in the table and the average number of index records per leaf page of index records, as previously discussed
Thus, the present invention provides methods for calculating the mass storage requirements for RDBMS tables including both the data requirements and the B-tree index requirements without requiring iterations, counting, or a traversal of the tree to be sized.