In a relational database system, many individual records of data are stored in tables. Each table identifies fields, or columns, and individual records are stored as rows, with a data entry in each column. For example, in an employment database, there may be a table EMPLOYEES which comprises fields, or columns, such as employee ID, last name, first name, address, city, state and so forth. One record, including data in the several columns, would be entered in the EMPLOYEE table for each employee. Similarly, other tables would be established in accordance with the logical schema of the database. For example, a table SALARY HISTORY may include columns of salary history information, with each record being identified by an employee ID. Another table MANAGERS may include columns identifying information specific to managers and may also include the employee ID field. Relations between tables can be established by reference to the columns which appear in multiple tables, such as employee ID in the above example.
A major task in database management systems is in locating specific records within individual tables based on the specifics of one or more columns which are used as keys in the search process. To that end, the system may scan through every entry in a particular table to locate the desired records. However, the table may span multiple pages and scanning an entire table might require multiple time consuming input/output (I/O) operations.
To reduce the time required to locate particular records, indexes may be established. In a sorted index, for example, all entries in a column of a table are sorted and listed as keys in ascending or descending order within an index. The index provides a pointer from each entry in the index to a corresponding record in the table. Because the data in the index is sorted, a desired key can be located quickly, and direct access to the desired record is then obtained. Moreover, the size of an index record is generally less than that of its table; thus it is cheaper to scan indexes than tables. Another form of indexing is a hash index. In that case, a system hash function is applied to column data which serves as the key to the index in order to locate a page of the index. Within that page, the key is located to obtain a pointer to the record.
Indexes have the advantage of providing more direct access to individual records and thus of reducing search time, including costly I/O operations, to locate the records. However, the indexes come with the cost of added storage space and an increase in time required to insert or delete records or update records. Whenever a record is inserted or deleted, the system must not only process the record in the table but also any indexes which include columns of the table. Further, updating a record can require relocation of the corresponding entry in one or more indexes.
A user defines the logical design of a database, including the tables and columns. The user also identifies a workload of requests to the database for which the database system locates and operates on data in the tables. In a physical design of the database, the user would generally also establish any indexes which the user would expect to improve performance of the database system. A typical database management system includes an optimizer which, with any request during execution, selects the optimum predefined indexes, if any, to locate particular records.
Other features of the physical design of a database also affect performance. By properly placing records within tables, clustering tables and indexes within files, sizing and locating files, and selecting buffers, the number of input/output operations during processing can be minimized.
A prior physical design system is presented in U.S. patent application Ser. No. 485,376, abandoned, filed by Michael E. Gioielli et al. and Ser. No. 485,372, abandoned, filed by Philip K. Royal, both filed on Feb. 26, 1990 and assigned to the assignee of this invention.