Field of the Invention
The present invention is directed to automatic physical-database design.
Background Information
One of the primary features of a relational database management system (DBMS) that distinguishes it from its predecessors (network and hierarchical data management systems) is the idea of “data independence.” Data independence separates the logical view of data (the logical layer) from its physical representation (the physical layer).
The benefits of data independence include ease of use (the logical layer acts as a user-friendly abstraction of the physical layer) and adaptivity (modifications at the physical layer, such as adding an index, can be made without breaking preexisting queries or designs at the logical layer).
In contrast to physical database design, with which the present invention is primarily concerned, the primary function of logical database design is to define a database schema in accordance with which the data to be managed are partitioned into a set of tables (or relations) whose attributes (or columns) the schema defines. An entity-relationship diagram is one tool that that assists with this process. A more formal approach involves normalization: an incremental process that first requires creating a “universal relation” that contains all attributes of all data, and then using functional dependencies to drive the gradual decomposition of this table into smaller (i.e., fewer-column) tables, stopping once the set of tables produced satisfies some set of constraints (a normal form). As an example, a set of tables is in Boyce-Codd normal form (BCNF) if every table, T, has the property that every functional dependency,A1→A2 consisting solely of columns found in T has A1 as a key (i.e., if A1 functionally determines A2, it functionally determines all columns in T).
The goal of normalization is to produce a logical design that is complete while guarding against inconsistency errors. A logical design is complete if it is always possible to faithfully reproduce the universal relation from the relations in the decomposition by using a single query. Inconsistency errors are prevented by ensuring that a given attribute appears only in a table whose key functionally determines it.
The present invention is concerned not so much with such logical-design elements as it is with physical database design, which affects how much storage space the data require and how quickly they can be updated and retrieved. The physical-database-design task involves balancing the competing efficiency concerns of data retrieval and data updates. The tasks related to physical database design include:                (a) Clustering: What relations and segments are stored on which sites?        (b) Index Construction: Which tables should be indexed and on what fields?        (c) Materialized Views: What query results should be precomputed and automatically updated as underlying relations get updated?        (d) High-Availability Options: Should crash recovery be enabled by turning on logging, or by maintaining a mirror?        
Although the responsibility for producing a physical database design from a logical design typically falls to a database administrator (DBA), tools have been developed to automate much of that task. One example is IBM's DB2 Design Advisor tool as described in Valentin et al., “DB2 Advisor: An Optimizer Smart Enough to Recommend Its own Indexes,” Proc. ICDE Conf. 2000, pp. 101-110, and in Zilio et al., “DB2 Design Advisor: Integrated Automatic Physical Database Design,” Proc. 30th VLDB Conference, Toronto, Canada, September, 2004, pp. 1087-97. That design tool selects among candidate combinations of indexes, materialized views, partitionings, and multi-dimensional clusterings in accordance with a cost model the same as the one that the DBMS's optimizer uses when it selects among indexes in an existing physical design to implement a query.
Another example is the AutoPart tool, described in Papadomanolakis et al., “AutoPart: Automating Schema Design for Large Scientific Databases Using Data Partitioning,”Proc. 16th International Conference on Scientific and Statistical Database Management (SSDBM 2004), Santorini Island, Greece, June 2004, pp. 21-23. That tool deals particularly with partitioning. It uses a set of example queries to identify optimal vertical and horizontal partitionings: certain groups of columns are stored separately from other groups of columns (vertical partitioning), records are in some fashion assigned to categories, and the record fragments in a given vertical partitioning are stored separately in accordance with their categories (horizontal partitioning). Moreover, the AutoPart tool makes partitioning determinations separately for different categories so that the vertical partitioning used for records of one category may differ from other records' vertical partitioning.