The present invention relates to database management systems and more particularly to a method for a system maintained materialized functionally dependent generated column for a relational database management system (DBMS).
A database management system (DBMS) comprises the combination of an appropriate computer, direct access storage devices (DASD) or disk drives, and database management software. A relational database management system is a DBMS which uses relational techniques for storing and retrieving information. The relational database management system or DBMS comprises computerized information storage and retrieval systems in which data is stored on disk drives or DASD for semi-permanent storage. The data is stored in the form of tables which comprise rows and columns. Each row or tuple has one or more columns.
The DBMS is designed to accept commands to store, retrieve, and delete data. One widely used and well known set of commands is based on the Structured Query Language or SQL. The term query refers to a set of commands in SQL for retrieving data from the DBMS. The definitions of SQL provide that a DBMS should respond to a particular query with a particular set of data given a specified database content. SQL however does not specify the actual method to find the requested information in the tables on the disk drives. There are many ways in which a query can be processed and each consumes a different amount of processor and input/output access time. The method in which the query is processed, i.e. query execution plan, affects the overall time for retrieving the data. The time taken to retrieve data can be critical to the operation of the database. It is therefore important to select a method for finding the data requested in a query which minimizes the computer and disk access time, and therefore, optimizing the cost of doing the query.
A database system user retrieves data from the database by entering requests or queries into the database. The DBMS interprets the user""s query and then determines how best to go about retrieving the requested data. In order to achieve this, the DBMS has a component called the query optimizer. The DBMS uses the query optimizer to analyze how to best conduct the user""s query of the database with optimum speed in accessing the database being the primary factor. The query optimizer takes the query and generates a query execution plan. The query execution plan comprises a translation of the user""s SQL commands in terms of the DBMS operators. There may be several alternative query execution plans generated by the query optimizer, each specifying a set of operations to be executed by the DBMS. The many query execution plans generated for a single query ultimately differ in their total cost of obtaining the desired data. The query optimizer then evaluates these cost estimates for each query execution plan in order to determine which plan has the lowest execution cost. In order to determine a query execution plan with the lowest execution cost, the query optimizer uses specific combinations of operations to collect and retrieve the desired data. When a query execution plan is finally selected and executed, the data requested by the user is retrieved according to that specific query execution plan however manipulated or rearranged.
In a SQL based DBMS the query execution plan comprises a set of primitive operations or commands, e.g. JOIN; a sequence in which the retrieve operations will be executed, e.g. JOIN ORDER; a specific method for performing the operation, e.g. SORT-MERGE JOIN; or an access method to obtain records from the base relations, e.g. INDEX SCAN. In most database systems, particularly large institutional systems, a cost-based query optimizer will be utilized. A cost-based query optimizer uses estimates of I/O and CPU resource consumption in determining the most efficient query execution plan because both I/O and CPU resource consumption depend on the number of rows that need to be processed.
The performance of queries against a database may be enhanced significantly by materializing certain data that may be redundant of data already in the database. This materialized data may be organized in ways better suited to certain database operations, such as searching for specific data, for example as with indexes, or may pre-compute information likely to be asked for often, as with materialized views, for example.
A materialized generated column in the DB2(trademark) software product is a column of a table which is functionally dependent on other columns in the same row. In general it is not advisable to have functionally dependent columns in a table since there is a risk to the integrity of the database if the functionally dependent columns are not maintained properly. However in the art, functionally dependent columns have advantages in certain situations, including the following: (a) The expression for deriving the value of the generated column is expensive and therefore causes a performance hit for all queries requiring the result if evaluated at query time. For example, the source column is a large object and the expression extracts meta-information of that object such as the minimum bounding rectangle of a complex shape. (b) In another situation, the column value may be needed for indexing or partitioning of the table. For example, the source column is a customer name, but an index is needed on the case-insensitive version of the name.
The conventional way for maintaining such a materialized generated column involves creating two triggers and a check constraint. One xe2x80x9cbefore-each-row-insertxe2x80x9d trigger is utilized to populate the generated column when a new row is inserted. Another xe2x80x9cbefore-each-row-updatexe2x80x9d trigger is utilized to maintain the column whenever a value on which the generated column depends on is changed. The check constraint is used to ensure the integrity of the generated column even when triggers are not fired, for example, when there is a bulk load of new data. The check constraint is also important to allow the DBMS to exploit the knowledge of the functional dependency. For example, when a query uses the same expression that is used to define the additional column, the expression car be re-routed to that column. This allows existing applications to benefit from the column without knowing about its existence.
Creating a generated column using conventional SQL techniques has several drawbacks. First, the procedure is awkward and requires the creation of multiple supporting objects. Secondly, for an UPDATE or DELETE operation, the generating expression has to be evaluated twice. One evaluation is for the trigger and the other evaluation is to ensure the integrity via the check constraint. Thirdly, triggers fire in a certain order. In order to ensure the integrity of the generated column, the trigger maintaining the generated column has to be fired last at all times.
Accordingly, there remains a need for a mechanism which provides the advantages of functionally dependent columns without the drawbacks associated with prior approaches.
The present invention is directed to a method for providing a functionally dependent column or materialized generated column for a relational database management system (RDBMS) which is maintained by the system. The generated column is a column in a table of an RBDMS which is functionally dependent on other columns in the same table. The integrity of the generated column is maintained and ensured by the DBMS transparently to the user.
According to one aspect of the invention, whenever a column affecting a generated column is updated a before-update trigger is internally generated and added as the last trigger before the update occurs. Whenever an insert operation is performed, a before insert trigger is internally generated and added as the last trigger before the insert occurs. This feature makes it possible to ensure integrity without checking the value and also enables the optimizer to take advantage of the generated column expression.
According to another aspect, a syntax to define the column and generate its expressions is provided. This provides greater flexibility in the application of the materialized generated column, and also enables the DBMS to optimize the maintenance of the generated column which results in better performance.
In a first aspect, the present invention provides a method for maintaining a materialized column in a database management system, the materialized column s dependent on one or more columns appearing in one or more tables stored in the database management system, each of the tables is organized as a plurality of rows and columns, and the database management system provides a preprocessing module for use in the processing of data for the table, the method comprises the steps of: (a) generating an internal object in response to one of the rows in the table being modified, wherein the row includes one or more of the dependent columns; (b) executing the internally generated object in the preprocessing module after all preprocessing operations for manipulating the columns upon which the generated object depends have seen performed.
In another aspect, the present invention provides a computer program product for use on a computer for maintaining a materialized column in a database management system, the database management system including a processing unit and physical storage for storing data, the physical storage being operatively connected to the processing unit, the physical storage containing one or more tables, each database table being organized as a plurality of rows and columns, the materialized column is dependent on one or more columns in the table, and wherein commands are entered by a user for inserting and modifying data in the database management system, the computer program product comprises: a recording medium; means recorded on the medium for instructing the processing unit to perform the steps of, (a) generating an internal object in response to one of the rows in the table being modified, wherein the row includes one or more of the dependent columns; (b) executing the internally generated object in a preprocessing module after all preprocessing operations for manipulating the columns upon which the generated object depends have been performed.
In yet another aspect, the present invention provides a computer program product for use on a computer for maintaining a materialized column in a database management system, the database management system includes a processing unit and physical storage for storing data, the physical storage is operatively connected to the processing unit, the physical storage contains one or more tables, each database table is organized as a plurality of rows and columns, the materialized column is dependent on one or more columns in the table, and wherein commands are entered by a user for inserting and modifying data in the database management system, the computer program product comprises: a recording medium; means recorded on the medium for instructing the processing unit to perform the steps of, (a) generating an internal object in response to one of the rows in the table being modified, wherein the row includes one or more of the dependent columns; (b) if the modification to the row comprises an update operation, the internally generated object is included in a before-trigger list as a before-trigger, and the internally generated object is added as the last before-trigger to the before-trigger list; and (c) processing the internally generated before-trigger before the row is modified.