The present invention relates generally to viewing the effect of changes to indexing designs for database tables. More particularly, the present invention relates to using virtual tables and virtual indexes for determining optimization plans for database queries when indexes for the database tables are changed.
Achieving and maintaining simple and efficient access to computer data is a goal shared by most computer users. In addition, as the processing power of modern computers increases, greater amounts of data may need to be organized and stored for the user. One system for organizing computer data is the database, which is generally recognized as a group of logically related information objects or files stored together in some recordable medium without unnecessary redundancy. The database preferably serves various applications or programs, and facilitates access by these applications or programs.
In most databases, data is externally structured into tables. Each table generally includes a series of fields which define columns of the table. Each row of the table comprises a single record. For each row of data in a table, a counterpart of that data is physically stored in the database. Thus, when the database user requests particular information from the table, the appropriate portion of the stored data is retrieved and presented to the user.
A program referred to as a "database management system" ("DBMS") provides users with an interface to the database. The DBMS provides structure to the database that enables users to access information objects stored in the database. The DBMS identifies and retrieves certain information objects in response to information requests, or "queries," from a user. The retrieval of particular information objects depends on the similarity between the information stored in the information objects and requests presented to the system by the user. The similarity is measured by comparing values of certain attributes attached to the information objects and information requests.
For example, if a table named "Employee" contains the fields "Name", "Dept", "Age" and "Salary", and a user desires to find the subset of employees who work in the electronics department, the following query can be used:
SELECT Name, Salary, Age
FROM Employee
WHERE Dept='Electronics'
To facilitate the retrieval process, information objects in a database are often "indexed" so that the objects are characterized by assigning descriptors to identify the content of the objects. The process of characterizing these information objects, referred to as "indexing", can lead the DBMS to particular items in the database in response to specific queries from a user.
To build an index for a table, the DBMS typically scans the table, retrieves the data from every row and column in the table, and adds the data to the index, which is often in the form of a B-tree structure. For more information on B-tree structures, see Patrick O'Neil, "Database: Principles, Programming, Performance," Morgan Kaufmann Publishers, Inc. (1994), incorporated herein by reference. The DBMS sequentially reads each and every data entry in the table, copies each data entry to a temporary space, sorts the data entries if necessary, and finally creates a data structure for the index.
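The build steps described above (scan, copy to temporary space, sort, create the structure) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular DBMS builds its indexes: a sorted list of (key, row position) pairs stands in for the leaf level of a B-tree, and the names `build_index` and `lookup` as well as the sample rows are hypothetical.

```python
import bisect

def build_index(rows, column):
    """Build a sorted (key, row position) list for one column of a table."""
    # Scan every row and copy the indexed value into temporary workspace.
    scratch = [(row[column], position) for position, row in enumerate(rows)]
    # Sort the copied entries; the workspace grows with the size of the table.
    scratch.sort()
    # The sorted list stands in for the B-tree a real DBMS would create.
    return scratch

def lookup(index, key):
    """Return the positions of all rows whose indexed value equals key."""
    start = bisect.bisect_left(index, (key,))
    positions = []
    for entry_key, position in index[start:]:
        if entry_key != key:
            break
        positions.append(position)
    return positions

# Hypothetical sample data modeled on the Employee table from the text.
employees = [
    {"Name": "Ann", "Dept": "Electronics", "Age": 41, "Salary": 72000},
    {"Name": "Bob", "Dept": "Shipping", "Age": 35, "Salary": 48000},
    {"Name": "Cho", "Dept": "Electronics", "Age": 29, "Salary": 65000},
]

dept_index = build_index(employees, "Dept")
print(lookup(dept_index, "Electronics"))  # [0, 2]
```

Because every row is read, copied and sorted before the index can be used, the cost of the build grows with the number of rows in the table.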
The process of building the index for the table, however, generally consumes great amounts of time and resources. For example, creating an index for a table having several million rows may take several days. Moreover, creating the index for this table would typically require several hundred megabytes of temporary workspace to copy and sort the data before creating the index. Naturally, as tables grow larger, the process of creating or changing an index takes proportionally more time. Tables for databases such as those used in data warehouses may have billions, or even millions of billions of rows. It may take the user weeks or even months to create the index for tables of this size.
Other factors compound the time problems associated with creating the index. Specifically, after the index is created, time is required for the database to test the index and return performance statistics, and for the user to analyze those results.
With a typical indexed database system, there are generally two steps to resolving a query. The first step is to determine which clauses in the query have associated descriptors or index entries in the index, to retrieve those index entries, and preliminarily restrict the set of information objects being considered. The second step generally involves taking the set of information objects from the first step and examining each information object in turn to determine if it satisfies the query.
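The two steps can be illustrated with a small Python sketch. It assumes a hypothetical in-memory index (a dictionary from column value to row positions) and a hypothetical `resolve_query` helper; the `Age` condition is an invented second clause that has no index entry and must therefore be checked row by row.

```python
def resolve_query(rows, index, indexed_clause, residual_check):
    """Resolve a query in two steps: index restriction, then per-row checks."""
    column, value = indexed_clause
    # Step 1: use the index entries for the indexed clause to narrow the
    # set of rows under consideration.
    candidates = index.get(column, {}).get(value, [])
    # Step 2: examine each remaining row in turn against the full query.
    return [rows[pos] for pos in candidates if residual_check(rows[pos])]

# Hypothetical sample data modeled on the Employee table from the text.
employees = [
    {"Name": "Ann", "Dept": "Electronics", "Age": 41},
    {"Name": "Bob", "Dept": "Shipping", "Age": 35},
    {"Name": "Cho", "Dept": "Electronics", "Age": 29},
]

# Only the Dept column is indexed in this sketch.
index = {"Dept": {}}
for pos, row in enumerate(employees):
    index["Dept"].setdefault(row["Dept"], []).append(pos)

result = resolve_query(
    employees, index, ("Dept", "Electronics"), lambda row: row["Age"] > 30
)
print([row["Name"] for row in result])  # ['Ann']
```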
Structured Query Language (SQL) has evolved into a standard language for database queries or statements. An SQL interface allows users to formulate relational operations on database tables either interactively, in batch files, or embedded in host languages such as C, COBOL, etc. Operators are provided in SQL that allow the user to manipulate the data, wherein each operator operates on one or more tables and produces a new table as a result.
In the process of tuning an SQL statement or query, the user often wishes to know how a change in indexing would affect the performance of that query. As explained above, indices may be added to the database to speed up query execution, especially for larger tables where an index may make a substantial improvement in performance. On the other hand, due to the amount of data in the database, adding or changing an index may require considerable amounts of time and resources for the database to build the index. Thus, the user is often presented with the dilemma of either expending the time and resources required to build the index, at the risk of the new index not improving performance in any meaningful way, or not building the index, and risking the failure to recognize improved performance which might be possible with the index.
Oracle database management systems provide users with the ability to view an "optimization plan" of an SQL statement. An optimization plan is automatically determined by the database for the SQL statement when the statement is parsed by the database. The optimization plan shows how the database would retrieve the data necessary to satisfy the requirements of the SQL statement without actually executing the SQL statement. Specifically, the optimization plan shows, among other things, information such as what table would be first accessed, how intermediate result sets would be joined, whether an index would be used and, if so, how that index would be interpreted. Thus, by viewing the optimization plan for a particular SQL statement, the user may obtain an estimate as to how efficiently that SQL statement would be executed in the database.
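As a rough illustration of viewing a plan without running the statement, the sketch below uses SQLite's `EXPLAIN QUERY PLAN`, chosen only because SQLite ships with Python; it is a stand-in for, not a reproduction of, the Oracle facility described above. The helper name `show_plan` is hypothetical, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

def show_plan(conn, sql):
    """Return the plan text for a statement without executing the statement."""
    # Each EXPLAIN QUERY PLAN row is (id, parent, unused, detail).
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Name TEXT, Dept TEXT, Age INTEGER, Salary REAL)"
)

query = "SELECT Name, Salary, Age FROM Employee WHERE Dept = 'Electronics'"
plan = show_plan(conn, query)

# With no index on Dept, the plan reports a full scan of the table.
for line in plan:
    print(line)
```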
In larger database management systems, the optimization of queries becomes more important to minimize the amount of time and resources consumed. Thus, it becomes equally important for users to be able to view the optimization plan for an SQL statement and ascertain the effect index changes may have on the optimization plan.
FIG. 1 is a flow diagram of a conventional method 100 for viewing the effect of changes to an index for a database table on an optimization plan for an SQL statement. In step 110, an original optimization plan is created for the SQL statement. In step 120, the indices for a table referenced in the SQL statement are created, dropped or modified. In step 130, a new optimization plan is created for the SQL statement. Lastly, in step 140, the user compares the new optimization plan with the original optimization plan to determine if performance would be improved or worsened as a result of the index changes.
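Steps 110 through 140 can be walked through on a toy database. The sketch below again uses SQLite as a stand-in DBMS, with a hypothetical `explain` helper and index name; on a table of realistic size, the index build in step 120 is the expensive operation that the conventional method cannot avoid.

```python
import sqlite3

def explain(conn, sql):
    """Return the plan text SQLite reports for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Name TEXT, Dept TEXT, Age INTEGER, Salary REAL)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [("Ann", "Electronics", 41, 72000.0), ("Bob", "Shipping", 35, 48000.0)],
)
query = "SELECT Name, Salary, Age FROM Employee WHERE Dept = 'Electronics'"

# Step 110: obtain the original optimization plan.
original_plan = explain(conn, query)

# Step 120: change the indexing of the real table. The index is built
# against the actual data, which is what makes this step costly.
conn.execute("CREATE INDEX idx_employee_dept ON Employee (Dept)")

# Step 130: obtain the new optimization plan.
new_plan = explain(conn, query)

# Step 140: compare the two plans.
plan_changed = new_plan != original_plan
print(original_plan)
print(new_plan)
print(plan_changed)
```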
As explained above, however, conventional method 100 of FIG. 1 requires excessive time and resources to create, drop or change the index. Moreover, excessive time and resources are then required for the database to gather the necessary statistics to build the various optimization plans. If the database is not used in a production environment, taking the time to make changes to the index using conventional methods might be possible. If the database is used in production, however, it would most likely not be feasible to expend the time and energy needed to make the changes shown in FIG. 1 due to the tremendous negative impact on speed, resources and overall performance. For example, using the methodology of FIG. 1, if a tool or application were relying upon an existing index, and the user changed or dropped that index in step 120, the database could shut down and the entire system could deadlock.
Thus, with conventional methods for previewing the effect of index changes on optimization plans, the user is often compelled to minimize any experimentation with the index. This, in turn, often results in the failure to realize the optimal index topography or optimization plan for the database, which could cost great amounts of time and energy when the SQL statements are executed in the database, particularly for larger database management systems. Thus, there is a need for a faster and more efficient way to change indexing designs for database tables and to create optimization plans for these indexes.
The present invention allows a user to see how an optimization plan for a database query changes when a new index is added to a database table, an existing index is dropped from the table, or an existing index for the table is modified.
A method and apparatus provide a framework for a user to experiment with the index topography for tables and preview the effects that the various topographical constructions of indexes can have on the optimization plan for a database query, such as an SQL statement, without having to dedicate the time and resources required by conventional methods.
According to aspects of the present invention, a virtual table is created which mimics the structure of a table on the database under test, referred to as the "original table." The virtual table is generally created by copying the original table, excluding any data in the original table. Thus, for example, if the data is stored in rows of the original table, the rows are not copied into the virtual table. Any existing indexes associated with the original table, or "original indexes," however, are copied to define a virtual index associated with the virtual table.
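One way to picture the copy-without-data step is the following sketch, which uses SQLite as a stand-in DBMS. The helper `create_virtual_copy`, the `_virt` suffix and the returned name map are all hypothetical; the "virtual table" here is an ordinary empty table in the sense used in this description, unrelated to SQLite's own `CREATE VIRTUAL TABLE` feature, and the simple copy shown does not carry over constraints or index uniqueness.

```python
import sqlite3

def create_virtual_copy(conn, table, suffix="_virt"):
    """Copy a table's column layout and indexes, but none of its rows."""
    virtual_table = table + suffix
    # WHERE 0 selects no rows, so only the column layout is copied.
    conn.execute(
        f'CREATE TABLE "{virtual_table}" AS SELECT * FROM "{table}" WHERE 0'
    )
    name_map = {virtual_table: table}
    # Collect the explicitly created indexes of the original table.
    index_names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'index' AND tbl_name = ? AND sql IS NOT NULL",
            (table,),
        )
    ]
    for index_name in index_names:
        # PRAGMA index_info rows are (seqno, cid, column name).
        columns = [
            row[2] for row in conn.execute(f'PRAGMA index_info("{index_name}")')
        ]
        virtual_index = index_name + suffix
        column_list = ", ".join(f'"{column}"' for column in columns)
        # Building an index on an empty table involves no data to scan or sort.
        conn.execute(
            f'CREATE INDEX "{virtual_index}" ON "{virtual_table}" ({column_list})'
        )
        name_map[virtual_index] = index_name
    return virtual_table, name_map

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Name TEXT, Dept TEXT, Age INTEGER, Salary REAL)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [("Ann", "Electronics", 41, 72000.0), ("Bob", "Shipping", 35, 48000.0)],
)
conn.execute("CREATE INDEX idx_employee_dept ON Employee (Dept)")

virtual_table, name_map = create_virtual_copy(conn, "Employee")
print(virtual_table, name_map)
```

The name map records which virtual object corresponds to which original object, which is useful later when plan output has to be presented in terms of the original names.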
By excluding data when copying the original table to define the virtual table, the associated virtual index may be easily and quickly modified while preserving the overall structure of the original table. New indices may be added and existing indices may be dropped very quickly. Also, if no original index exists, a new virtual index may be easily created.
In the query, references to the original table are replaced with references to the virtual table. The database management system then determines a new optimization plan for the query. Because the new optimization plan is determined using the virtual table and virtual index, the plan is retrieved much faster than if it were created using the original table and any associated original index. This is because the actual data in the original table was excluded when the original table was copied to define the virtual table. Thus, any changes to the optimization plan may be identified quickly after the indexing design is altered.
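The substitution and re-planning described above might look like the sketch below, with SQLite standing in for the DBMS. The names `redirect_query`, `Employee_virt` and `idx_dept_virt` are hypothetical, and the regular-expression rewrite is a deliberate simplification: it would also alter a matching word inside a string literal, so a real tool would more likely work from parsed SQL. A production optimizer's choice can also depend on table statistics, which this sketch does not address.

```python
import re
import sqlite3

def redirect_query(sql, original, virtual):
    """Replace whole-word references to the original table with the virtual one."""
    return re.sub(rf"\b{re.escape(original)}\b", virtual, sql)

def explain(conn, sql):
    """Return the plan text SQLite reports for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Name TEXT, Dept TEXT, Age INTEGER, Salary REAL)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [("Ann", "Electronics", 41, 72000.0), ("Bob", "Shipping", 35, 48000.0)],
)

# Hypothetical virtual objects: an empty copy of the table layout plus a
# candidate index that does not exist on the original table.
conn.execute("CREATE TABLE Employee_virt AS SELECT * FROM Employee WHERE 0")
conn.execute("CREATE INDEX idx_dept_virt ON Employee_virt (Dept)")

query = "SELECT Name, Salary, Age FROM Employee WHERE Dept = 'Electronics'"
virtual_query = redirect_query(query, "Employee", "Employee_virt")
virtual_plan = explain(conn, virtual_query)

print(virtual_query)
print(virtual_plan)
```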
Before the new optimization plan is displayed for the user, any references in the new optimization plan to the virtual table and any virtual index are replaced with the names of the original table and the original index. In this way, the user can compare the new optimization plan with the original optimization plan and analyze the changes without concerning himself with, or even needing to know about, the use of virtual objects in creating the new optimization plan.
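The renaming step can be sketched as a simple text substitution over the plan output. The helper `restore_names`, the name map, and the sample plan line (written in the style of SQLite plan text) are all hypothetical and serve only to show the idea.

```python
import re

def restore_names(plan_lines, name_map):
    """Replace virtual object names in plan text with the original names."""
    # Handle longer virtual names first so that one name cannot be
    # rewritten as part of another.
    ordered = sorted(name_map.items(), key=lambda item: len(item[0]), reverse=True)
    restored = []
    for line in plan_lines:
        for virtual, original in ordered:
            pattern = rf"\b{re.escape(virtual)}\b"
            line = re.sub(pattern, lambda _match, name=original: name, line)
        restored.append(line)
    return restored

# Hypothetical plan line and name map, for illustration only.
virtual_plan = ["SEARCH Employee_virt USING INDEX idx_dept_virt (Dept=?)"]
name_map = {"Employee_virt": "Employee", "idx_dept_virt": "idx_dept"}

print(restore_names(virtual_plan, name_map))
# ['SEARCH Employee USING INDEX idx_dept (Dept=?)']
```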