Achieving and maintaining simple and efficient access to computer data is a goal shared by most computer users. In addition, as the processing power of modern computers increases, greater amounts of data may need to be organized and stored for the user. One system for organizing computer data is the database, which is generally recognized as a group of logically related information objects or files stored together in some recordable medium without unnecessary redundancy. The database preferably serves various applications or programs, and facilitates access by these applications or programs.
In most databases, data is externally structured into tables. Each table generally includes a series of fields which define columns of the table. Each row of the table comprises a single record. For each row of data in a table, a counterpart of that data is physically stored in the database. Thus, when the database user requests particular information from the table, the appropriate portion of the stored data is retrieved and presented to the user.
A program referred to as a "database management system" ("DBMS") provides users with an interface to the database. The DBMS provides structure to the database that enables users to access information objects stored in the database. The DBMS identifies and retrieves certain information objects in response to information requests, or "queries" from a user. The retrieval of particular information objects depends on the similarity between the information stored in the information objects and requests presented to the system by the user. The similarity is measured by comparing values of certain attributes attached to the information objects and information requests.
For example, if a table named "Employee" contains the fields "Name", "Dept", "Age" and "Salary", and a user desires to find the subset of employees who work in the electronics department, the following query can be used:
SELECT Name, Salary, Age
FROM Employee
WHERE Dept = 'Electronics'
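The query can be tried against a small in-memory table. This is a minimal sketch that assumes Python's built-in sqlite3 module as the DBMS. The sample rows are made up for illustration, and the ORDER BY clause is added only so that the output order is deterministic.

```python
import sqlite3

# In-memory database; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Dept TEXT, Age INTEGER, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        ("Alice", "Electronics", 34, 72000),
        ("Bob", "Toys", 45, 51000),
        ("Carol", "Electronics", 29, 64000),
    ],
)

# The query from the text, with the department passed as a bound parameter.
rows = conn.execute(
    "SELECT Name, Salary, Age FROM Employee WHERE Dept = ? ORDER BY Name",
    ("Electronics",),
).fetchall()

for name, salary, age in rows:
    print(name, salary, age)  # prints "Alice 72000 34", then "Carol 64000 29"
```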
To facilitate the retrieval process, information objects in a database are often "indexed" so that the objects are characterized by assigning descriptors to identify the content of the objects. The process of characterizing these information objects, referred to as "indexing", can lead the DBMS to particular items in the database in response to specific queries from a user.
To build an index for a table, the DBMS typically scans the table, retrieves the data from every row and column in the table, and adds the data to the index, which is often in the form of a B-tree structure. For more information on B-tree structures, see Patrick O'Neil, "Database: Principles, Programming, Performance," Morgan Kaufmann Publishers, Inc. (1994), incorporated herein by reference. The DBMS sequentially reads each and every data entry in the table, copies each data entry to a temporary space, sorts the data entries if necessary, and finally creates a data structure for the index.
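The scan, copy, sort and build steps can be sketched in a few lines. This is a simplified illustration, not a DBMS implementation. A sorted list of (key, row id) pairs stands in for the B-tree, and the helper names `build_index` and `lookup` are hypothetical.

```python
from bisect import bisect_left, bisect_right

def build_index(table, column):
    """Build a sorted (key, row_id) list for one column of `table`.

    Follows the steps in the text: scan every row, copy each entry into a
    temporary workspace, sort it, and return the finished structure.
    A flat sorted list stands in for the B-tree a real DBMS would build.
    """
    workspace = []                          # temporary space for copied entries
    for row_id, row in enumerate(table):    # sequential scan of the table
        workspace.append((row[column], row_id))
    workspace.sort()                        # sort step
    return workspace                        # final index structure

def lookup(index, key):
    """Return the ids of rows whose indexed value equals `key`."""
    lo = bisect_left(index, (key,))                  # first entry with this key
    hi = bisect_right(index, (key, float("inf")))    # one past the last such entry
    return [row_id for _, row_id in index[lo:hi]]

employees = [
    {"Name": "Alice", "Dept": "Electronics"},
    {"Name": "Bob", "Dept": "Toys"},
    {"Name": "Carol", "Dept": "Electronics"},
]
dept_index = build_index(employees, "Dept")
print(lookup(dept_index, "Electronics"))  # [0, 2]
```

Even in this toy version, the index build touches every row and holds a full copy of the keys during the sort. That is why, as described below, building an index on very large tables is costly in both time and temporary space.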
Building an index for a table, however, generally consumes a great deal of time and resources. For example, creating an index for a table having several million rows may take several days. Moreover, creating the index for such a table would typically require several hundred megabytes of temporary workspace to copy and sort the data before the index itself can be created. Naturally, creating or changing an index on a larger table takes correspondingly more time. Tables for databases such as those used in data warehouses may have billions, or even millions of billions, of rows, and creating an index for a table of this size may take the user weeks or even months.
Other factors compound the time problems associated with creating the index. Specifically, after the index is created, time is required for the database to test the index and return performance statistics, and for the user to analyze those results.
With a typical indexed database system, resolving a query generally involves two steps. The first step is to determine which clauses in the query have associated descriptors or entries in the index, to retrieve those index entries, and to preliminarily restrict the set of information objects under consideration. The second step generally involves taking the set of information objects from the first step and examining each information object in turn to determine whether it satisfies the query.
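The two steps can be illustrated for a query whose clauses are all equality tests joined by AND. In this minimal sketch, a dictionary from column value to row ids plays the role of the index. The names `build_hash_index` and `resolve` are hypothetical.

```python
from collections import defaultdict

def build_hash_index(table, column):
    """Hypothetical helper: map each value of `column` to the ids of rows holding it."""
    index = defaultdict(list)
    for row_id, row in enumerate(table):
        index[row[column]].append(row_id)
    return index

def resolve(table, indexes, conditions):
    """Resolve a conjunctive equality query in two steps.

    conditions: dict of column -> required value (all must hold).
    indexes:    dict of column -> index built by build_hash_index.
    """
    # Step 1: use every clause that has an index to restrict the candidates.
    candidates = None
    for column, value in conditions.items():
        if column in indexes:
            ids = set(indexes[column].get(value, []))
            candidates = ids if candidates is None else candidates & ids
    if candidates is None:  # no usable index: every row remains a candidate
        candidates = set(range(len(table)))

    # Step 2: examine each candidate and keep those satisfying every clause.
    return sorted(
        row_id for row_id in candidates
        if all(table[row_id][c] == v for c, v in conditions.items())
    )

employees = [
    {"Name": "Alice", "Dept": "Electronics", "Age": 34},
    {"Name": "Bob", "Dept": "Toys", "Age": 45},
    {"Name": "Carol", "Dept": "Electronics", "Age": 29},
]
indexes = {"Dept": build_hash_index(employees, "Dept")}
print(resolve(employees, indexes, {"Dept": "Electronics", "Age": 29}))  # [2]
```

Here the index on Dept narrows the candidates to rows 0 and 2. The unindexed Age clause is then checked row by row in the second step.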
Structured Query Language (SQL) has evolved into a standard language for database queries or statements. An SQL interface allows users to formulate relational operations on database tables interactively, in batch files, or embedded in host languages such as C and COBOL. SQL provides operators that allow the user to manipulate the data; each operator operates on one or more tables and produces a new table as its result.
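Because every operator returns a table, the result of one operation can be the input to another. The sketch below shows this with a derived table in the FROM clause. It assumes Python's sqlite3 module and hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Dept TEXT, Age INTEGER, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [("Alice", "Electronics", 34, 72000),
     ("Bob", "Toys", 45, 51000),
     ("Carol", "Electronics", 29, 64000)],
)

# The inner SELECT (a restriction on Age) yields a table named Younger;
# the outer query groups and aggregates it, producing yet another table.
rows = conn.execute(
    """
    SELECT Dept, COUNT(*) AS Headcount, AVG(Salary) AS AvgSalary
    FROM (SELECT * FROM Employee WHERE Age < 40) AS Younger
    GROUP BY Dept
    ORDER BY Dept
    """
).fetchall()
print(rows)  # [('Electronics', 2, 68000.0)]
```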
In the process of tuning an SQL statement or query, the user often wishes to know how a change in indexing would affect the performance of that query. As explained above, indices may be added to the database to increase the speed at which queries are executed, especially for larger tables, where an index may substantially improve performance. On the other hand, because of the amount of data in the database, adding or changing an index may require considerable time and resources for the database to build the index. The user is therefore often faced with a dilemma: either expend the time and resources required to build the index, at the risk that the new index will not improve performance in any meaningful way, or forgo building the index and risk missing the improved performance it might have provided.
Oracle database management systems provide users with the ability to view an "optimization plan" of an SQL statement. An optimization plan is automatically determined by the database for the SQL statement when the statement is parsed by the database. The optimization plan shows how the database would retrieve the data necessary to satisfy the requirements of the SQL statement without actually executing the SQL statement. Specifically, the optimization plan shows, among other things, which table would be accessed first, how intermediate result sets would be joined, and whether an index would be used and, if so, how that index would be used. Thus, by viewing the optimization plan for a particular SQL statement, the user may obtain an estimate of how efficiently that SQL statement would be executed in the database.
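Oracle exposes the plan through its EXPLAIN PLAN statement. To keep the sketch runnable without an Oracle installation, it substitutes SQLite's analogous EXPLAIN QUERY PLAN, available through Python's sqlite3 module. SQLite's plan format is its own and its wording differs between versions. The helper `explain` and the index name are hypothetical.

```python
import sqlite3

def explain(conn, sql, params=()):
    """Return SQLite's plan description for `sql` without executing it."""
    # Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail); keep detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Dept TEXT, Age INTEGER, Salary INTEGER)")
conn.execute("CREATE INDEX idx_employee_dept ON Employee (Dept)")

query = "SELECT Name, Salary, Age FROM Employee WHERE Dept = ?"
plan = explain(conn, query, ("Electronics",))
print(plan)
```

On recent SQLite versions the plan typically reads something like `SEARCH Employee USING INDEX idx_employee_dept (Dept=?)`. This indicates that the index would be used, even though the SELECT itself never runs.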
In larger database management systems, the optimization of queries becomes more important to minimize the amount of time and resources consumed. Thus, it becomes equally important for users to be able to view the optimization plan for an SQL statement and ascertain the effect index changes may have on the optimization plan.
FIG. 1 is a flow diagram of a conventional method 100 for viewing the effect of changes to an index for a database table on an optimization plan for an SQL statement. In step 110, an original optimization plan is created for the SQL statement. In step 120, the indices for a table referenced in the SQL statement are created, dropped or modified. In step 130, a new optimization plan is created for the SQL statement. Lastly, in step 140, the user compares the new optimization plan with the original optimization plan to determine if performance would be improved or worsened as a result of the index changes.
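Steps 110 through 140 of method 100 can be mimicked on a toy table. This sketch again substitutes SQLite's EXPLAIN QUERY PLAN for the Oracle facility, and the table and index names are hypothetical. Step 120 really builds the index here. On an empty table that is instantaneous, but on a large production table it is exactly the expensive operation discussed next.

```python
import sqlite3

def explain(conn, sql):
    """Plan description only; the statement itself is not executed."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Dept TEXT, Age INTEGER, Salary INTEGER)")
query = "SELECT Name, Salary, Age FROM Employee WHERE Dept = 'Electronics'"

original_plan = explain(conn, query)                               # step 110
conn.execute("CREATE INDEX idx_employee_dept ON Employee (Dept)")  # step 120
new_plan = explain(conn, query)                                    # step 130

# Step 140: compare the original and new plans.
for label, plan in (("original", original_plan), ("new", new_plan)):
    print(label, plan)
uses_index = any("USING INDEX" in step for step in new_plan)
print("new plan uses index:", uses_index)
```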
As explained above, however, conventional method 100 of FIG. 1 requires excessive time and resources to create, drop or change the index. Moreover, excessive time and resources are then required for the database to gather the statistics needed to build the various optimization plans. If the database is not used in a production environment, taking the time to change the index using conventional methods might be possible. If the database is used in production, however, it would most likely not be feasible to expend the time and energy needed to make the changes shown in FIG. 1, because of the severe negative impact on speed, resources and overall performance. For example, using the methodology of FIG. 1, if a tool or application were relying upon an existing index and the user changed or dropped that index in step 120, the database could shut down and the entire system could deadlock.
Thus, with conventional methods for previewing the effect of index changes on optimization plans, the user is often compelled to minimize any experimentation with the index. This, in turn, often results in a failure to arrive at the optimal index design or optimization plan for the database, which can waste great amounts of time and energy when the SQL statements are executed, particularly in larger database management systems. There is therefore a need for a faster and more efficient way to change indexing designs for database tables and to create optimization plans that reflect those designs.