The invention relates to a method to help in optimizing a query from a relational data base management system, more currently known by its acronym RDBMS. More particularly, it applies to the administration of the RDBMS. It also relates to a method of syntactical analysis resulting therefrom.
A data base may be defined as an integrated set of data that model a given universe. The data used by various applications are grouped in the same base in such a way as to avoid the problems of data duplication. Associated with a data base is a conceptual schema that describes the structure and type of the data that it contains and the constraints, which must always be verified. The data base administrator has the task of making sure of the verification of these constraints.
The data administration has three roles. First, it defines the data base users by assigning them a name, a password, and a set of privileges or rights for access to the data. It also defines the administrator or administrators, who are the persons authorized to handle all or part of the administration of the data base. The second role of the administration is to assign to the administrator, or cause him to participate in, the definition of the conceptual schema of the data base. The administrator thus defines the schemas of the various tables and the rules relating to these tables. In his final role, the administrator has possibility of benefitting, in terms of performance, integrity and security, from the conceptual schema of the base and of the RDBMS. It is this third role upon which the subject of the present invention essentially bears.
When an RDBMS application is being developed, the role of an administrator is to optimize the use of the RDBMS's resources. When a query is made at the time of the application, the RDBMS chooses an access route for executing the query. To optimize the performance of the application, the administrator must analyze the queries sent and decide on the action to be taken. His action may be to change the form of the query, and/or to create or delete indexes in the tables concerned. His decision depends on the information that indicates to him whether or not the current form of the query is adequate, or whether the existing indexes are pertinent.
The indexes are physically independent of the data and may thus be deleted and created at any moment. An index is used at the time of access to the data, regardless of who created it. A table may have several indexes at the same time. A compromise must be made between the number of indexes and the overhead involved in updating them. An index may apply to more than one column, and in that case it can assure the uniqueness of the columns used. It also improves the performance of the queries for which the selection condition will affect more than one column. A query with more than one predicate can use multiple indexes, if they are unique, if the predicates are equalities and are defined in the same table. In that case, the partial results from each index are merged to constitute the definitive result. Some RDBMS are capable of using the properties of indexes and the characteristics of columns to decide which index to take and which to avoid for the merge operation. Whether or not to use an index is decided by the RDBMS optimizer. The decision varies depending on the type of RDBMS.
On the other hand, the schema of an RDBMS may evolve dynamically. For example, one or more fields may be added to a table, and indexes may be created in a table. In this context, it may happen that the RDBMS does not encounter an index that it must use, or encounters an index that it must not use, for example if access is being gained to more than 10% of the articles. In this latter case, the index becomes a hindrance. Given all the possibilities in choice of the RDBMS to set up its plan, it is estimated that the utilization of the resources can vary within a ratio of 1:10,000, depending on the formula of the query and the choice of indexes.
At present, optimizing a query is complicated. First, it is not automatic, and specific RDBMS tools must be used to request the plan of execution of the query. This plan is the concatenation of operations to be executed by the data base server in order to process the query. To request this plan, the user must write the query in the language of the RDBMS. Hence, writing the query requires good knowledge of the query language of the RDBMS. Moreover, it is rigorous and arduous work and involves a great loss of time, aggravated by the risks of mistakes that ensue. In response, the RDBMS furnishes the results of the request of the execution plan of the query, in a variable form depending on the RDBMS. The plan is written in the query language of the RDBMS and is currently present in the form of a long table with multiple columns, for example. The table is written in linear mode and appears on the screen only one page at a time. Analyzing the table is accordingly highly complicated, and the user ordinarily makes his task easier by requesting a printout of the table on paper. This means that he must go through all the printing processes specific to that terminal. The printout is done page by page, with column headings repeated each page. Consequently, in the coded language, the administrator must decrypt the hierarchy of processing the query and reconstitute the plan by analyzing numerous columns and a large number of disordered lines on various pages. This makes the search long and difficult. Moreover, the administrator does not have a global view of the plan available, which would ordinarily be too long to obtain and would be overly bothersome in proportion to the advantage he would gain from it. Nevertheless, the administrator must overcome this disadvantage as well as he can, in order to decide how to optimize the query. For example, if he modifies the query, he must rewrite the modified query, analyze the new table, and compare it with the old table. The tools used to create an analysis table do not save the preceding table in memory, and hence it must be printed out in order to be able to refer to it to make the comparison with the new table. In practice, the comparison accordingly consists of gathering the long, difficult analyses of the two tables. Finally, the administrator lacks a visual display of information about the objects used in the processing. It will be appreciated that under these conditions, optimizing a query is difficult to achieve rapidly, even for a skilled administrator.
One approach has been proposed to make the visual display and interpretation of the plan of execution of a query easier, so that the processing of the query can be optimized rapidly, at less cost. This approach had the dual advantage of offering graphical representation of the execution plan of the query and of being independent of the optimizer and of the RDBMS query language. In a preferred variant embodiment of this approach, this optimizing help could even be achieved by any user who does not know the query language of the RDBMS. This approach is described in the copending French patent application of the present applicant, filed on Oct. 12, 1992 and having the serial number 92 12148 and corresponding U.S. application of Eric Vachey, U.S. Ser. No. 08/128,159 (BULL 3360) filed concurrently herewith.
This copending patent application proposes a method to help in optimizing a query of an RDBMS, including the search for the plan of execution of the query written in the query language of the RDBMS, characterized in that it consists of constructing a tree, on the basis of this search, that represents the plan of execution of the query, and representing the tree on a screen.
This tree appears in the form boxes connected to one another by their links of affiliation. Each of the boxes of the tree has a heading giving the name of the operation concerned and a subheading giving its possible option. Advantageously, a possible characteristic of the operation is added here, such as the name of the object, and optionally the name of the owner of the object to which the operation applies. An improvement consists of writing this characteristic in a button of the box and clicking on the button to request additional information about the object relating to the button. For example, if the operation being performed on the object that is clicked on is a table, then the additional information includes elements that describe the table. On the other hand, if the operation on the object clicked on is an index, then the additional information includes a histogram.
In summary, the data furnished by the tree for execution of a query may be all the data contained in the execution plan written in the query language of the RDBMS. Thanks to the invention, the user can also obtain additional information about some of these data. It will be appreciated that this tree furnishes the user with a powerful tool to help in optimizing a query.
Research into increasing the power has been done. One major problem presents a stumbling block, as will be illustrated by a very simple example to facilitate comprehension. Let it be assumed that the RDBMS working in SQL is handling automobile shipping and contains a table known as Type.sub.-- Car. For example, in its columns this table includes the type of vehicle, model name "Mark", the price "Price", and the power "Power". Let it now be assumed that the user of this table requests the vehicle types having the model name XXX. The user will then write the following query in the SQL language:
SELECT Type PA1 FROM Type.sub.-- Car PA1 WHERE Mark=`XXX`
The table representing the execution plan of this query accordingly takes into account only the objects of the query, that is, Type.sub.-- Car. In the query, the other elements relating to the WHERE clause indicate to the RDBMS only that it should refer to the column "Mark" and there select the model XXX and the elements relating to the SELECT clause indicating to the RDBMS that it should return only the "Type" information in the result. However, it is clear that these elements are not taken into account in the table representing the execution plan of the query. In other words, this table loses the other elements of the request, that is, Type, Mark and the notation XXX in the query illustrated.
Since the execution tree of the query is constructed from the table representing the plan of execution of this query, the elements lost from this table cannot appear in the tree. It is consequently impossible to incorporate these lost elements into the tree by using the prior art method for help in optimizing the query. This made it impossible to improve the help provided.
Yet incorporating these lacking elements into the boxes of the tree would offer the user numerous advantages. First, the notation Mark=`XXX` would have the advantage of being an image of the query. The table representing the execution plan of the query would include not only the results of the search for the execution plan of the query, but also the condition that led to those results. It must be understood that the above example has intentionally been simplified to the maximum, and that in practice a query is much more complicated. Inserting all the elements of the query would procure for the user a real image of this query, in addition to its result, and would make him better able to understand their relationship. This appreciation of their relationship would have the major advantage of limiting the choice of modifications to be made for optimizing the query, and of better formulating the modifications. In other words, a user who is a nonspecialist in the query language of the RDBMS would be provided with fast, easy help in attaining the best possible optimization of query execution, because incorporating the lacking elements provides a highly valuable guide in achieving this help.