This invention relates to a method of analysing the manner in which an SQL statement referencing a database was executed.
SQL statements are conventionally executed by a database using a rules-based approach. However, it is becoming increasingly common for them to be executed according to a cost-based approach. For example, the Oracle(copyright) database includes a cost-based optimiser for executing SQL statements according to a cost based approach and an example of how this operates is set out below.
In this example, there is a table called ORDERS with 2 million rows having an indexed column called ORDER_STATUS. The indexing of this column provides the database with fast, direct access to the ORDER_STATUS column. Each of the rows in the table contains one of four values in the ORDER_STATUS column. These are xe2x80x9cCOMPLETEDxe2x80x9d, xe2x80x9cAWAITING STOCKxe2x80x9d, xe2x80x9cAWAITING DISPATCHxe2x80x9d and xe2x80x9cCANCELLEDxe2x80x9d and the percentage of rows containing each value is 95%, 1%, 2% and 2% respectively.
If the SQL statement:
SELECT * FROM ORDERS WHERE ORDER_STATUS=xe2x80x9cCOMPLETEDxe2x80x9d
is issued to the database for execution using a rules-based approach then the database will access the ORDER_STATUS column using its index and will read 3.8 million rows since each row where ORDER_STATUS is xe2x80x9cCOMPLETEDxe2x80x9d will be accessed twice.
If, however, the statement is executed using a cost based approach, the cost-based optimiser will obtain the proportion of rows in the ORDERS table which have an ORDER_STATUS of xe2x80x9cCOMPLETEDxe2x80x9d from a set of statistics relating to the ORDERS table. From this it will determine that nearly all the rows have an ORDER_STATUS of xe2x80x9cCOMPLETEDxe2x80x9d and that the fastest approach to execute this statement is to access every row in the table and discard those rows that do not have an ORDER_STATUS of xe2x80x9cCOMPLETEDxe2x80x9d.
Similarly, if the above SQL statement is modified such that it reads:
SELECT * FROM ORDERS WHERE ORDER_STATUS=xe2x80x9cCANCELLEDxe2x80x9d
then the optimiser will, using the set of statistics, determine that the best approach in this instance is to access the column via its index and then extract all rows with an ORDER_STATUS of xe2x80x9cCANCELLEDxe2x80x9d.
This example shows that execution of SQL statements using a cost-based approach is extremely reliant on the accuracy of the set of statistics used to determine the best execution plan for the statement. Furthermore, each database has its own unique set of statistical data for use by the cost-based optimiser and hence, it is difficult for remote support teams to ascertain why the cost-based optimiser is choosing certain execution plans and to investigate the slow execution of SQL statements relating to poor choice of execution plan by the cost-based optimiser. Presently, collection of statistical data is very time consuming and is not performed in a standard manner.
In accordance with one aspect of the present invention, there is provided a method of collecting data relating to the manner in which a SQL statement referencing a database was executed, the method comprising:
a. obtaining an execution plan and associated performance factors for the SQL statement;
b. obtaining predetermined statistical data relating to the database referenced by the SQL statement; and,
c. generating at least one output file containing the SQL statement, the execution plan, the performance factors and the predetermined statistical data.
Hence, the invention provides a standard way of obtaining the statistical data used by the cost-based optimiser and the execution plan for the statement along with performance factors for the execution plan and providing these to a remote support team such that they can investigate any problems associated with the execution of the SQL statement.
Typically, the execution plan for the SQL statement is obtained by extracting the SQL statement from a trace output file, which contains the SQL statement and a partial execution plan along with information relating to the database accesses performed by the SQL statement, and presenting the SQL statement to the database with a command to produce the execution plan.
It is desirable for database initialisation parameters to be stored in the at least one output file so that the support team can determine whether these have had an effect on the execution of the SQL statement.
In the case of the Oracle(copyright) database, these database initialisation parameters relate to the cost-based optimiser and are used to configure various cost-based optimiser settings.
Typically, the performance factors comprise one or more of cost, cardinality and data size parameters. The cost parameter is a figure of merit used to indicate the quantity of database accesses required to execute the statement. The cardinality parameter is an estimate of the number of rows that the SQL statement is expected to generate and the data size parameter, also called the byte parameter, is the product of the cardinality parameter and the average number of bytes per row and indicates how many bytes will be generated by the SQL statement.
Typically, data relating to a plurality of SQL statements is collected and, in this case, the number of SQL statements for which data is collected may be limited to a predefined value.
Optionally, the predetermined statistical data may be stored in a database table.
As a further option, an SQL script may be generated which enables the predetermined statistical data to be recreated when the script is executed on a second database.
Preferably, such an SQL script is executed on the second database in order to recreate the predetermined statistical data on the second database.
There is also provided a method of analysing the manner in which a SQL statement was executed by executing the SQL statement on the second database after the predetermined statistical data has been recreated on the second database either by executing the SQL script on the second database or by transferring the database table to the second database.
The invention will typically be provided as a computer program comprising computer program code means adapted to perform all the steps of the first aspect of the invention when said program is run on a computer.
Further, there may be provided a computer program product comprising program code means stored on a computer readable medium for performing a method of the first aspect of the invention when said program product is run on the computer.