A database management system (DBMS) is a suite of programs that manages large structured sets of persistent data. A DBMS typically controls the organization, storage and retrieval of data, organized in fields, records and files, in a database. The DBMS typically accepts requests for data from users and programs and returns the requested data. A request for data from a DBMS is called a query. Many database systems require a request for information to be made in the form of a stylized query written in a special query language. For example, the query:
SELECT * FROM employees WHERE age > 30 AND name = 'Smith'
requests all records in which the name field is “Smith” and the age field has a value greater than 30. It is written in a common query language, SQL (Structured Query Language).
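As an illustration only, the query above can be run against a small in-memory table using Python's standard sqlite3 module. The table layout and the four sample rows are assumptions made for this sketch, and an ORDER BY clause has been added so that the output order is predictable.

```python
import sqlite3

# Hypothetical sample data; the source does not define the table's contents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Smith", 45), ("Smith", 28), ("Jones", 52), ("Smith", 31)],
)

# The query from the text, with ORDER BY added for a deterministic result order.
rows = conn.execute(
    "SELECT * FROM employees WHERE age > 30 AND name = 'Smith' ORDER BY age"
).fetchall()
print(rows)  # [('Smith', 31), ('Smith', 45)]
```

Only the two rows that satisfy both conditions are returned; the 28-year-old Smith and the 52-year-old Jones are filtered out.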
Most query languages are declarative, meaning that the user specifies what data is wanted and a query optimizer decides the best way to access and process the data. There may be many different ways to process a given query correctly. For example, suppose the query is:
SELECT name, age, salary FROM employees WHERE age > 30 AND city = 'Philadelphia' AND salary < 100,000
Even though this query is very simple and references only one table, so that no choices of join order or join method are available, there are still many different ways in which correct results could be returned. The DBMS could, for example, scan each row in the table and apply each predicate (each filtering condition in the WHERE clause that is joined by AND is a predicate). Alternatively, if appropriate indexes exist, one or more indexes could be exploited to access only the rows satisfying one or more of the predicates. For example, an index on age could limit access to only those rows satisfying the condition age>30 before the other predicates are applied. An index on city could instead limit access to only those rows satisfying the condition city='Philadelphia' before the predicates age>30 and salary<100,000 are applied. If an index on multiple columns existed, such as a combined index on age and city, still more options would be available.
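The choice between a full scan and an index can be observed in a small experiment. The following is a minimal sketch using SQLite through Python's sqlite3 module; SQLite stands in for the DBMS purely for illustration, and the sample rows and the index name idx_employees_city are hypothetical. The salary literal is written as 100000 because SQL does not accept a thousands separator.

```python
import sqlite3

QUERY = (
    "SELECT name, age, salary FROM employees "
    "WHERE age > 30 AND city = 'Philadelphia' AND salary < 100000"
)

def plan_details(conn):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, age INTEGER, city TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [  # hypothetical sample rows
        ("Ann", 41, "Philadelphia", 85000),
        ("Bob", 29, "Philadelphia", 60000),
        ("Cara", 55, "Boston", 70000),
        ("Dev", 38, "Philadelphia", 120000),
        ("Eve", 33, "Philadelphia", 95000),
    ],
)

# Without any index, the only access path is a scan of the whole table.
plan_scan = plan_details(conn)
rows_scan = sorted(conn.execute(QUERY).fetchall())

# With an index on city, the planner may use it to narrow the rows first.
conn.execute("CREATE INDEX idx_employees_city ON employees (city)")
plan_index = plan_details(conn)
rows_index = sorted(conn.execute(QUERY).fetchall())

print(plan_scan)
print(plan_index)
print(rows_scan == rows_index)  # both strategies return the same rows
```

The exact wording of the plan text varies between SQLite versions, so it is not reproduced here. The point is that the access path changes while the result set does not.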
For complex queries involving many predicates and/or operations and when there are multiple tables accessed, the number of alternative strategies increases exponentially, making the selection of an optimal query plan an even more daunting task. For a two-table join with a handful of predicates, the optimizer may consider over a hundred different plans; for six tables, the number of plans considered could be well over a thousand.
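The growth in alternatives can be made concrete by counting join orders alone. The sketch below uses two standard combinatorial counts: n! orderings for left-deep join trees and (2(n-1))!/(n-1)! for all ordered binary (bushy) join trees over n tables. These count the raw search space only. They ignore the choice of access path and join method, which multiplies the totals further, and they are not the number of plans a real optimizer examines, since optimizers prune heavily. The function names are hypothetical.

```python
from math import factorial

def left_deep_join_orders(n_tables: int) -> int:
    """Number of left-deep join orders: every permutation of the tables."""
    return factorial(n_tables)

def bushy_join_trees(n_tables: int) -> int:
    """Number of ordered binary join trees over n tables: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n_tables - 1)) // factorial(n_tables - 1)

for n in (2, 3, 6, 10):
    print(n, left_deep_join_orders(n), bushy_join_trees(n))
```

For six tables this gives 720 left-deep orders and 30,240 bushy trees before any access-path or join-method choices are considered.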
The most efficient query strategy for the database depends on a number of things, including the availability of indexes and the characteristics of the data. In the example above, if there were very few people in the database older than 30, using an age index and subsequently applying the filtering conditions city='Philadelphia' and salary<100,000 might be the most effective strategy, because only a few rows would be returned from the first step.
Most query optimizers attempt to determine the best query execution plan by mathematically modeling the estimated execution cost for each of the plans and selecting the one with the lowest estimated cost. Other optimizers apply a system of rules to select a query plan. Because the optimizer makes a number of assumptions that may or may not be true, sometimes the plan the optimizer selects is not the best execution plan. When the query plan is actually executed, the query may take a long time to run. Sometimes a database administrator or other data processing professional will review the execution plan and try to determine a better way to structure the query to make the query run faster. This process, whether performed by human or machine, is often referred to as query tuning.
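A cost-based choice of this kind can be sketched with a deliberately simplified model. Everything in the following example is an assumption made for illustration: the two per-row cost constants, the selectivity figures, and the names TableStats, estimate_costs and choose_plan. Real optimizers use far richer statistics and cost formulas.

```python
from dataclasses import dataclass

# Assumed relative costs: reading a row during a sequential scan versus
# fetching a row through an index (random access is taken to be dearer).
SEQ_ROW_COST = 1.0
RANDOM_ROW_COST = 4.0

@dataclass(frozen=True)
class TableStats:
    row_count: int
    selectivity: dict  # column -> estimated fraction of rows passing its predicate

def estimate_costs(stats: TableStats, indexed_columns: list) -> dict:
    """Estimate a cost for the full scan and for each single-index plan."""
    costs = {"full scan": stats.row_count * SEQ_ROW_COST}
    for column in indexed_columns:
        matching_rows = stats.row_count * stats.selectivity[column]
        costs[f"index on {column}"] = matching_rows * RANDOM_ROW_COST
    return costs

def choose_plan(stats: TableStats, indexed_columns: list) -> str:
    """Pick the plan with the lowest estimated cost."""
    costs = estimate_costs(stats, indexed_columns)
    return min(costs, key=costs.get)

# Few employees older than 30: the age index looks cheapest.
few_over_30 = TableStats(100_000, {"age": 0.01, "city": 0.20})
# Most employees older than 30: no index beats scanning the table.
many_over_30 = TableStats(100_000, {"age": 0.90, "city": 0.50})

print(choose_plan(few_over_30, ["age", "city"]))   # index on age
print(choose_plan(many_over_30, ["age", "city"]))  # full scan
```

If the selectivity estimates are wrong, the model will confidently pick a poor plan, which is the situation that query tuning is meant to correct.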
Some DBMSs provide query tuning tools to help develop more efficient queries. Tools to generate estimated query plans based on database statistics, and actual execution plans based on actually running the query, may be provided. A query analyzer, for example, may provide an estimated or actual graphical query execution plan. A profiler or other query tuning tool may provide estimated and/or actual textual execution plans.
A complex graphical query plan may be divided into a number of parts represented as icons or blocks, listed one on top of another on a screen or printout. Each part may represent a separate process or step that the optimizer had to perform to get to the final results. Each step may be broken down into small sub-steps, with the sub-steps and steps connected by arrows showing the path the query took when it executed. The thickness of the arrows may indicate the relative cost in number of rows and row size of the data moving between the blocks. As may be appreciated, the results may not be easy to read and interpret, by man or machine. Additionally, a graphical query execution plan may be difficult to transport from one machine to another because of size and software compatibility.
Textual query execution plans may be provided by, for example, selecting particular data elements to be traced. Textual query execution plans, like graphical plans, can be quite difficult to read and understand, especially for a machine or process. An exemplary textual query execution plan for the very simple query
SELECT count(*) FROM dbo.lineitem
is displayed in FIG. 7 (700). As can be readily understood, in order for a textual query execution plan to be read and operated on by a machine or process, the textual plan must be parsed character by character, a not insignificant programming task in itself.
It would be helpful, therefore, if a hierarchical, plain-text execution plan could be provided in a standardized output format that would be easy for humans and machines to read, understand and work with.
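To suggest why a hierarchical, standardized format is easier for a program to consume, the sketch below walks a made-up plan for a count query over a single table. JSON is used only as one example of a plain-text hierarchical format; the source does not name a format here. The field names (operator, object, estimated_rows, children) and the row estimates are hypothetical.

```python
import json

# A made-up plan: an aggregate step fed by a scan of one table.
PLAN_TEXT = """
{
  "operator": "Aggregate",
  "estimated_rows": 1,
  "children": [
    {
      "operator": "Table Scan",
      "object": "dbo.lineitem",
      "estimated_rows": 1000,
      "children": []
    }
  ]
}
"""

def walk(node, depth=0):
    """Yield (depth, operator) for every step in the plan, parents first."""
    yield depth, node["operator"]
    for child in node["children"]:
        yield from walk(child, depth + 1)

plan = json.loads(PLAN_TEXT)  # a standard parser replaces hand-written parsing
operators = list(walk(plan))
for depth, operator in operators:
    print("  " * depth + operator)
```

A few lines of traversal code recover every step and its nesting level, with no character-by-character parsing of free-form text.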