The present invention relates to database systems, and more specifically, to a mechanism for removing ambiguities in a shared database command.
In typical database systems, users store, update and retrieve information by submitting commands to a database server. To be correctly processed, the commands must comply with the database language that is supported by the database server. One popular database language is known as Structured Query Language (SQL).
The response time of a finely-tuned database can be adversely affected by only a tiny proportion of the workload. A challenge that is encountered by most database administrators (xe2x80x9cDBAxe2x80x9d) is how to constantly ensure that the performance of a database is not degraded due to execution of resource-intensive database commands.
In many cases, a database command may be highly resource-intensive due to certain inefficiencies that were introduced by the application that generated the database command. Database commands that fall into this category are referred to herein as xe2x80x9cpoorly performing database commandsxe2x80x9d. For example, if the application SQL is written in a way that prevents the optimizer from using an available index, in executing the database command, the database server will typically be required to perform a full scan of the table in order to determine the correct rows. By having to execute a full scan of the table, the database server is required to process a greater number of rows than would normally be required if the rows were selected using a proper index. Thus, by building the index on the table, or by modifying the application to generate an alternate database command that reduces the number of rows that need to be scanned, the resources required by the database command can be reduced. In addition, because a database command may be repeatedly executed, often by multiple database users, the tuning of the database command can significantly increase the performance of a database system.
For database systems that support the SQL database language, database commands typically take the form of SQL statements. In general, the initial step in tuning the workload of a database system that supports SQL consists of identifying the SQL statements that are highly resource-intensive, and thus inflicting the most damage on the performance of the database system.
One method of capturing the metrics of poorly performing database commands is by tracing the execution of SQL statements as they are executing in the database system. In particular, by executing a tracing mechanism, the relevant details of a statement""s execution can be captured. The details of the statement""s execution can then be examined to identify any inefficiencies that were introduced by the application program in generating the SQL statement.
However, a drawback with tracing the execution of an SQL statement is that the tracing mechanism itself requires a vast amount of resources to execute and thus introduces a significant amount of overhead into the database system. In many cases, the cost of turning on the tracing mechanism is so prohibitive as to dissuade the DBA from arbitrarily tracing the SQL statements executing in the database system. Thus, the tracing mechanism is typically only enabled and started after the performance of the database system has already significantly declined.
Another method of identifying highly resource-intensive SQL statements involves reviewing the metric information that is associated with the different SQL statements that are executing in the database system. The metrics for a particular SQL statement can provide such details as the execution time, the I/O count, the number of sorts, and the rows processed while executing the particular SQL statement. By examining the metric information that is associated with SQL statements executing in the database system, the DBA can identify those SQL statements that impact the database most heavily.
For example, in certain database systems, such as the Oracle(trademark) database system, in executing SQL statements, the database system maintains the SQL statements and their aggregated metrics in a shared cache. The metrics may include, for example, the number of I/O operations performed during execution of the query and the number of times the query has been executed. Thus, by inspecting the metric information, a DBA can determine the per-execution metrics for that particular SQL statement, and thus identify poorly performing SQL statements.
After a poorly performing query has been identified, the database may be xe2x80x9ctunedxe2x80x9d to allow the query to perform more efficiently. The process of tuning the database for executing the queries may involve, for example, creating new indexes that can be used to process the queries. To know how to tune a database for a query, the schema against which the query is executed must be known. For example, if a poorly performing query is executed against an xe2x80x9cempxe2x80x9d table in one schema, performance of the query would not be benefited by building an index on an xe2x80x9cempxe2x80x9d table in a different schema.
Unfortunately, it is not always clear what schema context was used to execute a query. For example, to limit the amount of resources that are required to maintain the shared cache area, the database system maintains a single entry for each distinct SQL statement, regardless of how many different schemas execute the SQL statement. SQL statements for which only a single cache entry is maintained, but which have been issued from more than one schema, are referred to as xe2x80x9cshared SQL statementsxe2x80x9d.
Thus, it is possible for two users to issue separate SQL statements from different schemas but have both SQL statements map to the same shared cache entry, thus causing the aggregated metrics in the shared cache to reflect the sum of the statistics across both users for the given statement. For example, if user xe2x80x9cAxe2x80x9d using a schema xe2x80x9cXxe2x80x9d context and user xe2x80x9cBxe2x80x9d using a schema xe2x80x9cYxe2x80x9d context, each execute the SQL statement:
SELECT xe2x80x9cnamexe2x80x9d, xe2x80x9csalaryxe2x80x9d, xe2x80x9cdept_numxe2x80x9d
FROM emp, dept
WHERE emp.name=dept.emp_name
AND emp.salary greater than 50000.
both statements will map to the same cache entry even though the two statements were executed in different schema contexts.
In this example, the table objects xe2x80x9cempxe2x80x9d and xe2x80x9cdeptxe2x80x9d used in the query are ambiguous. An ambiguous object is an object that is identified using a name that applies equally to objects that belong to different schemas. In this example, schema xe2x80x9cXxe2x80x9d and schema xe2x80x9cYxe2x80x9d both have tables named xe2x80x9cempxe2x80x9d and xe2x80x9cdeptxe2x80x9d, thus rendering the object names xe2x80x9cempxe2x80x9d and xe2x80x9cdeptxe2x80x9d ambiguous in the SQL statement.
One common way to refer to objects unambiguously is to qualify the ambiguous object name with the name of the schema that contains the desired object. For example, the xe2x80x9cempxe2x80x9d table in schema X would be referred to as X.emp, while the xe2x80x9cempxe2x80x9d table in schema Y would be referred to as Y.emp. Because the SQL statement given above does not qualify the ambiguous names by a particular schema, there is no way of determining whether the SQL statement was executed by user xe2x80x9cAxe2x80x9d in the context of schema xe2x80x9cXxe2x80x9d or by user xe2x80x9cBxe2x80x9d in the context of schema xe2x80x9cYxe2x80x9d by looking at the cache entry for the SQL statement.
Thus, the problem with having ambiguous objects in an SQL statement is that the particular schema that was used to execute the SQL statement can not be identified. Without knowing the particular schema that was used to execute the SQL statement, the DBA cannot execute the SQL statement within the same operating environment as the database user who originally ran the statement. This is important as the DBA needs to tune the statement from the context of the executing user.
For example, a xe2x80x9cDEMOxe2x80x9d schema and a xe2x80x9cPRODUCTIONxe2x80x9d schema may both include a set of table objects xe2x80x9cempxe2x80x9d and xe2x80x9cdeptxe2x80x9d. As such, the SQL statement:
SELECT *
FROM emp, dept
may be executed using both the xe2x80x9cDEMOxe2x80x9d schema and the xe2x80x9cPRODUCTIONxe2x80x9d. However, the xe2x80x9cDEMOxe2x80x9d schema may include only a small number of entries in each of its table objects xe2x80x9cempxe2x80x9d and xe2x80x9cdeptxe2x80x9d, while the xe2x80x9cPRODUCTIONxe2x80x9d schema may include an extremely large number of number of entries in each of its table objects xe2x80x9cempxe2x80x9d and xe2x80x9cdeptxe2x80x9d. Thus, by adding an index to the table objects xe2x80x9cempxe2x80x9d and xe2x80x9cdeptxe2x80x9d in the xe2x80x9cPRODUCTIONxe2x80x9d schema, a DBA may be able to tune the database to cause the SQL statement to run significantly more efficient in the xe2x80x9cPRODUCTIONxe2x80x9d schema. However, because the xe2x80x9cDEMOxe2x80x9d schema includes only a small number of entries in each of its table objects xe2x80x9cempxe2x80x9d and xe2x80x9cdeptxe2x80x9d, adding an index to the table objects xe2x80x9cempxe2x80x9d and xe2x80x9cdeptxe2x80x9d in the xe2x80x9cDEMOxe2x80x9d schema will not cause the SQL statement to run significantly more efficient in the xe2x80x9cDEMOxe2x80x9d schema. In fact, the overhead that is associated with maintaining an index may actually reduce the efficiency of the database by adding an index to the table objects xe2x80x9cempxe2x80x9d and xe2x80x9cdeptxe2x80x9d in the xe2x80x9cDEMOxe2x80x9d schema. Thus, to properly to tune the database system, the DBA must be able to determine the schema from which an SQL statement having ambiguous object references was executed.
One method of determining the schema in which an SQL statement that includes ambiguous object references was executed involves having the DBA log into multiple user accounts in the hope that the schema context of one of the user accounts can resolve the ambiguous object references so that the statement can be executed. For example, by logging in as user xe2x80x9cSTEVExe2x80x9d the DBA is able to execute the SQL statement using STEVE""s schema context. If the ambiguous object references can be resolved through STEVE""s schema context, the SQL statement can properly execute. Thus, the DBA can identify STEVE""s schema as a xe2x80x9cvalidxe2x80x9d schema for executing the SQL statement. Conversely, if the ambiguous object references cannot be resolved through STEVE""s schema context, the SQL statement will not properly execute and the DBA can identify STEVE""s schema as an xe2x80x9cinvalidxe2x80x9d schema for executing the SQL statement
However, a significant drawback with this xe2x80x9csolutionxe2x80x9d is that to login as the different users of the system, the DBA is forced to maintain personal login information for each of the different users of the system. In addition, having to login in as the different users to determine whether an SQL statement can be properly executed can be both extremely burdensome and time consuming to the DBA.
Based on the foregoing, there is a clear need to provide a mechanism that can identify the schema context that was used in executing an SQL statement that includes ambiguous references.
There is also a clear need to provide a mechanism that can determine the metrics for SQL statements that include ambiguous references.
The foregoing needs, and other needs and objects that will become apparent from the following description, are achieved in the present invention, which comprises, in one aspect, a method for identifying one or more schemas from which a Structured Query Language (SQL) statement may have been executed, the method comprising the computer-implemented steps of determining a set of object references that are included in the database command; identifying one or more ambiguous object references that are contained in the set of object references; and determining a set of one or more schemas that contain objects that correspond to each of the one or more ambiguous object references.
According to another feature, the step of identifying the one or more ambiguous object references includes the steps of identifying any qualified object references from the set of object references that is associated with a schema identifier that identifies a particular schema for which the qualified object reference is to execute; identifying any public synonyms from the set of object references that represents an alias that may be used in multiple schemas to identified a single object; and determining the one or more ambiguous object references by selecting from the set of object references only those object references that have not been identified as either qualified object references or public synonyms.
In another feature the step of determining the set of one or more schemas includes the step of identifying schemas that include a set of access permissions for accessing the one or more ambiguous object references as required by the database command.
In yet another feature, user input is received that identifies, from the set of one or more schemas, a selected schema for which tuning is to be performed. Based on the selected schema an explain plan for the database is generated by associating a schema identifier that identifies the selected schema with each of the one or more ambiguous object references.
The invention also encompasses a computer-readable medium, a computer system, and a computer data signal embodied in a carrier wave, configured to carry out the foregoing steps.