Oftentimes, an application or service that queries a data store via one or more queries for requesting data will repeat the same or similar query many different times. For instance, during a data gathering step, an application might use the structured query language (SQL), which is a query language used to query relational databases, to repeatedly query a relational data store in the same or a similar manner. In this regard, applications sending SQL statements to a database management system (DBMS), such as SQL Server, often submit many statements, which are identical to one another except for a small modification, such as to the literal constants appearing in the statement.
For example, SQL statement 800 and 810 of FIGS. 8A and 8B, respectively satisfy this condition. Query 800 selects 802 from a database table 804 named ‘customers’ all of the entries according to expression 806, i.e., wherever the state is named ‘California’ in table 804 and likewise query 810 selects 812 from the database table 804 all of the entries where the state is named ‘Georgia’ according to expression 816.
A query plan, or query execution plan, is a set of steps used to access information in a SQL relational DBMS. Since SQL is declarative, there are typically a large number of alternative ways to execute a given query, with widely varying performance. In this regard, today, when a query is submitted to the database, a query optimizer on the server side typically evaluates some of the different, correct possible plans for executing the query and returns what the optimizer considers the best alternative. Then, the query is executed according to the optimized query plan, and the results are returned via an efficient execution path.
For uniform data distributions, two statements that differ only in literals will often share the same query execution plan determined by the query optimizer. Because execution plans can be expensive to compile, both in terms of central processing unit (CPU) time and memory use, execution plan sharing among such similar-looking statements can give significant performance benefits to a relational database system, such as a SQL DBMS.
To allow this performance benefit, SQL client drivers, e.g., Object Linking & Embedded Database (OLEDB), ActiveX Data Object.NET (ADO.NET), etc., currently permit applications to specify that the query should be parameterized. For example, the application developer writes a single query 820 instead of queries 800 and 810 of FIGS. 8A and 8B above including select command 822, which selects from customers table 804 according to the parameterized expression 826. Parameterized expression 826 is said to be parameterized because the particular state names are replaced with a ‘?’ placeholder 828. In practice today, an application then merely supplies the values for the ‘?’ placeholder 828 each time the application needs to execute the query with particular values.
For instance, using the example queries 800 and 810 of FIGS. 8A and 8B, to execute query 800, the application may tell the client driver to execute the parameterized statement 820 of FIG. 8C with ‘California’ as the value for a ‘?’ placeholder 828; similarly, to execute query 810, the application may tell the client driver to execute the parameterized statement 820 of FIG. 8C with ‘Georgia’ as the value for the ‘?’ placeholder 828.
When the client drivers and interfaces receive a statement with a ‘?’ placeholder 828 from an application, a parameterized query is generated by the client drivers and then sent to the SQL DBMS server. At this point, the client drivers do not require the application to specify parameter types for the placeholders, and many applications do not. In those cases, however, the client drivers need to deduce the type of the placeholder in order to generate the parameterized query. Unfortunately, many client drivers perform this deduction poorly. Specifically, the client drivers use the actual values passed for ‘?’ placeholders to deduce the parameter type to send to the SQL DBMS. For example, if parameterized query 820 is invoked with ‘California’ as the value for ‘?’ 828, the client driver would generate statement 900 of FIG. 9A, while if ‘Georgia’ is used as value for ‘?’ 828, the client driver would generate statement 910 of FIG. 9B. While statements 900 and 910 appear similar, statements 900 and 910 are different queries from the point of view of a SQL DBMS, because the types of @p1 are different. Specifically, the type 902 of @p1 in the example of statement 900 is varchar(10) referring to a varchar of length 10, and the type 912 of @p1 in the example of statement 910 is varchar(6) referring to a varchar of length 6. Thus, even though the application specified the same query 820 of FIG. 8C, two different queries are nonetheless sent to the server.
Each query is then optimized by a query optimizer, which is costly both in CPU time and memory consumption, and the resulting plan is stored in the execution plan cache, taking up valuable memory space. It would be considerably more desirable, therefore, if it were possible to share query execution plans for statements, such as statements 900 and 910, when parameterized. One might consider that considerably more complex examples are possible than the exemplary queries of FIGS. 8A and 8B, and one can imagine data processing applications that would implicate thousands, millions, even billions of the same or similar queries. Sharing execution plans in such cases would be extremely advantageous to avoid much unnecessary processing time and memory.
While it is possible to provide a smarter way of deducing parameter types by the client drivers on the client side, the above-described deficiency of the state of the art of parameterizing queries affects a wide range of drivers, not just a single set. Thus, such a solution would require updating a lot of widely distributed client side software for each set of separate drivers, which is not very practical. Even if such client side software could be re-written or updated for the wide range of drivers that exist, there may still nonetheless be customers who fail to obtain and install the updated versions. Accordingly, customers using old versions of client drivers would still observe the above-described performance problems from lack of efficient sharing of query execution plans. Hence, a solution on the server-side that interacts with the existing range of client drivers is desirable.
Other deficiencies in the state of the art of query optimization for parameterized queries in a computing system will also become apparent upon review of the following description of various exemplary, non-limiting embodiments of the invention.