1. Field of the Invention
This invention relates to systems, methods, and computer programs in the field of database information processing, and more specifically to database management systems including compilers for compiling program code including query statements packaged with control or procedural statements.
2. Description of the Related Art
The Structured Query Language (SQL) has been widely used in many relational database systems as a database language to define and manipulate databases. Some of these database systems include IBM DB2/CS, Oracle 7, Sybase Server System 10, and Informix Online Dynamic Server. (See, DATABASE 2 AIX/6000 and DATABASE 2 OS/2 SQL Reference, IBM Corporation, First Edition, October 1993; Database 2 Programming Functional Specification, IBM Corporation, Version 2.1. P481-0021, August 1994; Oracle 7 Server Application Developer's Guide, Oracle Corporation, December 1992, Part: 6695-70-0212; TRANSAC-SQL User Guide, Sybase, 1993; The INFORMIX Guide to SQL: Syntax, V.6.0, Informix, March 1994, Part 000-7597; Jim Melton, editor, (ISO-ANSI Working Draft) Database Language (SQL2), International Organization for Standardization and American National Standards Institute, 1992.)
In such systems, access of data from an application program is achieved by means of embedded SQL statements. SQL can be viewed as consisting of two main components: Data Definition Language (DDL) that defines a database, and Data Manipulation Language (DML) that manipulates and accesses data in a database. With embedded SQL statements, application programs can insert a new row into a table, delete rows from a table, update rows from a table, and selectively retrieve rows from a table. The current SQL language standard is known informally as SQL/92 or SQL2. The SQL statements embedded in an application program are translated by the host language pre-processor into invocations to certain APIs. However, SQL2 does not provide any mechanism for expressing program logic and control flow. Instead, the control flow of the application logic is written in the host language. Thus, SQL has been mainly used exclusively for database related services, and has been mostly used as an embedded language in application programs (written in C, COBOL, FORTRAN, etc.) where the host language provides the needed control logic of an application.
Since each SQL statement is compiled into an API call to database server routines executed by the server, and the result is sent back to the application program, frequent communication and data transfer between the client and the server results in low performance. The current trend is to push more logic into the server side, so that the client application can get back only the data it really needs.
The upcoming SQL3 standard (See, Jim Melton, editor, (ISO-ANSI Working Draft) Database Language (SQL3), International Organization for Standardization and American National Standards Institute, August 1994) has defined a procedural extension to the existing SQL2 language. The SQL3 standard has included a new set of statements, called control statements (also referred to herein as procedural statements or procedural constructs), that allow control logic to be incorporated into an SQL statement. The purpose is to make SQL a more computationally-complete language so that computations which have no direct impact on the client side can be moved to the server side to reduce the amount of data transferred between the server and the application. As a consequence, significant performance improvement can be achieved. With SQL3, control logic that used to be expressed only in a host language can now be packaged with query statements to be serviced by the server in a single dialog. The inclusion of these new control statements greatly enhances the expressiveness of the language and enables various programming techniques that used to rely on host languages.
The SQL3 control statements include IF statements, LOOP statements, WHILE statements, REPEAT statements, SET (assignment) statements, and compound statements. The compound statement creates a new scope and allows local variable declarations.
There are four different program contexts where these SQL3 control statements can be used to improve performance of the application program and the functionality provided by the database engine: (1) as an embedded segment of the host program; (2) as the function body of a user-defined function (DB2/CS V2 SQL Referenced); (3) as the body of a stored procedure, and (4) as the body of a trigger (See, Database 2 Programming Functional Specification, IBM Corporation, Version 2.1. P481-0021, August 1994). These four program contexts are discussed in more detail as follows.
1. Embedded in Host Program
Like ordinary query statements, SQL3 control statements can be embedded in the application program using the "EXEC SQL" keywords, as shown below.
______________________________________ main( ) . . . /* Get trading information: performed by the host program. */ . . . /* Update database: performed by the control statement. */ EXEC SQL BEGIN DECLARE company.sub.-- id INTEGER; /* Look up the unique ID by name. */ SELECT id INTO company.sub.-- id FROM company WHERE strcmp(name, :company.sub.-- name) = 0; /* Insert current trading information. */ INSERT INTO tradings VALUES (id, :trading.sub.-- type, :trading.sub.-- time, :unit.sub.-- price, :total.sub.-- shares); /* Update the selling or buying price. */ IF (:trading.sub.-- type = 0) THEN UPDATE stock SET selling.sub.-- price = :unit.sub.-- price WHERE id = company.sub.-- id; ELSEIF (:trading.sub.-- type = 1) THEN UPDATE stock SET buying price = :unit.sub.-- price WHERE id = company.sub.-- id; END IF; END; /* Host program continues here. */ . . . } ______________________________________
In the example shown above, a compound statement, BEGIN . . . END, is embedded in the host program, following the code which acquires the current trading information from the trading market. This compound statement inserts the current trading information into the tradings table and, depending on the trading.sub.-- type, it updates the selling.sub.-- price column or the buying.sub.-- price column of the corresponding row of the stock table. Since the compound statement is executed entirely on the server side, all the query statements (SELECT, INSERT, and UPDATE) it contains will be serviced in a single communication between the client program and the database server. Such a database access scenario could have taken three communications to complete if it were programmed without using the compound statement.
As demonstrated by this example, there are three advantages provided by the SQL3 control statements. First, in general, the number of communications between the client and the server are reduced by packaging related database access operations with control statements. Therefore, performance of the application program is significantly improved, especially in the environment where communication cost is significant (e.g. multi-media applications with a remote database server). Second, unnecessary bind-in and bind-out is avoided if the output of one query statement is used only as the input to another query statement. Third, readability of the application program is also improved because related database access operations can be packaged into one single SQL statement.
2. User-Defined Function
Control statements are also useful in composing the function body of a user-defined function. For example, the database access session as shown in the previous subsection can be modified and packaged as a user-defined function.
______________________________________ CREATE FUNCTION log.sub.-- transaction(name CHAR(30), type INTEGER, time TIME, price FLOAT, amount INTEGER) RETURNS INTEGER BEGIN DECLARE company.sub.-- id, delta INTEGER; SELECT id INTO company.sub.-- id FROM company WHERE strcmp(company.name, name) = 0; INSERT INTO tradings VALUES(company.sub.-- id, type, time, price, amount); IF (type = 0) THEN SET delta = price - SELECT buying.sub.-- price FROM stock WHERE id = company.sub.-- id; UPDATE stock SET buying price = price WHERE id = company id; ELSEIF (type = 1) THEN SET delta = price - SELECT selling.sub.-- price FROM stock WHERE id = company.sub.-- id; UPDATE stock SET selling.sub.-- price = price WHERE id = company.sub.-- id; END IF; RETURN delta; END; ______________________________________
The user-defined function log.sub.-- transaction takes all the trading information as input, performs the same tasks as the previous example, and returns the unique ID of the company which issues the stock being traded. This user-defined function can now be invoked to log a particular transaction.
______________________________________ EXEC SQL INSERT :company.sub.-- name, :time, log.sub.-- transaction(:company.sub.- - name, :trading.sub.-- type, :time, :unit.sub.-- price, :total.sub.-- shares) INTO price.sub.-- trace; ______________________________________
3. Stored Procedure
Similarly, the user-defined function shown in the previous sub-section can be turned into a stored procedure. A stored procedure is basically a precompiled program that is stored at the server site (and is known to the server). It is invoked from the client by a remote procedure call (RPC). The number of messages between client and server can be reduced if the system provides a stored procedure mechanism.
______________________________________ CREATE PROCEDURE log.sub.-- transaction(name CHAR(30), type INTEGER, time TIME, price FLOAT, BEGIN DECLARE company.sub.-- id INTEGER; SELECT id INTO company.sub.-- id FROM company WHERE strcmp(company.name, :company.sub.-- name)= 0; INSERT INTO tradings VALUES(id, type, time, price, amount); IF (type = 0) THEN UPDATE stock SET buying.sub.-- price = price WHERE id = company.sub.-- id; ELSEIF (type = 1) THEN UPDATE stock SET selling.sub.-- price = price WHERE id = company.sub.-- id; END IF; END; ______________________________________
The stored procedure log.sub.-- transaction takes all the trading information as input and performs the tasks similar to that of the previous example. This stored procedure can now be invoked to log a particular transaction.
______________________________________ EXEC SQL CALL log.sub.-- transaction(:company.sub.-- name, :trading.sub.-- type, :time, :unit.sub.-- price, :total.sub.-- shares); ______________________________________
4. Trigger
Control statements are also very useful in writing the trigger body of a trigger. A triggered procedure is a procedure that is to be invoked when a specified trigger condition occurs. For example, the following trigger can be created on the stock trading table.
______________________________________ CREATE TRIGGER trade.sub.-- trigger AFTER INSERT ON tradings REFERENCING NEW as new.sub.-- record FOR EACH ROW MODE DB2SQL BEGIN DECLARE day.sub.-- high.sub.-- value, day.sub.-- low.sub.-- value DOUBLE; /* Look up day.sub.-- high and day.sub.-- low. */ SELECT day.sub.-- high, day.sub.-- low INTO day.sub.-- high.sub.-- value, day.sub.-- low.sub.-- value FROM daily.sub.-- trade.sub.-- record WHERE id = new.sub.-- record.id; /* Update the daily high record if this is the new high record. */ IF (new.sub.-- record.trading.sub.-- price &gt; day.sub.-- high.sub.-- value) THEN UPDATE daily.sub.-- trade.sub.-- record SET day.sub.-- high = new.sub.-- record.trading.sub.-- price WHERE id = new.sub.-- record.id; /* Update the daily low record if this is the new low record. */ IF (new.sub.-- record.trading.sub.-- price &lt; day.sub.-- low.sub.-- value) THEN UPDATE daily.sub.-- trade.sub.-- record SET day.sub.-- low = new.sub.-- record.trading.sub.-- price WHERE id = new.sub.-- record.id; END; ______________________________________
Now the database will maintain the daily high trading record and the daily low trading record up-to-date whenever a new record is inserted into the tradings table.
______________________________________ EXEC SQL INSERT INTO tradings VALUES(:company.sub.-- id, :type, current time, :trading.sub.-- price, :amount); ______________________________________
As shown by the examples and discussion above, there are several advantages to the procedural extensions of SQL3. The improved performance gains by means of these procedural constructs is highly desired by the application developers. The performance is enhanced by minimizing network communication traffic between a client application and server host. The procedural constructs allow "control abstraction" by grouping SQL statements together procedurally resulting in fewer bind in/bind out processes. Instead of the server handling one SQL statement at a time as for SQL2, SQL3 enables the server to perform global optimization of the query statements. Also, because of SQL3's procedural aspects, SQL3 is now easily adaptable to object extensions since objects also contain methods or procedures. A procedural extension to SQL is a prerequisite to handling objects
However, this new extension in SQL3 creates several problems for existing SQL2 compilers. First of all, the bind-in/bind-out process of host variables has to be generalized due to the new statement-grouping mechanisms. Second, local variables which persist in its declaring scope makes data sharable among several query statements. Third, the semantics of the procedural constructs are not expressible in terms of the existing table-oriented abstract representation for SQL2 statements. Fourth, since the purpose of control statements is to improve performance, the resulting execution plan has to be efficient enough to outmatch a similar scenario without control statements. Fifth, existing SQL2 compilers presently provide optimization techniques for just query statements. Global optimization for both procedural and query statements together is a new area. These problems are further discussed as follows.
Host Variable Bind-in/Bind-out
In an SQL-based environment, an SQL statement interacts with the host program by means of a set of host variables. Host variables are made known to the SQL compiler at the early stage of the application preparation-time. A reference to a host variable of certain type in the SQL statement is compiled into a plan object (run-time object) of the corresponding SQL type. Host variables could be used for the application to provide data to the database server (input host variable) or to receive data from the database server (output host variable), depending on the context where the host variable appears in the SQL statement. The same host variable can have multiple occurrences in an SQL statement, as an input host variable, an output host variable, or both. At run-time, all the input host variables are packed into a variable-length structure, input SQLDA, that is transferred to the database engine as input parameters to the execution plan of the corresponding SQL statement. The very first action taking place in the execution plan is to convert the data contained in the input SQLDA to the corresponding plan objects which are directly accessed by the execution plan. Such a process is usually referred to as bind-in process. Similarly, for output host variables, there is a bind-out process taking place at the end of the execution plan to convert the plan objects to their corresponding SQLDA entries.
In SQL2, the mapping from SQLDA entries to the corresponding plan objects is simple. For example, the n-th SQLDA entry corresponds to the n-th host variable occurrence in the SQL statement. However, such a simple mapping no longer works for a SQL3 control statement because there can be more than one SQL statement embedded in a control statement. As a result, there can be multiple orders of host variables such that the order of host variable occurrences in one statement does not necessarily agree with that of the statements. Also, there may be sharing of host variables among the statements such that one host variable may be an output variable in one statement, but an input variable in a subsequent statement. This problem did not occur in SQL2 since the server was only handling one SQL statement at a time.
Handling Local Variables
Local variables are new SQL objects introduced in the SQL3 standard. Unlike column objects, the same local variable is accessible in multiple SQL statements. Besides, local variables are associated with a scope where they are defined. Therefore, a local variable of an inner scope will make another local variable with the same name in an outer scope invisible from the inner scope. Similarly to host variables having multiple orders, local variables can have nested scopes. Also similar to the problem with host variables, local variables can be shared among the multiple statements. How to handle local variables at run-time as well as compile-time becomes a new issue to the existing SQL2 compiler.
Representation of Control Statements
The current compiler infrastructure, which is designed to facilitate query processing, is not suitable to deal with these control statements. For example, QGM (query graph model) is the graph representation model used in DB2/CS V2 (IBM DB2/Client Server Version 2). Although QGM has been proven to be very powerful and effective for describing table transformations for a given SQL data statement, it is quite unnatural to describe control flow that has nothing to do with tables.
QGM is powerful in representing query statements whose semantics can be directly mapped into a table transformation function. However, SQL3 control statements neither operate on tables nor generate tables. It is very difficult, if not impossible, to represent all the control statements in terms of table transformation functions. Besides, representing control statements in the format of QGM may lead to one of the two consequences: 1) each phase of the compiler has to be enhanced to eliminate the overhead introduced at the first place, or 2) the overhead will stay in QGM and eventually show up in the execution plan. Neither case is favorable, if there is a simpler solution. Therefore, how to come up with a systematic way of representing procedural constructs without losing the advantage of QGM becomes a challenge to the design of a new compiler.
Performance of the Execution Plans
One of the goals of using control statements is to improve the run-time performance of an application, a user-defined function, a stored procedure, or a trigger. Therefore, the resulting execution plan has to be efficient enough to outmatch the same scenario without using control statements. As an example, the user-defined function given above should have better performance than an equivalent one written in an external language such as C.
Presently, the SQL2 compilers today perform optimization techniques for query statements only, and not for procedural statements. Separately, there are also optimization techniques for procedural languages, but the optimization techniques are not applicable for query statements. However, with a control statement there is an interplay between query statements and procedural statements whereby both, together, could be made more efficient by global optimization. Global optimization is a new issue that has been introduced with control statements. It is presently unknown how to do control flow optimization in conjunction with query optimization.
There exists today systems that are capable of handling SQL3-like control statements having a procedural and non-procedural part. However, there are problems associated with the present day approaches. For example, one approach is to represent the procedural constructs as table functions in order to utilize, intact, an SQL2 query compiler. This approach is used in IBM DB2/CS Trigger. However, it is very difficult and unnatural to represent procedural constructs as table functions, and redundancy will be introduced. There is much difficulty in teaching a compiler to recognize the difference between a real SELECT statement and a SELECT statement that is being used somehow to represent the control flow of a procedure statement. Other problems with this approach include the significant impact on the existing query compiler, the difficulty in preserving the statement boundary, the difficulty in preserving the control flow information, and the difficulty in removing the redundancy introduced in the first place. Another approach, used by Oracle and Sybase, treats the procedural extension as a server-side host language. Since the procedural and non-procedural part of the statement is compiled in two totally different environments that do not talk to each other, it is very difficult, if not impossible, to perform global optimization on the query statements. Other difficulties include the significant communication overhead with the SQL interpreter, significant overhead due to data movement, and duplicate effort for the procedural engine and the host language pre-processor.