The present invention relates generally to a routine executing method in a database system, and more particularly to a routine executing method suitable for parallel processing of queries in a relational database management system.
Conventionally, relational database systems, particularly SQL database systems, have been increasingly applied mainly to data processing for business.
The earlier SQL specifications have been so configured as to instruct a database server to individually execute one by one database operations such as search, update, insert, delete and so on of data.
The current SQL specifications, on the other hand, provide instructions for substitution and control such as IF, WHILE and so on, in addition to the above-mentioned instructions for database operations, and can collectively define a series of processing in the form of a procedure or a function with an appropriate name given thereto. Here, the procedure and the function are collectively referred to as a xe2x80x9croutine.xe2x80x9d
As a draft of standardization for currently standardized SQL3, there is ISO/IEC JTC1/SC21/WG3 DBL-MCI-004, ISO Working Draft Database Language SQL, 1996.
A server for accessing a database in a database management system receives a request issued during the execution of a user application from a client, accesses the database, and returns a result. Also, in a relational database system, data can be readily divided and positioned into a plurality of database processing servers such that the plurality of database processing servers can parallelly access the divided data to improve the performance of the overall relational database system.
A need for such parallel database systems has been becoming higher with an increase in the amount of data. An exemplary parallel database system is described in D. DeWitt, et. al., xe2x80x9cParallel Database Systems: The Future of High Performance Database Systemsxe2x80x9d, CACM, Vol. 35, No. 6, 1992.
A conventional parallel database system comprises a server (hereinafter referred to as the xe2x80x9cfront end serverxe2x80x9d) which has a function of analyzing and compiling a query from a user application program (hereinafter abbreviated as xe2x80x9cUAPxe2x80x9d) in a client, and a function of totalizing the results of executing the query for divided data positioned in a plurality of parallel processing servers to return the totalized results to the client; and a plurality of servers (each of which is hereinafter referred to as the xe2x80x9cdatabase operation serverxe2x80x9d), each of which has a function of accessing a disk drive, which stores data, for performing required operations on the data. In general, the front end server and the database operation servers are interconnected through a high speed interconnection network.
For configuring a parallel data processing scheme, servers may be allocated on a function basis to achieve a pipeline-type parallel structure, other than the above-mentioned one which allocates divided data to a plurality of servers. In this case, execution process instructions are created for each server function, and the parallel processing is realized by executing each execution process in a multi-thread form.
In a parallel database system, as the division of data and respective functions are parallelly executed as mentioned above, communications are performed between associated functions for transmitting and receiving data flows through pipelines and for other processing. Therefore, the parallel database system has a challenge to reduce a load caused by the communications.
Next, description is made on the execution of a procedure in a parallel database system as a prior art technique. A procedure is defined by a CREATE PROCEDURE statement. In SQL3, a user-defined type, referred to as Abstract Data Type (ADT), can be defined, and another procedure may also be defined in a definition statement for the abstract data type.
A definition statement for a procedure is analyzed by the front end server which creates internal type codes (hereinafter referred to as an xe2x80x9cexecution process instructionsxe2x80x9d) corresponding to a processing method and process. The execution process instruction may be codes interpreted by an interpreter or codes of execution type. The execution process instruction is registered as dictionary information related to the procedure.
For calling a procedure, a CALL statement is provided for calling a SQL statement from an application program written in a high class language such as C language. The CALL statement is analyzed by the front end server which retrieves execution process instructions corresponding to a procedure name and arguments written in the CALL statement from the dictionary information and executes the retrieved execution process instructions.
If processing described in a procedure includes a database operation such as search, update, insert and deletion of data (hereinafter update, insert and delete queries are collectively referred to as the xe2x80x9cupdate-related queriesxe2x80x9d), execution process instructions for performing the database operation are provided for the front end server as well as for the database operation servers which have data to be operated.
Each of the database operation servers operates divided data allocated thereto in accordance with execution process instructions provided thereto. The front end server in turn executes execution process instructions for totalizing the results of database operations performed by the respective database operation servers.
As an example, if a database is searched for certain data, the respective database operation servers search for the data, transfer retrieved data to the front end server which sequentially receives and uses the data transferred thereto.
If a dedicated database operation server is provided for high load processing such as sorting or the like, execution process instructions are created likewise for this database operation server.
In the configuration of the parallel database system described above, it is the front end server that analyzes database operation statements for search, update, insert, delete and so on, and totalizes the results from the respective database operation servers. Also, for executing a procedure from a user application, a body of the procedure is executed by the front end server.
If the body of the procedure includes a database operation statement, the front end server executes the body of the procedure, continuously totalizes the results from the respective database operation servers, and performs the execution of the procedure.
More specifically, a procedure execution unit and a database operation statement execution unit are both included in execution process instructions on the same front end server side. However, database operations are actually performed by the respective database operation servers in which divided data is allocated.
Assume herein that, in the parallel database system, when a database operation statement includes execution of a function, a call to the function is performed by a database operation server which has divided data allocated thereto. Consider also that a database operation statement is also included in the body of the function.
In the parallel database system, the execution of a body of a function is common to the execution of a body of a procedure except for a part of functions, so that the same execution unit on the front end side (hereinafter simply called the xe2x80x9cfront end sidexe2x80x9d) may be used both as an execution unit for function and as an execution unit for procedure. Therefore, in the case assumed above, every time a function is called, communications such as a request for execution and so on will be made between the function calling party (in this case, the execution unit on the database operation side (hereinafter simply called the xe2x80x9cdivided data sidexe2x80x9d)) and the execution unit for the function (in this case, the front end side). Although depending upon the type of query from the UAP, the number of communications may amount to an immense value since the function is likely to be called on the divided data side the number of times equal to the number of data records stored therein.
It is an object of the present invention to provide techniques for reducing the number of times of communications between the front end side and the database operation side, caused by a routine call in processing related to a query, to achieve a reduction in time required to execute the query.
It is another object of the present invention to provide techniques for reducing the number of times of communications between the front end side and the database operation side, caused by a routine call in processing related to a query, to achieve a reduction in time required to execute the query, even if a routine to be executed is not determined until the processing related to the query is executed.
It is a further object of the present invention to provide techniques for reducing the number of times of communications between the front end side and the database operation side caused by a routine call in processing related to a query, to achieve a reduction in time required to execute the query, even if the processing related to the query includes nested routine calls.
The above and other objects and novel features of the present invention will become apparent from the description of the following specification and the accompanying drawings.
Among embodiments disclosed by the present invention, a summary of a representative one will be briefly described as follows.
Specifically, in a routine executing method in a parallel database system, within routines (functions and procedures) called in query execution process instructions on a divided data side, a routine containing no database operation statement is executed by a routine calling party, i.e., the divided data side. Stated another way, a front end side creates an execution process instruction on the divided data side such that a routine execution unit on the divided data side is selected for the execution of such a routine containing no database operation statement.
Also, in the routine executing method in a parallel database system, the divided data side is provided with an execution unit for executing a routine containing no database operation statement (a query related to search and update) in its body. Upon defining a routine, a check is made to see whether or not the routine contains a database operation statement in its body, and information indicative of the determination result is stored as one piece of information created by analyzing the definition statement.
Next, when a query statement containing a call to a routine is executed on the divided data side, check information indicating whether or not the routine contains a database operation statement is referenced. If the routine does not contain a database operation statement, the routine is executed on the divided data side.
Since a call unit for calling a routine containing no database operation statement in its body and an execution unit for executing the routine are both resident on the divided data side, no communication is incurred by the execution of the routine. For this reason, the number of times of communications between the front end side and the divided data side is significantly reduced, thereby making it possible to reduce the number of times of communications caused by the execution of a routine containing no database operation statement.
If a routine to be executed is not determined for a routine call until the execution time, check information indicating whether or not a database operation statement is contained in a routine is set as one piece of information associated with execution process instructions created as a result of analyzing the body of the routine. With this information, the routine determined upon execution is checked to see whether or not a database operation statement is contained in this routine, so that an execution unit for the routine can be selected in accordance with the check result.
If a routine call is nested in a body of a routine or a procedure to be executed in the execution unit for executing a routine containing no database operation statement on the divided data side, a routine to be called in the nest is also subjected to the check to see whether or not a database operation statement is contained in this routine.
In this case, a call to the routine to be called in the nest is to be performed on the divided data side. However, if the check information indicating whether or not a database operation statement is contained in the body of the routine called in the nest reveals that a database operation statement is contained in the routine, the execution of the routine is passed to the front end side. Conversely, if no database operation statement is contained in the routine, the routine is executed in the execution unit for executing a routine containing no database operation statement on the divided data side.
Since a communication is eliminated between the call to the routine called in the nest and the routine execution unit, the number of times of communications is significantly reduced in the entire execution of the routine including a nested routine call.
Assuming that the number of routines called by a query (containing no database operation statement in its body) executed on the divided data side is n, and the number of data accessed by the query is m, the numbers of times of communications performed between the front end side and the divided data side, when the present invention is applied and when the present invention is not applied, are as shown in the following Table 1.
Further, when a query call or a routine call is contained in a loop, the number of times of communications results in the product of the total number of times in the above list and the number of times the loop is executed. Since the number of data (records) accessed by a query amounts to several millionsxe2x80x94several tens of millions in many cases in a large scale database handled in a parallel database system, the routine executing method of the present invention provides tremendous benefits.
As described above, according to the routine executing method in a parallel database system, when a routine called on the divided data side does not contain a database operation statement in its body, the routine is executed in the routine execution unit provided on the divided data side. It is therefore possible to reduce the number of times of communications incurred by a routine call and accordingly achieve a reduction in time required to execute a query.