The present invention relates to a database management system. More particularly, the present invention relates to a query processing method and system that may be adapted advantageously to parallel query processing suitable for relational database management systems.
There are two known conventional database management systems: Abstract Data Type of SQL3, and parallel database systems.
The Abstract Data Type of SQL3 will be described first. Currently, relational database systems, particularly SQL database systems, are increasingly finding their way into administrative data processing and other related fields. Efforts are also under way to implement so-called object database systems. These systems are difficult to operate efficiently in the conventional framework of relational databases, handling data of complicated structures as one of their objectives.
Studies have been performed to expand relational databases so that they may handle data of complex structures. The results of the research are currently merged into SQL3 for standardization. A SQL3 database system is capable of handling user-defined data (types) having complex structures called the Abstract Data Type (ADT). Under the ADT scheme, data called attributes (called sub-data hereunder) are hidden behind function interfaces so that a database system may handle and inherit object-oriented complicated data from one data type to another
A data type is defined by use of a definition SQL statement beginning with "CREATE TYPE." Defined types may be used for variable declarations and table column definitions just like the system-defined integer type and character type. Using these types makes it possible to create and utilize data of complicated structures. The ADT of SQL3 is discussed illustratively by Andrew E. Wade, Ph. D., in "Object Query Standards" ACM SIGMOD Record, Vol. 25, No. 1, pp. 87-92, March 1996. A draft of SQL3 standards is available from ISO/IEC JTC1/SC21/WG3 DBL-MCI-004, ISO Working Draft Database Language SQL, 1996.
Parallel database systems will now be described. A relational database system may easily enhance its performance by having data distributed to a plurality of database processing servers for parallel access therethrough. Along with skyrocketing increases in quantities of data to be handled has come a growing demand for such parallel database systems. Parallel database systems are discussed illustratively by DeWitt, D., et. al., in "Parallel Database Systems: The Future of High Performance Database Systems" (CACM, Vol. 35, No. 6, 1992).
Typically, the parallel database system includes a server (called a front end server) for analyzing and compiling queries made by a user application program (UAP) in a host computer to the database, and a plurality of servers (called database operation servers) for accessing disk drives holding data and for manipulating data therein. For purpose of simplification and illustration, the parallel database system is assumed in the description below to have a single front end server and a plurality of database operation servers. However, it is also possible for the system to have a plurality of front end servers for handling a plurality of queries from one or multiple hosts. In such a case, each query is dealt with by a setup of one host, one front end server and a plurality of database operation servers. Thus, the system is perfectly applicable to the multiple front end server setup as well.
Generally, the SQL representing queries directed at a database (called database queries hereunder) is often embedded (called embedded SQL) in computer language such as C (called the host language hereunder). In operation, the host language in the host computer issues database queries such as searches through the database as well as updates, deletions and insertions in the database. In turn, the database system analyzes and compiles the queries and returns the result to the host. The host language uses the received result in control processing such as decisions as well as in data manipulations such as insertions and calculations. This invention also applies to stored procedures whereby database queries including control operations and data manipulations are issued.
In this case, the processing performed on the side of database operation servers such as searches, insertions, updates and deletions in the database is sometimes called database operation statements, distinct from the processing on the side of the front end server such as control operations and data manipulations. Stored procedures are described illustratively by Hatsuko Katayama in "Utilizing Stored Procedures and Triggers," Nikkei Open Systems, No. 2, pp. 133-144, 1993.
The host language may have a plurality of database queries embedded therein. This allows the result of queries to be exchanged through the use of host language variables. The transfer of values in variables is determined by the manner in which the result of analysis is processed by the host language. One way of value transfer through variables involves determining in advance an area in which to accommodate a value for each of the variables used, with appropriate information "bound" to the variables so that the areas may later be referenced through the bound information for access to the values therein.
What follows is a description of a typical example in which the embedded SQL is used to create, transfer and execute procedures in an internal format of a parallel database system. The result of database operations is manipulated and/or controlled by UAP (user application program) control statements written in the embedded SQL. Database queries are sent one statement at a time over a network to the front end server connected with the user application program. A compiler subjects the queries to syntax analysis, semantic analysis and optimizing compilation. This creates an internal-format procedure for carrying out actual database operations based on the received database queries.
The internal-format procedure includes codes to be interpreted and executed by an interpreter as well as execute form codes. Definition information needed for the compilation is provided as dictionary information that may be accessed by the front end server. The procedure thus created is transferred over the network to a database operation server that actually operates the database by executing the procedure. Usually the server that performs the actual database operation is determined by information about partitioning of a table for operations. The information about table partitioning is designated by table definitions which are placed in a dictionary. Each database operation server has a processor and at least one disk drive.
A proposed improvement involves placing the internal-format procedure in a cache of a database operation server, so that a second and subsequent queries performed may each issue an execution request to use the procedure that resides in the cache. The typical parallel database system has a plurality of database operation servers for parallel SQL processing. A database operation server exchanges data with other database operation servers as needed over the network, and the result of the SQL processing is eventually returned via the front end server to the user application program for manipulations of and/or control operations on the executed result. The processing is repeated thereafter for each SQL statement.
Where data to be handled in database queries are a collection of a plurality of data items (called attributes or sub-data items in the ADT environment), there are two ways to process the data (i.e., for searches, updates, insertions, manipulations or control operations): either to deal with the sub-data items making up the data one at a time, or to process the data as a whole. In an example of using database queries, the data as a whole may be retrieved first in response to the initial query and the retrieved data may be submitted, in response to subsequent queries, to the front end server one sub-data item at a time for individual processing. In such a case, the subsequent queries may or may not use all sub-data items constituting the retrieved data. However, where techniques of the conventional parallel database system described earlier are adapted to the ADT, what happens is that the data retrieved as a result of a query and bound for the host are all transferred from the database operation server in question to the front end server for subsequent analysis and execution. If the unused sub-data items are massive (e.g., those of LOB data), the time required to transfer such unnecessary data from the database operation server to the front end server can be considerable. Thus it takes more time to deal with queries.