The present invention relates to database management techniques, and more particularly to a database management system which is applicable to a parallel database management system having a function of executing a program module incorporated therein by a user.
The present invention utilizes the following three known techniques related to a database management system (hereinafter abbreviated as "DBMS") for managing a database (hereinafter abbreviated as "DB"):
(1) Parallel DB Processing;
(2) SQL3; and
(3) Object Relational DBMS.
In the following, these three known techniques will be briefly described.
(1) Parallel DB Processing
This is a method in which the database processing that satisfies a user's request is executed in parallel on a plurality of processors, so that queries involving a large amount of data can be handled. An example of this method is described in JP-A-8-137910 (Reference 1). In the method of Reference 1, a processor receives a user's query, and a DBMS controls the execution of a plurality of engine processors (execution servers) such that the load is optimally distributed among the engine processors.
(2) SQL3
SQL3 is a draft database language specification that the International Organization for Standardization (ISO) is currently working to standardize. For example, according to "Information technology--Database languages--SQL--Part 2: SQL/Foundation" ISO/IEC JTC1/SC21 N10489 (Reference 2), SQL3 permits a description as follows:
CREATE TYPE sgmltext_t (
  text BLOB, . . . 1
  FUNCTION extract (sgmltext_t, . . . 2
    VARCHAR)
  RETURNS BLOB
  LANGUAGE C . . . 3
  EXTERNAL NAME `p_sgml_extract`;
);
This description provides definition statements for an abstract data type (hereinafter abbreviated as "ADT"). 1 in the definition statements indicates that ADT sgmltext_t is composed of a component of BLOB (Binary Large Object) type referenced by a name "text."
Also, 2 in the definition statements indicates that an ADT function extract( ) can be applied to data having an ADT sgmltext_t type.
Further, 3 in the definition statements indicates that the ADT function extract( ) is related to an external function labelled p_sgml_extract described in C language.
Using the ADT as described above, the user can define his or her own data types, thereby realizing functions corresponding to data access, inheritance and so on, as provided by methods in a general object-oriented programming language.
(3) Object Relational DBMS
It is said that a conventional relational DBMS (hereinafter abbreviated as "RDBMS") based on a relational data model is not suitable for handling data having a complicated structure, such as multimedia data, because it cannot represent such data closely and also has performance problems. For this reason, an object relational DBMS (hereinafter abbreviated as "ORDBMS"), which introduces an object-oriented concept into the RDBMS, has been proposed as described in "Object Relational DBMSs" written by Michael Stonebraker, translated by Yoshinobu Ohta, and published by International Thompson Publishing Japan, August 1996 (Reference 3). Reference 3 mentions as a basic requirement of an ORDBMS that it should be capable of handling complicated objects. Reference 3 also mentions that the ORDBMS should be able to use user defined types and user defined functions such as the following:
create type phone_t (
  area varchar(3), . . . 4
  number varchar(7), . . . 5
  description varchar(20)); . . . 6
This description provides definition statements for a user defined complex type phone_t. The definition statements indicate that the complex type phone_t is composed of three components: a variable character string type element of three bytes or less referenced by a name "area" (4 in the definition statements); a variable character string type element of seven bytes or less referenced by a name "number" (5 in the definition statements); and a variable character string type element of 20 bytes or less referenced by a name "description" (6 in the definition statements).
An example of definition statements for a user defined function is shown in the following:
create function Northness_equal (point, point) returns Boolean
  with selfunc=selectivity_comp
  external name `/usr/Northness_equal`
  language C; . . . 7
This description provides definition statements for a user defined function Northness_equal( ). 7 in the definition statements indicates that the user defined function Northness_equal( ) is associated with an external function labelled /usr/Northness_equal described in C language. As to an external function, Reference 3 describes that a good ORDBMS should be able to dynamically link a user defined function so that the function does not needlessly consume the address space of the DBMS until it is required. Such a user defined type and user defined function can be used in correspondence to the ADT and ADT function described by SQL3, respectively.
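The on-demand linking that Reference 3 asks for can be illustrated with a minimal Python sketch. This is not the mechanism of any particular ORDBMS: the registry class, the loader function, and the comparison logic assumed for Northness_equal( ) (equality of the second coordinate) are all hypothetical and serve only to show that nothing is loaded until the first call.

```python
from typing import Callable, Dict


class LazyFunctionRegistry:
    """Hypothetical registry: a function name is bound to a loader, and the
    loader runs only on the first call, mimicking on-demand dynamic linking."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Callable]] = {}
        self._loaded: Dict[str, Callable] = {}

    def register(self, name: str, loader: Callable[[], Callable]) -> None:
        # Registration records the binding only; nothing is loaded yet.
        self._loaders[name] = loader

    def is_loaded(self, name: str) -> bool:
        return name in self._loaded

    def call(self, name: str, *args):
        # Load (link) the function the first time it is actually required.
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name](*args)


def load_northness_equal() -> Callable:
    # Stand-in for linking the external object /usr/Northness_equal.
    # ASSUMPTION: two points are "equally north" when their second
    # coordinates match; the source does not define the function body.
    def northness_equal(p, q) -> bool:
        return p[1] == q[1]

    return northness_equal


registry = LazyFunctionRegistry()
registry.register("Northness_equal", load_northness_equal)
```

A call such as registry.call("Northness_equal", (1, 5), (9, 5)) triggers the loader once; later calls reuse the already loaded function.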
The present inventors have found the following problems as a result of investigating DB systems utilizing the known techniques described above.
First, a conceptual diagram representing an exemplary configuration of a conventional DB system is illustrated in FIG. 1. The illustrated DB system 100 is a system for managing documents described in SGML (Standard Generalized Markup Language). A DBMS 120 for managing the DB system 100 comprises a request reception server 130 for receiving a query 104 from a user; a plurality of execution servers 140-1 to 140-n for executing database processing in accordance with instructions from the request reception server 130; and a single dictionary server 160 for managing definition information of the system 100. The DBMS 120 is adapted to control general parallel DB processing. These servers are interconnected through a communication path 180.
Assume that a definition for managing SGML documents, which are subjected to DB processing by the DBMS 120, is described by SQL3 in the following manner:
CREATE TYPE sgmltext_t (
  text BLOB, . . . 8
  FUNCTION extract ( sgmltext_t, VARCHAR) . . . 9
  RETURNS BLOB
  LANGUAGE C
  EXTERNAL NAME `p_sgml_extract`; . . . 10
);
CREATE TABLE reports (
  published_date DATE,
  contents sgmltext_t); . . . 11
The user of this DB system 100 will issue a desired query for data in a DB described in SGML (hereinafter abbreviated as "SGML text"), using the ADT sgmltext_t type.
8 in the description statements indicates that the ADT sgmltext_t type has text of BLOB type as a component.
11 in the description statements represents the structure of data corresponding to a "report" in the user's DB model, using a table reports. More specifically, in correspondence to the "report" comprising "published date" and "reported contents" as its components, the table reports is defined to comprise a DATE type column published_date and an ADT sgmltext_t type column contents.
For processing a large amount of SGML documents in parallel, a record 152-1 in the table reports and an SGML text 154-1 are held in storage devices 150-1 to 150-n respectively accessed by the execution servers 140-1 to 140-n. For rapidly searching for a "report" with a condition defined by "published date," the column published_date of the table reports is indexed using a general indexing function provided by the execution servers.
9-10 in the description statements define an ADT function extract( ) which is a function for extracting text data delimited by tags (156, 158 in FIG. 1) from the SGML text 154-1, and requires the following two input parameters:
(1) the original SGML text from which text data is to be extracted; and
(2) a tag name for specifying a portion to be extracted.
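As a rough illustration of a function taking these two input parameters, the following Python sketch registers a simplified extract( ) as a user defined function in SQLite and applies it to a table shaped like reports. It is a sketch under several assumptions: SQLite has no ADT support, so contents is stored as plain text; the regular-expression matching, the sample rows, and the date filter are hypothetical; and the extraction parameters discussed later in this section are ignored entirely.

```python
import re
import sqlite3


def extract(sgml_text, tag_name):
    """Return the text delimited by <tag_name> ... </tag_name>, or None.

    Simplified stand-in for p_sgml_extract( ): real SGML processing depends
    on the document structure, which this sketch does not model.
    """
    pattern = re.compile(
        r"<{0}>(.*?)</{0}>".format(re.escape(tag_name)),
        re.DOTALL | re.IGNORECASE,
    )
    match = pattern.search(sgml_text)
    return match.group(1).strip() if match else None


conn = sqlite3.connect(":memory:")
# Bind the SQL-level name extract (2 arguments) to the Python function.
conn.create_function("extract", 2, extract)

conn.execute("CREATE TABLE reports (published_date TEXT, contents TEXT)")
conn.executemany(
    "INSERT INTO reports VALUES (?, ?)",
    [
        ("1996-10-01", "<report><abstract>Old summary</abstract></report>"),
        ("1996-11-02", "<report><abstract>New summary</abstract></report>"),
    ],
)

# ISO-formatted date strings compare correctly as text.
rows = conn.execute(
    "SELECT extract(contents, 'abstract') FROM reports "
    "WHERE published_date > '1996-10-15'"
).fetchall()
print(rows)
```

Binding the SQL name to a host-language function in this way is loosely analogous to the EXTERNAL NAME clause, which ties the ADT function to an external C function.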
10 in the description statements is an external function p_sgml_extract( ) which is defined as a function for realizing an ADT function extract( ). An object code 144-1 for realizing the external function p_sgml_extract( ) is included in a plug-in program module (hereinafter a "plug-in module") 142-1. The plug-in module 142-1 is a program module incorporated in the execution server for realizing a SGML document data management function of the DB system 100.
In this example, control information based on document structure information on SGML documents is used for performing partial extraction of the SGML text 154-1 delimited by the specified tags 156, 158. This control information includes structural information for structuring the partially extracted data as an SGML document, and is indispensable for creating an extraction result. The control information for the partial extraction processing is called "extraction parameters." The extraction parameters are based on the SGML document structure, and are commonly utilized for SGML texts having the same SGML document structure. In this DB system 100, the extraction parameters are collectively managed in the system by the dictionary server 160.
The dictionary server 160 holds the extraction parameters 172 in an associated storage device 170. The structure of SGML documents in the DB is fixed for each column holding an SGML text, such that the format or document structure of the "reported contents" in the "reports" is fixed. Accordingly, the extraction parameters are also fixed for each column holding an SGML text to be processed. Thus, the dictionary server 160 manages the extraction parameters 172 on the basis of table names and column names so that each of the execution servers 140-1 to 140-n can acquire the extraction parameters 172.
With the configuration described above, the partial extraction processing is executed for the SGML text in accordance with the following procedure.
(1) Based on the table name and the column name of the column which holds the target SGML text, an execution server accesses the dictionary server 160 to acquire the extraction parameters.
(2) The execution server carries out the partial extraction processing utilizing the extraction parameters acquired in step (1). The execution of steps (1), (2) in this procedure is controlled by a plug-in module 144-1.
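The two-step procedure above can be sketched in Python to make the access pattern visible. The class names, the "root_tag" entry standing in for the extraction parameters, and the assumption that step (1) is repeated for every SGML text are all hypothetical; the sketch only counts how often the single dictionary server is contacted.

```python
import re


def partial_extract(sgml_text, tag_name, params):
    """Extract the element tag_name and wrap it as a small SGML document."""
    match = re.search(
        r"<{0}>.*?</{0}>".format(re.escape(tag_name)), sgml_text, re.DOTALL
    )
    if match is None:
        return None
    root = params["root_tag"]  # hypothetical structural information
    return "<{0}>{1}</{0}>".format(root, match.group(0))


class DictionaryServer:
    """Single server holding extraction parameters per (table, column)."""

    def __init__(self, parameters):
        self._parameters = parameters
        self.access_count = 0

    def get_extraction_parameters(self, table, column):
        self.access_count += 1  # every request lands on this one server
        return self._parameters[(table, column)]


class ExecutionServer:
    def __init__(self, dictionary, sgml_texts):
        self.dictionary = dictionary
        self.sgml_texts = sgml_texts

    def run_extraction(self, table, column, tag_name):
        results = []
        for text in self.sgml_texts:
            # Step (1): acquire the parameters from the dictionary server.
            params = self.dictionary.get_extraction_parameters(table, column)
            # Step (2): partial extraction using those parameters.
            results.append(partial_extract(text, tag_name, params))
        return results


dictionary = DictionaryServer({("reports", "contents"): {"root_tag": "report"}})
servers = [
    ExecutionServer(
        dictionary,
        [
            "<report><abstract>A{0}</abstract></report>".format(i),
            "<report><abstract>B{0}</abstract></report>".format(i),
        ],
    )
    for i in range(3)
]
results = [s.run_extraction("reports", "contents", "abstract") for s in servers]
print(dictionary.access_count)
```

With three execution servers holding two texts each, the dictionary server in this sketch is contacted six times, once per extraction call.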
Next, a search operation on the DB system 100, including the partial extraction processing of an SGML text, will be described.
For example, a search request from the user requesting to "extract abstracts of reports, the published date of which is later than Oct. 15, 1996" may be described by SQL3 in the following manner.
SELECT extract(contents, `abstract`)
FROM reports
WHERE published_date > `1996-10-15`
Database processing appropriate to this search request is executed in the following procedure:
(1) A set of records in reports satisfying the condition defined by the WHERE clause is acquired using an index set on the column published_date in the table reports.
(2) Based on the set of records acquired in step (1), SGML texts are sequentially retrieved from the contents of records in reports. Then, an external function p_sgml_extract( ) for realizing the ADT function extract( ) is called to extract abstracts.
In this procedure, the processing at step (2) for sequentially retrieving SGML texts to extract the abstracts is executed by each of the execution servers, in order to utilize the parallel processing function efficiently for faster processing and to reduce the amount of data transferred to the request reception server for making up a search result.
Each execution server calls the external function p_sgml_extract( ) for partially extracting abstracts, and passes the execution control to the plug-in module 144-1. The plug-in module 144-1, to which the execution control has been passed, accesses the dictionary server 160 to acquire the extraction parameters 172, and executes the extraction processing utilizing the extraction parameters 172.
The system illustrated in FIG. 1, however, has the following problems.
In the database processing for the foregoing search, all of the plug-in modules on the plurality of execution servers 140-1, 140-2, . . . , 140-n, running in parallel, access the single dictionary server 160, so that the processing for retrieving the extraction parameters 172 is concentrated on the dictionary server 160.
In the conventional processing scheme illustrated in FIG. 1, the parallel processing intended to distribute the load has an adverse effect on access to the dictionary server 160. Specifically, the larger the number of execution servers, the larger the load on the dictionary server 160, and consequently the search processing capability of the entire system is degraded by the limited performance of the dictionary server 160.
Also, when records satisfying the condition defined by the WHERE clause are extracted sequentially, as in the aforementioned query, the dictionary server 160 may be accessed for each record, so that it is burdened with a load larger than the number of execution servers alone would suggest.
To solve the problem mentioned above, the following method was considered.
The scheme illustrated in FIG. 1 causes the problem because plug-in modules are executed on the plurality of execution servers 140-1, 140-2, . . . , 140-n, so that the single dictionary server 160 is intensively accessed by these plug-in modules.
The plug-in modules on the plurality of execution servers individually access the dictionary server 160 because they need to acquire the extraction parameters 172 required for the extraction processing. However, the extraction parameters 172 required during the processing of the query are the same on every execution server. In addition, the extraction parameters 172 need not be acquired by directly accessing the dictionary server 160 from the execution environments on the respective execution servers. Therefore, if all the execution servers are allowed, by some means, to reference the extraction parameters 172 once acquired from the dictionary server 160, the execution servers can individually execute the extraction processing without accessing the dictionary server 160.
To realize the concept mentioned above, the present inventors have devised a method processed by a procedure as illustrated in FIG. 2. This procedure will be described below.
(STEP 1) A request reception server 230 acquires extraction parameters 272 from a dictionary server 260. An external function 234 for acquiring the extraction parameters 272 from the dictionary server 260 is provided by a plug-in program module 232, and the request reception server 230 calls the external function 234.
(STEP 2) The request reception server 230 transmits the extraction parameters 272 together with an execution instruction to respective execution servers 240-1, 240-2, . . . , 240-n.
(STEP 3) An external function 244-1 for executing extraction processing on each execution server (e.g., 240-1) executes the extraction processing with reference to the extraction parameters 272 transmitted thereto from the request reception server 230. The external function for executing the extraction processing using the extraction parameters 272 as input parameters is provided by a plug-in module 242-1. The execution server 240-1 passes the extraction parameters 272 transmitted from the request reception server 230 as input parameters for the external function 244-1, when it calls the external function 244-1.
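The FIG. 2 procedure (STEP 1 to STEP 3) can be sketched in the same spirit. Again this is only an illustration under stated assumptions: the class and method names, the "root_tag" entry standing in for the extraction parameters 272, and the in-process method calls standing in for transmission over a communication path are all hypothetical.

```python
import re


def p_sgml_extract(sgml_text, tag_name, params):
    """External function taking the extraction parameters as an input."""
    match = re.search(
        r"<{0}>.*?</{0}>".format(re.escape(tag_name)), sgml_text, re.DOTALL
    )
    if match is None:
        return None
    root = params["root_tag"]  # hypothetical structural information
    return "<{0}>{1}</{0}>".format(root, match.group(0))


class DictionaryServer:
    def __init__(self, parameters):
        self._parameters = parameters
        self.access_count = 0

    def get_extraction_parameters(self, table, column):
        self.access_count += 1
        return self._parameters[(table, column)]


class ExecutionServer:
    def __init__(self, sgml_texts):
        self.sgml_texts = sgml_texts  # no reference to the dictionary server

    def execute(self, tag_name, params):
        # STEP 3: pass the received parameters to the external function.
        return [p_sgml_extract(t, tag_name, params) for t in self.sgml_texts]


class RequestReceptionServer:
    def __init__(self, dictionary, execution_servers):
        self.dictionary = dictionary
        self.execution_servers = execution_servers

    def run_query(self, table, column, tag_name):
        # STEP 1: acquire the extraction parameters once.
        params = self.dictionary.get_extraction_parameters(table, column)
        # STEP 2: send them to every execution server with the instruction.
        return [s.execute(tag_name, params) for s in self.execution_servers]


dictionary = DictionaryServer({("reports", "contents"): {"root_tag": "report"}})
execution_servers = [
    ExecutionServer(
        [
            "<report><abstract>A{0}</abstract></report>".format(i),
            "<report><abstract>B{0}</abstract></report>".format(i),
        ]
    )
    for i in range(3)
]
reception = RequestReceptionServer(dictionary, execution_servers)
results = reception.run_query("reports", "contents", "abstract")
print(dictionary.access_count)
```

In this sketch the dictionary server is contacted once per query, regardless of how many execution servers or SGML texts are involved.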
However, the aforementioned three known techniques cannot control the execution of the plug-in modules 232, 242-1 in accordance with the procedure described above, if they are used without any modifications.
It is further desirable that the user can specify the control for the execution of plug-in modules as mentioned above. Thus, the inventors directed their attention to a method of utilizing an interface definition language (IDL) which is described in "The Common Object Request Broker: Architecture and Specification" OMG Document Number 91.12.1, Revision 1.1 (Reference 4) as a prior art technique related to the specification of a function definition.
This method defines an interface between modules with the IDL in a software architecture called "CORBA." The interface is associated with a programming language such as C language, and a connection module called a "stub" is produced. Flexible inter-module communication is enabled through this stub module. However, the specifications of the IDL described in Reference 4 do not permit the user to directly specify control over the execution of external functions as mentioned above. The inventors therefore modified the specifications of the IDL so that the user can directly specify such control over the execution of external functions.