The present invention relates to techniques applicable to a database processing method and an apparatus for carrying out the same. Further, the invention is concerned with a medium storing the database processing method in the form of a program executable with a computer.
In the real world, there exist many situations in which multivalued dependency information has to be handled, i.e., a plurality of pieces of information must be handled for a given item or instance. In the case of the relational database, the conventional approach is to handle this sort of information in terms of normalized data models. In other words, the information or data is processed on the basis of the relations found between a plurality of tables and a query therefor.
For a better understanding of the present invention, description will first be made of a conventional database processing method by referring to FIG. 1 of the drawings. In this figure, there are shown, by way of example, two normalized tables, i.e., a student grade list table 212 containing normalized information concerning grades of students and a student list table 211 containing information concerning E-mail addresses of the students and others. As can be seen in the figure, the registered ID numbers are entered in both the student list table 211 and the student grade list table 212, and the relation between the two tables is established with the aid of the registered ID number, which thus serves as a key. As can readily be appreciated, the information (registered ID number) serving as the key is stored redundantly. Consequently, additional database capacity is required to accommodate this redundancy.
In conjunction with the example illustrated in FIG. 1, it is now supposed, by way of example, that a guide for an English conversation school is to be dispatched to the students whose English grades are marked by scores not greater than “60” and hence the E-mail addresses of these students have to be acquired from the database. In that case, the query statement, which must refer to both the student list table and the student grade list table, becomes complicated, as shown below. Furthermore, because combine (join) processing is required, the time involved in the processing will increase.
SELECT students-list.E-mail-address
FROM students-list, grade
WHERE
grade.subject='English' AND
grade.mark<=60 AND
students-list.registered-ID-number=grade.registered-ID-number
As is obvious from the above, the approach based on the normalized data model requires increased database capacity because the registered ID numbers are redundantly entered in both tables. Additionally, the query statement becomes complicated. Also, because the combine processing is required, the time taken for the search processing will increase.
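The normalized query above can be tried out with a minimal sketch using Python's built-in `sqlite3` module. The table and column names are adapted with underscores (the hyphenated names in the text are not valid unquoted identifiers in SQLite), and the sample rows, names and addresses are hypothetical and not taken from FIG. 1.

```python
import sqlite3

def build_normalized_db():
    """Create the two normalized tables with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE students_list (
            registered_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        CREATE TABLE grade (
            registered_id INTEGER, subject TEXT, mark INTEGER);
    """)
    conn.executemany("INSERT INTO students_list VALUES (?, ?, ?)", [
        (1, "Sato", "sato@example.com"),
        (2, "Suzuki", "suzuki@example.com"),
    ])
    conn.executemany("INSERT INTO grade VALUES (?, ?, ?)", [
        (1, "English", 55), (1, "Math", 90),
        (2, "English", 80), (2, "Math", 40),
    ])
    return conn

def low_english_emails(conn, limit=60):
    """Join both tables on the registered ID, as in the query in the text."""
    rows = conn.execute("""
        SELECT students_list.email
        FROM students_list, grade
        WHERE grade.subject = 'English'
          AND grade.mark <= ?
          AND students_list.registered_id = grade.registered_id
        ORDER BY students_list.email
    """, (limit,)).fetchall()
    return [row[0] for row in rows]
```

Calling `low_english_emails(build_normalized_db())` returns only the first student's address: the second student has a mark below 60, but in a different subject, so the row-wise conditions on `grade` exclude it.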
On the other hand, execution of the database search processing with an unnormalized table can be conceived as another approach. In FIG. 1, a student grade list table 213 is shown which has been realized by unnormalizing the two tables mentioned above. In this case, the query statement can be simplified as follows.
SELECT DISTINCT E-mail-address
FROM grade
WHERE subject='English' AND mark<=60
However, in the unnormalization approach mentioned above, the data other than those of the columns “subject” and “mark” (i.e., the registered ID numbers, the names and the E-mail addresses) are stored redundantly, as a result of which the capacity or volume of the database is increased. The query statement is certainly simplified, but the number of items entered in each row increases. Although the combine processing can be spared, duplicate results have to be suppressed with “DISTINCT”, which presents an obstacle to realizing high-speed search and retrieval.
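The role of “DISTINCT” in the unnormalized approach can be illustrated with a small `sqlite3` sketch. The single table below is a hypothetical stand-in for table 213, with underscore-style column names; the second English record for the same student is an assumption introduced only to make the duplicate visible.

```python
import sqlite3

def build_unnormalized_db():
    """Create one unnormalized table; student data repeats on every row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE grade (registered_id INTEGER, name TEXT, "
        "email TEXT, subject TEXT, mark INTEGER)"
    )
    conn.executemany("INSERT INTO grade VALUES (?, ?, ?, ?, ?)", [
        (1, "Sato", "sato@example.com", "English", 55),
        (1, "Sato", "sato@example.com", "English", 58),  # hypothetical 2nd record
        (1, "Sato", "sato@example.com", "Math", 90),
        (2, "Suzuki", "suzuki@example.com", "English", 80),
    ])
    return conn

def low_english_emails(conn, distinct=True):
    """Run the simplified query, with or without duplicate suppression."""
    keyword = "DISTINCT " if distinct else ""
    rows = conn.execute(
        f"SELECT {keyword}email FROM grade "
        "WHERE subject = 'English' AND mark <= 60"
    ).fetchall()
    return [row[0] for row in rows]
```

Without `DISTINCT` the same address comes back once per matching row, which is why the unnormalized query has to pay for duplicate elimination.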
Concerning the database query language SQL (Structured Query Language), a collection type has been adopted as one of the functions for realizing object-oriented extension with a view to allowing sets of information to be handled with high efficiency. Furthermore, it is known to store a plurality of pieces of information as one column data, similarly to an array, by using repetitive columns. For more particulars in this respect, reference should be made to HITACHI: “Guidance for XDM E2-Series Program Creations (XDM/RD E2)”, p. 27 (1997).
In both the collection type database and the repetitive column type database, no relations are established between or among the elements in different columns. Consequently, any attempt to structure an application program for processing a query containing a plurality of conditions, as described above, is forced to resort either to the approach based on the relations among a plurality of normalized tables or to the approach based on the unnormalized table. Either approach incurs an increase in the capacity of the database, complication of the query statement and, eventually, degradation of the performance of the search processing.
In the light of the state of the art described above, it is an object of the present invention to solve the problems mentioned above and provide techniques or technology with which access to a database designed for managing data in terms of sets of mutually comparable instances can be realized in a suitable or optimal manner.
In view of the above and other objects which will become apparent as the description proceeds, there is provided according to an aspect of the present invention a database processing system which is designed to handle data of a data type which includes sets of instances and which allows the individual instances to be uniquely identified with subscripts. For accessing the database processing system according to the invention, a query is used which contains a predicate or alternatively a function which includes designations of at least two column data of the data type mentioned above and a designation of conditions concerning the column data. Upon reception of the query, the database processing system analyzes the query to decide whether or not an index has been generated for the column data specified in the predicate or alternatively by the function.
In the case where such an index has been generated and is available, an execution procedure is determined such that the index is accessed. Upon database processing, the index is accessed in accordance with the above-mentioned execution procedure to thereby acquire an identifier of table data having a set of instances which meet the conditions specified in the predicate or designated by the function and which can be identified by the same subscript.
On the other hand, if no such index has been generated, an execution procedure for accessing the table data is determined. Upon database processing, the table data is accessed in accordance with this execution procedure to acquire, from the instances which constitute the column data of the data type specified in the predicate or by the function, a set of instances which can be identified by the same subscript. The individual instances of the set acquired from the at least two column data of the aforementioned data type are then evaluated against the conditions specified in the predicate or designated by the function.
The index mentioned above includes, as the index constituting columns, at least two columns (also referred to as the column data) of the data type which is constituted by set(s) of instances and which allows the individual instances to be identified by the subscript(s). The index is composed of index entries, each including the set of instances identified by the same subscript among the instances constituting the column data of the data type, column data other than the column data of the data type, and a row identifier for identifying the row data. Upon execution of table data insert, delete or update processing, the index entries are generated so that the index remains valid.
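The index access path described above can be sketched in Python as a range scan over sorted entries. The entry layout `(subject, mark, row_id)`, the sample entries, and the function name `lookup_row_ids` are all hypothetical; the sketch also assumes integer marks so that the upper bound can be expressed as `max_mark + 1`.

```python
from bisect import bisect_left

# Hypothetical index entries, one per subscript of each row:
# (subject instance, mark instance, row identifier), kept sorted.
INDEX = sorted([
    ("English", 55, "r1"), ("Math", 90, "r1"),
    ("English", 80, "r2"), ("Math", 40, "r2"),
    ("English", 60, "r3"),
])

def lookup_row_ids(index, subject, max_mark):
    """Return identifiers of rows having, at one and the same subscript,
    the given subject and a mark not greater than max_mark."""
    lo = bisect_left(index, (subject,))               # first entry for subject
    hi = bisect_left(index, (subject, max_mark + 1))  # assumes integer marks
    row_ids = []
    for _subject, _mark, row_id in index[lo:hi]:
        if row_id not in row_ids:                     # a row may match twice
            row_ids.append(row_id)
    return row_ids
```

Because each entry pairs the instances that share a subscript, row `r2` (English 80, Math 40) is not returned for the condition "English and mark not greater than 60", and only the row identifiers are needed to answer the query.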
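The table access path without an index can be sketched as follows. Each row is modeled as a Python dict whose `subject` and `mark` values are lists aligned by subscript; this representation, the sample rows, and the function names are assumptions made for illustration only.

```python
# Hypothetical table: the set-valued columns "subject" and "mark" are lists
# whose elements correspond to each other by subscript.
TABLE = [
    {"email": "sato@example.com", "subject": ["English", "Math"], "mark": [55, 90]},
    {"email": "suzuki@example.com", "subject": ["English", "Math"], "mark": [80, 40]},
]

def row_matches(subjects, marks, subject, max_mark):
    """True if some subscript i has subjects[i] == subject
    and marks[i] <= max_mark (both conditions on the same subscript)."""
    return any(s == subject and m <= max_mark for s, m in zip(subjects, marks))

def scan(table, subject, max_mark):
    """Scan every row and evaluate the conditions per subscript."""
    return [row["email"] for row in table
            if row_matches(row["subject"], row["mark"], subject, max_mark)]

def scan_without_subscripts(table, subject, max_mark):
    """For contrast: conditions evaluated on each column independently,
    which ignores the correspondence between elements."""
    return [row["email"] for row in table
            if subject in row["subject"]
            and any(m <= max_mark for m in row["mark"])]
```

Here `scan(TABLE, "English", 60)` yields only the first address, whereas `scan_without_subscripts` also returns the second student, whose low mark belongs to a different subject. The contrast shows why the instances must be paired by subscript before the conditions are evaluated.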
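Index entry generation and maintenance might look like the following sketch. The class name `SubscriptIndex`, its methods, and the entry layout `(subject, mark, row_id)` are hypothetical; for brevity the entries omit the columns other than the set-valued ones, which the description allows an entry to carry as well.

```python
from bisect import insort

class SubscriptIndex:
    """Sorted index entries over two set-valued columns (illustrative only)."""

    def __init__(self):
        self.entries = []  # sorted list of (subject, mark, row_id)

    def insert_row(self, row_id, subjects, marks):
        # One entry per subscript: pair the instances sharing that subscript.
        for subject, mark in zip(subjects, marks):
            insort(self.entries, (subject, mark, row_id))

    def delete_row(self, row_id):
        # Drop every entry that points at the deleted row.
        self.entries = [e for e in self.entries if e[2] != row_id]

    def update_row(self, row_id, subjects, marks):
        # Simplest valid maintenance: remove old entries, then re-insert.
        self.delete_row(row_id)
        self.insert_row(row_id, subjects, marks)
```

Replacing all entries of a row on update is a deliberately simple choice; an implementation could instead touch only the subscripts whose instances changed.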
As will be appreciated from the above, with the database managing system according to the present invention, a set of information of a given data type can be stored as column data in a table for management, and a plurality of elements of the column can be evaluated in terms of a set. Thus, the capacity of the database oriented for storage of tables can be reduced to a possible minimum while ensuring a high-speed search for the set handling columns with a simplified query description.
Parenthetically, it should be added that the database processing system according to the present invention may be implemented in the form of a method, apparatus, program or a medium recording the program for practical applications.