1. Field of the Invention
The present invention generally relates to a database building technique, and more specifically, to a technique for retrieving data in a versatile manner from a database constructed in such a way as to include an unnormalized data structure.
2. Description of the Related Art
In a relational database (hereinafter referred to as “RDB”), which is the dominant type of database today, data modeling assumes that the data to be processed is normalized (that is, that data redundancy has been eliminated).
Normalized data can be retrieved easily by using a data manipulation language (hereinafter referred to as “DML”) such as SQL, and many general-purpose retrieval tools have been put to practical use. In actual RDBs, however, complete data normalization is difficult to achieve, and actual RDBs contain a great deal of unnormalized or denormalized data. The conditions for normal forms according to relational theory are listed below.
(A1) Individual elements of a relation bear no relationship with one another and are atomic (first normal form condition).
(A2) The value of every attribute of a relation other than the keys should be uniquely determined only by the values of all the keys taken together (second normal form condition).
(A3) If, for attributes X and Y of a relation, determining X also determines Y, then X should be a key for Y (third normal form condition).
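Conditions (A2) and (A3) are phrased in terms of one attribute being determined by another, that is, a functional dependency X → Y. A minimal Python sketch of testing whether such a dependency holds in a sample of rows is shown below; the function name `fd_holds` and the sample relation are hypothetical illustrations, not part of the invention. A sample of rows can only refute a dependency, not prove that it holds for all future data.

```python
from typing import Iterable, Mapping, Sequence


def fd_holds(rows: Iterable[Mapping[str, object]],
             x: Sequence[str], y: Sequence[str]) -> bool:
    """Return True if every X value in `rows` maps to exactly one Y value."""
    seen = {}  # X value tuple -> first Y value tuple observed for it
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False  # same X, different Y: the dependency is violated
    return True


# Hypothetical relation keyed by emp_id
employees = [
    {"emp_id": 1, "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 2, "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 3, "dept_id": 20, "dept_name": "R&D"},
]

# dept_id -> dept_name holds, yet dept_id is not the key: (A3) is not satisfied
print(fd_holds(employees, ["dept_id"], ["dept_name"]))  # True
print(fd_holds(employees, ["dept_name"], ["emp_id"]))   # False
```

Splitting `dept_name` into a separate relation keyed by `dept_id` would satisfy (A3), at the cost of one more table and one more join, which is exactly the trade-off discussed next.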
However, if these conditions are rigorously applied to a very large database, the number of necessary tables increases, and so does the number of joins required at retrieval time. This can drastically reduce retrieval speed. For this reason, database designs usually permit a database to contain some unnormalized data. Examples in which an RDB is permitted to contain unnormalized data are listed below.
(B1) Retrieving data across storage areas that hold data on departments and fields, each having a different data structure.
(B2) Hierarchically categorizing data into major, intermediate and minor classes in order to handle many kinds of data.
(B3) Accumulating frequently used data in advance to increase retrieval efficiency, and providing the accumulated data repeatedly.
(B4) Performing special processing on a small number of exceptional data items according to a branch number, a flag or an identifier.
In particular, hierarchically categorized unnormalized data structures are currently in heavy use as a way of easily giving the data space a (non-integral) fractal dimension without impairing the overall data structure, unlike a normalized data model, which handles only data spaces of integral dimension, such as two or three dimensions.
Meanwhile, the logic underlying a query statement that retrieves unnormalized data from such an RDB is difficult to describe in the first-order predicate logic assumed by ordinary DMLs. SQL, the most standard DML, does provide some higher-order logic functions, such as sub-queries and the HAVING clause. However, these functions are weak, are subject to many constraints and offer poor logical clarity, and they are rarely used in practice.
Therefore, when unnormalized data is to be retrieved, it is difficult to use existing general-purpose database retrieval tools, which depend on the language functions provided with a database. Under present circumstances, an application program must be developed separately for each database, or primitive techniques must be used; for example, raw data to be processed is extracted and then processed by the user. As a result, data retrieval requires a great deal of labor and cost, and obtaining a retrieval result takes a long processing time.
Further, an object-oriented database (OODB), in which data and algorithms are encapsulated together, is sometimes used to enable local operations on data. Even with an OODB, however, processing efficiency falls as the amount of data grows, and converting a data structure requires a great deal of time and effort. It is therefore difficult to use an OODB in practice as a very large database.
Furthermore, when a user directly designates complex, hard-to-understand unnormalized data stored in a database as objects to be retrieved and as retrieval conditions, it is desirable to present such data as a simple data structure, such as a table image. In the field of OLAP (online analytical processing), an approach is therefore employed in which the source data structure itself is normalized into a multi-dimensional space. This approach, however, has the problems that revising the existing data structure and converting the data require enormous work, and that the entire structure must frequently be changed to accommodate exceptional data and their retrieval. The approach is therefore not effective in all situations.
The foregoing problems can be solved if a data structure that includes physically unnormalized data (hereinafter referred to as an “unnormalized data structure”) can be presented to a user as a logically normalized data structure (hereinafter referred to as a “normalized data structure”).
Therefore, it is an object of the present invention to provide a data retrieving method which enables data retrieval, in a versatile manner, from a database which is built so as to include an unnormalized data structure.
It is another object of the present invention to provide a data retrieving apparatus which is suitable for implementing the foregoing data retrieving method.
It is another object of the present invention to provide a storage medium for realizing the foregoing data retrieving method and apparatus using a general-purpose computer apparatus.
According to one aspect of the present invention, there is provided a data retrieving method comprising the steps of providing data definition information for representing an unnormalized data structure contained in a database as a logically normalized data structure; analyzing a retrieval request from a user according to the data definition information so as to convert the retrieval request into a query statement which is executable by the database; executing the query statement relative to the database so as to obtain a retrieval result; and editing the obtained retrieval result according to the data definition information so as to generate a final retrieval result corresponding to the retrieval request.
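The four steps of the method (providing data definition information, converting the request, executing the query statement, editing the result) can be sketched in Python with the standard `sqlite3` module. This is a minimal illustration under stated assumptions, not the claimed implementation: the `DATA_DEFINITION` mapping, the request format and the helper names are hypothetical, all logical items are assumed to live in one table, and the editing step only removes duplicate rows and sorts them.

```python
import sqlite3

# Hypothetical data definition information: logical item -> (table, physical column).
# All items are assumed to live in one table to keep the sketch short.
DATA_DEFINITION = {
    "customer": ("orders", "cust_name"),
    "product": ("orders", "prod_name"),
}


def translate(request):
    """Convert a retrieval request into an executable SQL statement and parameters."""
    columns = [DATA_DEFINITION[item][1] for item in request["items"]]
    table = DATA_DEFINITION[request["items"][0]][0]
    cond_item, value = request["where"]
    cond_column = DATA_DEFINITION[cond_item][1]
    sql = f"SELECT {', '.join(columns)} FROM {table} WHERE {cond_column} = ?"
    return sql, (value,)


def edit_result(rows):
    """Edit the raw result: remove duplicate rows and sort them."""
    return sorted(set(rows))


def retrieve(conn, request):
    sql, params = translate(request)            # analyze and convert the request
    raw = conn.execute(sql, params).fetchall()  # execute against the database
    return edit_result(raw)                     # edit into the final result


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, cust_name TEXT, prod_name TEXT);
    INSERT INTO orders VALUES (1, 'Ito', 'pen'), (2, 'Ito', 'pen'), (3, 'Sato', 'ink');
""")
print(retrieve(conn, {"items": ["customer", "product"], "where": ("customer", "Ito")}))
# [('Ito', 'pen')]
```

The raw query returns the pair ('Ito', 'pen') twice because two physical orders exist; the editing step collapses them, which corresponds to the redundancy elimination mentioned for the final retrieval result.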
It may be arranged that the retrieval request includes a data extraction condition that restricts the data to be retrieved, and that the query statement executable by the database is obtained by normalizing the logical description in the data extraction condition into first-order predicate logic according to the data definition information.
It may be arranged that the final retrieval result is obtained by eliminating data redundancy included in the retrieval result obtained by executing the query statement.
According to another aspect of the present invention, there is provided a data retrieving apparatus comprising a database built so as to include an unnormalized data structure; a data dictionary holding definition information for representing the unnormalized data structure as a logically normalized structure; a retrieval request input section for assisting an input of a retrieval request from a user according to the data dictionary; a retrieval request translation section for analyzing the retrieval request according to the data dictionary upon completion of the input of the retrieval request so as to convert the retrieval request into one or more query statements which are executable by the database; a retrieval processing section for issuing the one or more query statements to the database so as to acquire retrieval data composed of one or more results of execution of the one or more query statements; and a retrieval data processing section for editing the retrieval data so as to generate result data corresponding to the retrieval request.
It may be arranged that the data dictionary includes, for each logical item representing a unit of data in the retrieval request, definition information that associates display information selectably presented to the user when assisting the input, a query statement pattern for converting the retrieval request into the query statement executable by the database, and physical information in the database.
It may be arranged that the data dictionary includes a logical sub-item that complements the logical item, and that the logical sub-item defines information on the categories in a logical item type used to represent the unnormalized data structure as the logically normalized data structure.
It may be arranged that the retrieval request input section visually presents the display information on the data to be retrieved according to the data dictionary, and interactively accepts the user's selective input of the retrieval request, which includes an extraction condition for extracting the data to be retrieved.
It may be arranged that the retrieval request input section designates a display format for the result data, the display format being included in the retrieval request.
It may be arranged that the retrieval request input section generates the result data from a plurality of display items selected by the user from the display information, such that display items sharing a common display item as a key item are combined into the result data.
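One way to picture a data dictionary entry is a record that ties one logical item to its display information, its query statement pattern and its physical location. The Python sketch below is a hypothetical layout chosen for illustration; the class name `LogicalItem`, its field names and the `{table}`/`{column}` placeholder convention are assumptions, not the dictionary format of the invention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalItem:
    """Hypothetical data dictionary entry for one logical item."""
    name: str           # logical item name used inside retrieval requests
    display_label: str  # display information offered to the user
    query_pattern: str  # query statement pattern with placeholders
    table: str          # physical information: table name
    column: str         # physical information: column name

    def render(self) -> str:
        # Fill the query statement pattern with the physical information
        return self.query_pattern.format(table=self.table, column=self.column)


DICTIONARY = {
    "sales_amount": LogicalItem(
        name="sales_amount",
        display_label="Sales amount",
        query_pattern="SELECT {column} FROM {table}",
        table="sales",
        column="amt",
    ),
}

print(DICTIONARY["sales_amount"].render())  # SELECT amt FROM sales
```

Keeping the user-facing label and the physical name in the same entry is what lets the input section show "Sales amount" while the translation section emits `amt`.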
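Combining display items that share a key item can be sketched as merging per-item rows on the key value. The Python example below is a minimal sketch under the assumption that each display item has already been retrieved as a list of dictionaries containing the key item; `compound` and the sample data are hypothetical. It behaves like a full outer join: a key present in only one item set still yields a row.

```python
def compound(key, *item_sets):
    """Merge rows from several display items into one row per key value."""
    merged = {}
    for rows in item_sets:
        for row in rows:
            # Rows with the same key value are folded into a single output row
            merged.setdefault(row[key], {key: row[key]}).update(row)
    return [merged[k] for k in sorted(merged)]


# Hypothetical display items, each retrieved separately with "store" as the key item
sales = [{"store": "A", "sales": 120}, {"store": "B", "sales": 80}]
staff = [{"store": "A", "staff": 5}, {"store": "B", "staff": 3}]

print(compound("store", sales, staff))
# [{'store': 'A', 'sales': 120, 'staff': 5}, {'store': 'B', 'sales': 80, 'staff': 3}]
```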
It may be arranged that the retrieval request input section dynamically adds information on the logical item type used for a logical description in an extraction condition to the data dictionary.
It may be arranged that the retrieval request translation section performs a syntax analysis of the retrieval request according to the data dictionary to replace a retrieval request syntax pattern of the retrieval request with a corresponding query statement pattern which is executable by the database.
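Replacing a retrieval request syntax pattern with a query statement pattern can be illustrated with regular expressions. In the Python sketch below, the pattern table, the English-like request syntax and the `PHYSICAL` name mapping are all hypothetical; the actual request syntax is not specified here. Values are formatted directly into the SQL text only for brevity; a production translator would bind condition values as query parameters.

```python
import re

# Hypothetical (retrieval request syntax pattern, query statement pattern) pairs
PATTERNS = [
    (re.compile(r"^(\w+) whose (\w+) is (\w+)$"), "SELECT * FROM {0} WHERE {1} = '{2}'"),
    (re.compile(r"^all (\w+)$"), "SELECT * FROM {0}"),
]

# Hypothetical logical-to-physical names taken from the data dictionary
PHYSICAL = {"customers": "cust_tbl", "region": "rgn_cd", "east": "E"}


def translate_by_pattern(request: str) -> str:
    for pattern, template in PATTERNS:
        m = pattern.match(request)
        if m:
            # Replace logical names with physical names, then fill the template
            return template.format(*(PHYSICAL.get(g, g) for g in m.groups()))
    raise ValueError(f"no syntax pattern matches: {request!r}")


print(translate_by_pattern("customers whose region is east"))
# SELECT * FROM cust_tbl WHERE rgn_cd = 'E'
```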
It may be arranged that the retrieval request translation section converts the retrieval request into the one or more query statements which are normalized according to first-order predicate logic in a data manipulation language based on predetermined SQL.
It may be arranged that the retrieval request translation section expands all the display items included in the retrieval request according to the data dictionary and converts each expanded display item into its corresponding logical item; removes data redundancy in the retrieval request by performing one of column integration (integrating different physical items, which compose the physical information, into the same logical item according to an extraction condition), column decomposition (decomposing the same physical item into different logical items), row selection (selecting, from physical items whose contents overlap with those of other physical items, a single logical item whose content does not overlap with those of other items), and key value selection (selecting different physical items according to a key value and aggregating them into a single logical item); and converts the retrieval request into the one or more query statements normalized according to first-order predicate logic in a data manipulation language based on predetermined SQL.
It may be arranged that the retrieval data processing section removes a redundant part by integrating items formed from the same logical item in the retrieval data corresponding to the one or more query statements.
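Two of the four redundancy-removing operations can be expressed as SQL fragments generated from dictionary information. The Python sketch below, using the standard `sqlite3` module, is one possible reading rather than the claimed method: column integration is rendered as `UNION ALL` over several physical columns, and key value selection as a `CASE` expression that picks a physical column per key value. The helper names and tables are hypothetical; column decomposition and row selection are not shown. Identifiers are interpolated directly because they come from the data dictionary, not from user input.

```python
import sqlite3


def column_integration(table, key, columns):
    """Integrate several physical columns into one logical item via UNION ALL.

    Each physical column contributes one row per key, tagged with its source column.
    """
    parts = [f"SELECT {key}, '{col}' AS source, {col} AS value FROM {table}"
             for col in columns]
    return " UNION ALL ".join(parts)


def key_value_selection(key_column, choices, alias):
    """Choose a physical column per key value and expose it as one logical item."""
    whens = " ".join(f"WHEN '{k}' THEN {col}" for k, col in choices.items())
    return f"CASE {key_column} {whens} END AS {alias}"


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE quarterly (store TEXT, q1_sales INTEGER, q2_sales INTEGER);
    INSERT INTO quarterly VALUES ('A', 10, 12), ('B', 7, 9);
    CREATE TABLE orders (order_id INTEGER, channel TEXT,
                         retail_price INTEGER, wholesale_price INTEGER);
    INSERT INTO orders VALUES (1, 'R', 150, 100), (2, 'W', 160, 110);
""")

sales_sql = column_integration("quarterly", "store", ["q1_sales", "q2_sales"])
print(conn.execute(sales_sql + " ORDER BY store, source").fetchall())
# [('A', 'q1_sales', 10), ('A', 'q2_sales', 12), ('B', 'q1_sales', 7), ('B', 'q2_sales', 9)]

price = key_value_selection("channel", {"R": "retail_price", "W": "wholesale_price"}, "price")
print(conn.execute(f"SELECT order_id, {price} FROM orders ORDER BY order_id").fetchall())
# [(1, 150), (2, 110)]
```

In both cases the user sees a single logical item (`value` or `price`) even though the database stores it in several physical columns.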
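Integrating items formed from the same logical item across the results of several query statements can be sketched as renaming physical columns to logical items and then discarding duplicate records. The Python example below is a minimal sketch; the `integrate` helper, the result format `(column_names, rows)` and the sample mapping are hypothetical.

```python
def integrate(results, logical_items):
    """Merge per-query results whose columns map to the same logical items.

    results: list of (column_names, rows) pairs, one per executed query statement.
    logical_items: maps each physical column name to its logical item.
    """
    seen = set()
    merged = []
    for columns, rows in results:
        names = [logical_items[c] for c in columns]
        for row in rows:
            record = tuple(sorted(zip(names, row)))
            if record not in seen:  # drop rows already produced by another query
                seen.add(record)
                merged.append(dict(record))
    return merged


# Hypothetical results from two query statements over differently named columns
results = [
    (["cust_no", "amt"], [("C1", 100), ("C2", 50)]),
    (["customer_id", "amount"], [("C2", 50), ("C3", 70)]),
]
logical = {"cust_no": "customer", "customer_id": "customer",
           "amt": "amount", "amount": "amount"}

print(integrate(results, logical))
# [{'amount': 100, 'customer': 'C1'}, {'amount': 50, 'customer': 'C2'}, {'amount': 70, 'customer': 'C3'}]
```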
It may be arranged that the retrieval data processing section performs data processing and tabulation on the retrieval data according to designation of a display format concerning the result data when the designation of the display format is included in the retrieval request.
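Tabulation according to a designated display format can be sketched as grouping the retrieval data by one item and aggregating another. In the Python example below, the designation is reduced to three hypothetical parameters (`group_by`, `value`, `how`); the real display-format designation may carry more information, such as layout or sort order.

```python
from collections import defaultdict


def tabulate(rows, group_by, value, how="sum"):
    """Aggregate retrieval data according to a (hypothetical) display-format designation."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[value])
    funcs = {"sum": sum, "count": len, "max": max}
    return {k: funcs[how](v) for k, v in sorted(groups.items())}


data = [
    {"region": "east", "amount": 100},
    {"region": "west", "amount": 40},
    {"region": "east", "amount": 60},
]
print(tabulate(data, "region", "amount"))           # {'east': 160, 'west': 40}
print(tabulate(data, "region", "amount", "count"))  # {'east': 2, 'west': 1}
```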
It may be arranged that the database is constructed as a predetermined object-oriented database.
According to another aspect of the present invention, there is provided a data retrieving system comprising the data retrieving apparatus according to claim 4 and a plurality of retrieval request source devices, wherein the data retrieving apparatus is connected to the retrieval request source devices so as to communicate bidirectionally, and wherein the data retrieving apparatus is constituted so as to acquire a retrieval request issued from each of the retrieval request source devices and transmit a corresponding retrieval result to the corresponding retrieval request source device.
It may be arranged that the retrieval request is inputted to the data retrieving apparatus through an agent function.
According to another aspect of the present invention, there is provided a storage medium storing a program which is executable by a computer apparatus comprising a database built so as to include an unnormalized data structure, and a data dictionary holding definition information for representing the unnormalized data structure as a logically normalized structure, the program causing the computer apparatus to execute the steps of: assisting an input of a retrieval request from a user according to the data dictionary; analyzing the retrieval request according to the data dictionary upon completion of the input of the retrieval request so as to convert the retrieval request into one or more query statements which are executable by the database; issuing the one or more query statements to the database so as to acquire retrieval data composed of one or more results of execution of the one or more query statements; and editing the retrieval data so as to generate result data corresponding to the retrieval request.