1. Field of the Invention
The present invention relates to an apparatus and method for retrieving a desired structured document from a structured document database that has a hierarchical logical structure and stores a plurality of structured documents having different document structures.
2. Description of the Related Art
With the advance of IT (information technology), huge volumes of information can now be acquired easily. However, the information actually required is often buried in this mass and cannot be fully utilized. A large volume of information has little value unless it can be used efficiently.
Hence, knowledge management has been proposed: an activity that collects the know-how and business data held by specific individuals and departments that are important for corporate management, and utilizes them as “management resources”.
Some documents, such as patent specifications and weekly reports, follow predetermined formats and are standardized to those formats. Besides such standardized documents, many documents have free formats.
To realize knowledge management, therefore, a database is required that can store and manage both documents with predetermined document structures and documents with free formats.
XML (Extensible Markup Language) is expected to be the core technology of next-generation knowledge management.
An XML document is data with a tree structure. An XML database that stores and manages XML documents has a tree-like hierarchical data structure in which the components of the plurality of managed structured documents are organized as if they formed the document structure of one giant structured document. That is, the XML database stores all of its XML documents as one giant, tree-like XML document. Each component in this hierarchical structure can be specified by a “path”, which designates a specific area (location) in the XML database. Using a path, a partial XML document can be accessed from the XML database.
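As a minimal sketch of this idea, assuming Python's standard `xml.etree.ElementTree` module as a stand-in for an XML database, the hypothetical documents below (each with a different structure) are grafted under one hypothetical root element `db`. Path expressions, written in ElementTree's limited XPath subset, then locate components of individual documents inside the single giant tree. This illustrates the conventional storage model only; it is not the claimed apparatus.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents with different document structures.
DOCS = [
    "<patent id='p1'><title>Retrieval apparatus</title>"
    "<claims><claim>A retrieval method</claim></claims></patent>",
    "<report id='r1'><week>12</week><title>Weekly status</title></report>",
    "<memo id='m1'><body>Free-format note</body></memo>",
]


def build_database(docs):
    """Graft every document under one root so the whole store is a single tree."""
    root = ET.Element("db")  # hypothetical root tag for the giant document
    for text in docs:
        root.append(ET.fromstring(text))
    return root


db = build_database(DOCS)

# A path selects one component (a partial document) of one stored document.
title = db.find("./report[@id='r1']/title")
print(title.text)  # Weekly status

# The same path syntax can address components across differently structured documents.
titles = [t.text for t in db.findall("./*/title")]
print(titles)  # ['Retrieval apparatus', 'Weekly status']

# A partial XML document can be serialized back out of the giant tree.
print(ET.tostring(db.find("./patent/claims"), encoding="unicode"))
# <claims><claim>A retrieval method</claim></claims>
```

Note that the memo simply contributes nothing to the `./*/title` result: without a schema, documents lacking a given component coexist in the same tree.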
The document structure of an XML document stored in an XML database need not be defined by a schema. If a schema is defined, however, only one schema is allowed per database. Consequently, without a schema, documents with different document structures can be stored and managed together; once a schema is set, documents whose structures differ from the one defined by that schema cannot be stored in the same database.
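The single-schema restriction can be sketched as follows. Real XML databases validate against a DTD or XML Schema; here, purely as an assumption for illustration, a toy "schema" is a root tag plus the ordered tags of its direct children, and the hypothetical class `SchemaBoundStore` accepts at most one such schema.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional


class ToySchema(NamedTuple):
    """Hypothetical stand-in for a real schema: root tag + ordered child tags."""
    root_tag: str
    child_tags: tuple


def conforms(doc: ET.Element, schema: ToySchema) -> bool:
    """True if the document's root and its direct children match the toy schema."""
    return doc.tag == schema.root_tag and tuple(c.tag for c in doc) == schema.child_tags


class SchemaBoundStore:
    """One tree-structured store with at most one schema (None = schema-less)."""

    def __init__(self, schema: Optional[ToySchema] = None):
        self.schema = schema
        self.root = ET.Element("db")

    def store(self, text: str) -> bool:
        doc = ET.fromstring(text)
        if self.schema is not None and not conforms(doc, self.schema):
            return False  # rejected: structure differs from the single schema
        self.root.append(doc)
        return True


report = "<report><week>12</week><title>Status</title></report>"
memo = "<memo><body>Free-format note</body></memo>"

free = SchemaBoundStore()  # no schema: both structures coexist
print(free.store(report), free.store(memo))  # True True

strict = SchemaBoundStore(ToySchema("report", ("week", "title")))
print(strict.store(report), strict.store(memo))  # True False
```

The schema-less store accepts both documents, while the schema-bound store rejects the memo; neither can hold several schemas at once.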
Jpn. Pat. Appln. KOKAI Publication No. 7-56786, “structured document management apparatus”, proposes a method that sets one schema in an XML database and stores and manages only documents that match that schema.
On the other hand, a technique called OLAP (Online Analytical Processing) is known. OLAP is an analytical application that allows an end user to retrieve and aggregate data in a database directly in order to identify problems and solutions. With this technique, for example, a huge volume of sales information for individual shops can be analyzed while the viewpoint is instantly switched among dimensions such as product, area, and year.
As a database to which OLAP is applied, the multi-dimensional database has attracted much attention: it can switch among a plurality of attribute items (dimensions) in turn and retrieve and calculate data accordingly. When any two attribute items are selected, the data are immediately displayed as a two-dimensional table (map).
However, a multi-dimensional database must manage a large volume of data, because aggregated results for all combinations of attribute items are prepared in advance. In addition, because the database has its own proprietary data structure, dedicated client software is required.
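The trade-off described above can be illustrated with a minimal in-memory cube. Assuming hypothetical sales records with three attribute items (`product`, `area`, `year`), the sketch precomputes a total for every combination of attribute items, which is 2^3 = 8 groupings here. Any two attribute items can then be laid out as a two-dimensional map by lookup alone, with no further aggregation. The data and the function names `build_cube` and `two_d_map` are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact records: attribute items (dimensions) plus a measure.
FACTS = [
    {"product": "A", "area": "East", "year": 2001, "sales": 10},
    {"product": "A", "area": "West", "year": 2001, "sales": 5},
    {"product": "B", "area": "East", "year": 2002, "sales": 7},
    {"product": "B", "area": "West", "year": 2002, "sales": 3},
    {"product": "A", "area": "East", "year": 2002, "sales": 4},
]
DIMS = ("product", "area", "year")


def build_cube(facts, dims):
    """Precompute totals for every subset of dimensions (2**len(dims) groupings)."""
    cube = {}
    for r in range(len(dims) + 1):
        for group in combinations(dims, r):
            totals = defaultdict(int)
            for f in facts:
                totals[tuple(f[d] for d in group)] += f["sales"]
            cube[group] = dict(totals)
    return cube


def two_d_map(cube, dims, row_dim, col_dim):
    """Lay out two distinct dimensions as a 2-D map by lookup only.

    Cells with no underlying data are simply absent from the result.
    """
    key = tuple(d for d in dims if d in (row_dim, col_dim))  # canonical order
    ri, ci = key.index(row_dim), key.index(col_dim)
    return {(k[ri], k[ci]): v for k, v in cube[key].items()}


cube = build_cube(FACTS, DIMS)
print(len(cube))  # 8

# Switching the viewpoint is just a different lookup into the precomputed cube.
print(two_d_map(cube, DIMS, "product", "area")[("A", "East")])  # 14
print(two_d_map(cube, DIMS, "year", "area")[(2002, "East")])  # 11
```

The storage cost noted in the text shows up in `build_cube`: every added attribute item doubles the number of precomputed groupings.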
By contrast, a relational database (RDB) manages normalized data in two-dimensional tables and produces a single table of aggregated results by joining the multiple tables required for the analysis. Because the tables must be joined again every time the viewpoint changes, response times are long.
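For contrast, the relational approach can be sketched with Python's built-in `sqlite3` module. The normalized tables `shop`, `product`, and `sale`, their contents, and the helper `aggregate` are hypothetical. The point is that each change of viewpoint issues a fresh join and `GROUP BY` over the base tables instead of reading a precomputed result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shop    (shop_id INTEGER PRIMARY KEY, area TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sale    (shop_id INTEGER, product_id INTEGER, year INTEGER, amount INTEGER);
INSERT INTO shop    VALUES (1, 'East'), (2, 'West');
INSERT INTO product VALUES (1, 'A'), (2, 'B');
INSERT INTO sale    VALUES (1, 1, 2001, 10), (2, 1, 2001, 5),
                           (1, 2, 2002, 7), (2, 2, 2002, 3), (1, 1, 2002, 4);
""")

# Whitelisted viewpoints (attribute items) mapped to SQL column expressions.
VIEWPOINTS = {"area": "shop.area", "product": "product.name", "year": "sale.year"}


def aggregate(conn, row_dim, col_dim):
    """Re-join the normalized tables and aggregate for the two chosen viewpoints."""
    r, c = VIEWPOINTS[row_dim], VIEWPOINTS[col_dim]  # KeyError for unknown names
    sql = f"""
        SELECT {r}, {c}, SUM(sale.amount)
        FROM sale
        JOIN shop    ON shop.shop_id = sale.shop_id
        JOIN product ON product.product_id = sale.product_id
        GROUP BY {r}, {c}
        ORDER BY {r}, {c}
    """
    return conn.execute(sql).fetchall()


print(aggregate(conn, "product", "area"))
# [('A', 'East', 14), ('A', 'West', 5), ('B', 'East', 7), ('B', 'West', 3)]

# Changing the viewpoint means running a new join and aggregation.
print(aggregate(conn, "year", "area"))
# [(2001, 'East', 10), (2001, 'West', 5), (2002, 'East', 11), (2002, 'West', 3)]
```

Only whitelisted column expressions are interpolated into the SQL text, so user-supplied viewpoint names cannot inject arbitrary SQL.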
When a huge number of structured documents with different document structures are stored and managed in a database, and a specific document structure is predetermined for a particular type of document, it is convenient for data manipulation such as retrieval to standardize all documents of that type to the same document structure.
However, no conventional XML database can store and manage documents with different document structures while maintaining the consistency of the document structure defined for each document type. A conventional XML database can store and manage documents that match one schema, but it cannot store and manage documents corresponding to a plurality of different schemas together with documents that correspond to no schema.
Alternatively, a separate database could be prepared for each schema. In that case, however, the database to be accessed differs from schema to schema. Consequently, a huge number of documents with various document structures cannot be accessed uniformly, and it is difficult to retrieve and extract a group of related information from a huge amount of diverse information.
As described above, documents whose structures are predefined for each document type cannot conventionally be managed together with structured documents that have no predefined document structure. It is therefore impossible to retrieve and extract a group of related information from a huge volume of diverse information through uniform access across all document structures, independent of any specific document structure.
Hence, it has conventionally been difficult to implement OLAP that displays the results of retrieving and aggregating data from a huge number of structured documents with various document structures while switching among a plurality of attribute items (dimensions) in turn.
Moreover, whenever the attribute item is changed to shift the analytical viewpoint, the retrieval and aggregation must be redone with the new attribute item set in the retrieval condition. Because of this, it is not easy to switch from a display window showing the retrieval/aggregation result for an attribute item chosen from one viewpoint to a window showing the result from another viewpoint.
Thus, implementing OLAP requires switching, each time the analytical viewpoint changes, to a display window that shows the retrieval/aggregation result for the changed attribute item, yet such window transitions, which entail repeated retrieval and aggregation, are not easy to achieve.