(1) Field of the Invention
The invention generally relates to database administration. The invention relates to a method for the administration of a data base, comprising a document set for reception of at least one referenceable, structured document, which comprises at least one data element, the position of which in the document structure is determined by a referenceable structure path, a schema which comprises a first node, to which further nodes are dependently linkable, so that, from the first node, to each further node of the schema respectively leads a referenceable schema path, and a repository, into which the schema path and, assigned to said schema path, at least one document reference is mappable. The generic method comprises the following steps:
searching the repository to check whether the structure path reference of the document corresponds to a schema path reference of the schema, and, if so,
mapping the documents referenced in the repository of the data base while assigning the document reference to the schema path reference.
The invention further relates to a method for determining a data base schema, which comprises a first node, to which further nodes are dependently linkable, so that, from the first node, to each further node of the schema respectively leads a referenceable schema path, by means of a set of one or more structured documents.
The invention also relates to a method for mapping a data base schema, which comprises a first node, to which further nodes are dependently linkable, so that, from the first node, to each further node of the schema respectively leads a referenceable schema path, into a repository of the data base, into which the schema path and, assigned to said schema path, at least one document reference is mappable.
Further the invention relates to a data base with a document set for reception of at least one referenceable, structured document, which comprises at least one data element, the position of which in the document structure is determined by means of a referenceable structure path, a schema which comprises a first node, to which further nodes are dependently linkable, so that, from the first node, to each further node of the schema respectively leads a referenceable schema path, and a repository, in which the schema path and, assigned to said schema path, at least one document reference is mappable.
Furthermore the invention relates to a storage means for a storage unit for a computer.
Moreover the invention relates to a computer system, with at least one central processing unit, at least one storage unit being connected with the central processing unit, said storage unit having a storage means for storing data and all commands for the central processing unit, at least one input means, for inputting data and/or commands into the central processing unit and/or for inputting data into the storage unit, and at least one output means, for outputting data from the central processing unit and/or the storage unit.
(2) Description of Related Art
One method for the administration of a data base is known from EP 1 089 195 A1. In the data base, data, especially content data and structure data, are stored. The data are part of a document, which comprises several data elements and is stored in the data base. Each data element contains content data and/or structure data.
The data base is structured by means of a schema. The schema is defined in a predetermined manner. It describes the data to be expected for the data base. The description does not have to be complete. In the schema several nodes are assigned to each other hierarchically. One single so-called root node provides the highest level of the schema. All other nodes of the hierarchy depend on said root node. A full designation of each node corresponds to a “schema path” from the root node to the designated node. Thus, each fully designated node carries information about its relationship to the other nodes in the hierarchical structure. The data base comprises a repository, into which, for every node, the respective path from the root node to the node and a path reference which points to said path are mappable.
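The relationship between schema nodes, schema paths and path references can be illustrated with a minimal sketch. The class `SchemaNode`, the function `schema_paths`, the node names, the slash-separated path notation and the use of integers as path references are all assumptions made for illustration; none of them is taken from EP 1 089 195 A1.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaNode:
    """One node of the hierarchical schema (hypothetical helper type)."""
    name: str
    children: list["SchemaNode"] = field(default_factory=list)

    def add(self, name: str) -> "SchemaNode":
        """Link a further node dependently to this node and return it."""
        child = SchemaNode(name)
        self.children.append(child)
        return child


def schema_paths(node, prefix=""):
    """Yield the schema path of `node` and of every node depending on it."""
    path = f"{prefix}/{node.name}"
    yield path
    for child in node.children:
        yield from schema_paths(child, path)


# Example schema with a single root node (names are illustrative only).
root = SchemaNode("library")
book = root.add("book")
book.add("title")
book.add("author")

# Repository sketch: path reference (assumed to be an integer) -> schema path.
repository = dict(enumerate(schema_paths(root)))
```

In this sketch each path states the full designation of a node, so the position of the node relative to the root node can be read directly from the repository entry.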
The known data base is provided for reception of documents, the structure of which is predetermined in a section of the respective document, in which one or more tags are defined. The definition of every tag includes a classification of the respective tag with regard to the totality of the tags of the document so that the structure of the document is determined. The structure of the document is hierarchically formed. A first position in the structure being described by a first tag is thus sub-, co- or superordinate to a second position in the structure being described by a second tag. From the highest level of the hierarchy a path leads to each position in the structure.
The content of the document is contained in one or more data elements. The data elements of the document are arranged structurally in that each data element is marked by one tag, said tag describing the content of the respective data element.
Together each data element and the respective tag of the data element form a pair. Each pair is referenceable by a path reference, which points to the path and thus to the position of this pair in the hierarchical structure. The path reference can be mapped into the repository together with the referenced path to the pair of data element and tag of the data element.
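The formation of pairs and their paths may be sketched as follows, under the assumption that the structured document is an XML document and that a path is written as a slash-separated sequence of tags. The function name `tagged_pairs` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def tagged_pairs(element, prefix=""):
    """Yield (structure path, content) for each tagged data element."""
    path = f"{prefix}/{element.tag}"
    text = (element.text or "").strip()
    yield path, text
    for child in element:
        yield from tagged_pairs(child, path)


# Illustrative document; the tags and contents are assumptions.
document = ET.fromstring(
    "<library><book><title>Dune</title><author>Herbert</author></book></library>"
)
pairs = list(tagged_pairs(document))
```

Each entry of `pairs` couples the path of a tag with the content of the data element it marks; purely structural elements carry an empty content string in this sketch.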
Each document is referenceable by a document reference. The document reference can be mapped into the repository. The repository is formed as central administrative unit of the data base.
The repository is created, namely initialized, when the first document is stored in the data base. The repository is updated, namely extended, when a further document is stored. Creating or updating the data base is carried out in such a way that the document being stored runs through an analysis routine, which arranges the data of the document in pairs of one data element and one tag. Due to the defined position of the tag in the hierarchical structure of the document, each pair is assigned to a respective node in the schema of the data base. If the schema does not yet comprise the node, a path to the node and an assigned path reference are mapped into the repository of the data base. In any case, the document reference of the document is assigned to the path reference of the node and the document reference is mapped into the repository of the data base.
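The initialization and extension of the repository described above might be sketched as follows. The sketch assumes XML documents, integer path references and string document references; the class `Repository` and its attributes are hypothetical and do not reproduce the analysis routine of the known data base.

```python
import xml.etree.ElementTree as ET


class Repository:
    """Hypothetical central administrative unit of the data base."""

    def __init__(self):
        self.path_refs = {}  # schema path -> path reference
        self.doc_refs = {}   # path reference -> set of document references

    def store(self, doc_ref, xml_text):
        """Analyse one document and initialise or extend the repository."""

        def walk(element, prefix):
            path = f"{prefix}/{element.tag}"
            # Node not yet in the schema: map the path and a new path reference.
            ref = self.path_refs.setdefault(path, len(self.path_refs))
            # In any case, assign the document reference to the path reference.
            self.doc_refs.setdefault(ref, set()).add(doc_ref)
            for child in element:
                walk(child, path)

        walk(ET.fromstring(xml_text), "")


repo = Repository()
repo.store("doc1", "<library><book><title>Dune</title></book></library>")
repo.store("doc2", "<library><book><title>Emma</title><year>1815</year></book></library>")
```

Storing the second document adds one new path (for the `year` tag) to the repository, while the paths already present merely receive an additional document reference.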
Since the repository comprises the path reference of each pair, it reflects the schema of the data base, i.e. the hierarchical structure of the data base. During the existence of the data base, the path references are, so to speak, stored in the repository.
In practice, a data base which is administered on a computer system with a storage unit and a central processing unit can be very large. The data base then comprises such an extensive repository that searching said repository requires considerable processing time of the central processing unit of the computer system.
In U.S. Pat. No. 6,240,407 B1 a method is described to create a repository of a data base by means of a structured document. The structured document is analyzed to determine whether it comprises at least one data element. Then the document is abstracted by using pre-determinable abstraction steps to obtain a set of abstracted values. The set of abstracted values is stored in the repository.
The set of abstracted values is smaller than the set of data elements, so that the repository is comparatively smaller. A data base whose repository is set up with the known method enables a fully structured search for data elements whose content is predetermined by means of a search word and which are marked by a tag predetermined by means of a filter. A search of the contents within an abstracted value, however, requires more processing time, because it extends to the whole quantity of data elements which the abstraction has registered under that abstracted value.
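The trade-off between repository size and search effort can be illustrated with a deliberately simple sketch. The abstraction step used here (keeping only the lower-cased first letter of a data element) is purely hypothetical and is not the abstraction of U.S. Pat. No. 6,240,407 B1; the names `abstract`, `elements`, `repository` and `search` are likewise assumptions.

```python
from collections import defaultdict


def abstract(value: str) -> str:
    """Hypothetical abstraction step: keep the lower-cased first letter."""
    return value[:1].lower()


# Data elements held in the document set (illustrative contents).
elements = ["Dune", "Dracula", "Emma", "Damascus", "Erewhon"]

# Repository: abstracted value -> positions of the data elements it registers.
repository = defaultdict(list)
for position, value in enumerate(elements):
    repository[abstract(value)].append(position)


def search(word: str) -> list[str]:
    """Look up the abstracted value, then scan every element registered under it."""
    candidates = repository.get(abstract(word), [])
    return [elements[i] for i in candidates if elements[i].lower() == word.lower()]
```

The repository holds two abstracted values for five data elements, but a search for one word has to compare against all three elements registered under the same abstracted value.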
One disadvantage of the known method is that the size of the repository increases strongly if documents are stored in the data base which are not fully described by the schema of the data base. Such documents are called “open-content” documents. The structure of an “open-content” document conforms to the schema of the data base in which it is to be stored, but differs from that schema in that it represents an extension of it. The respective document can then comprise at least one tag to which no node of the schema of the data base corresponds. When the document is stored in the data base, the schema has to be supplemented by the respective node, and a mapping of the additional node into the repository is necessary. If for this reason, during the administration of the data base, the schema comes to comprise more and more nodes, many applications may, for technical reasons, be unable to conduct a search because of the required processing time, with the result that the computer system used to administer the data base denies service.