The invention relates to a database system, in particular to a database system in which a database comprising a plurality of data sets is stored, and further to a method for storing, amending, and displaying the database, as well as to a method for searching and accessing the database.
A database system usually comprises a stock of data, the so-called database, which is stored in a machine-readable storage, as well as a data processing system on which one or more application programs are running, in order to be able to access the database by means of a control program, the so-called database management system, as well as to be able to display components of the database, to search for components, and to modify the database, to input new data, etc.
The interpretation of the data in the database is done by means of metadata, which are “data about the data”, said metadata being required by said database management system and being separately stored by said system. The metadata are data descriptive of the fields of the data sets and of all other attributes of the database system.
In all known database systems the database is stored in a structured form, according to the state of the art in form of a table. The database thereby comprises individual data sets, which again consist of individual data fields. If the information in the data sets is dividable into several fields, with the meaning of the n-th field in each data set being identical, then this is storage in a structured form. The contents of the fields may be different.
An example for such a structured database in the form of a table is shown in FIG. 2.
The individual data sets are the rows of the table, in which in respective individual data fields information is contained about receipt number, date, first name, last name, and type of the receipt. The last field of each data set contains a set number, which is an identification number to unambiguously identify the data set. All data sets not only have the same number of data fields, the data fields of each data set also can be categorized according to a structure which is common to all data sets.
For example, each data set contains a field which may be described or identified as receipt number, another one as the date, another one as the first name, etc. The individual data sets are put together into a table the columns of which are characterized by a common description of their content. The structure of the table therefore is prescribed by the descriptions which are present in the header of the table of FIG. 2. The fields of the table header contain further information which is not shown about the content of the individual columns of the table, so-called metadata for the individual fields of the data sets of the table. Basically, in conventional database systems, data of the database are stored in such a structured form, which means there is a prescribed table structure into which the data sets to be stored have to fit with respect to the number of fields and with respect to the description of the field content, however, also with respect to the data type or the data structure of the content of the field.
Because of the prescribed structure according to which the data have to be stored, it is not possible in conventional database systems in case of a structure according to FIG. 2 to store further information which does not fit into this structure. For example, if only for some of the data sets additional information is to be stored, then for this purpose the structure of all of the stored data has to be changed. An example is shown in FIG. 3.
In order to additionally store a department identifier, even if this information is to be stored only for some of the data sets, an additional column “department” has to be inserted into the table.
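The effect of such a structural change can be illustrated with a small sketch. It assumes a relational table managed through Python's built-in `sqlite3` module; the table name `receipts`, the column names, and all row values are hypothetical and are not taken from FIG. 2 or FIG. 3.

```python
import sqlite3

# Hypothetical receipt table; column names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts ("
    "receipt_no TEXT, date TEXT, first_name TEXT, last_name TEXT, "
    "type TEXT, set_no INTEGER PRIMARY KEY)"
)
conn.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("R-1", "2000-01-05", "Anna", "Meier", "invoice", 1),
        ("R-2", "2000-01-06", "Jan", "Koch", "credit", 2),
    ],
)

# Storing a department for one new data set forces a change to the whole table.
conn.execute("ALTER TABLE receipts ADD COLUMN department TEXT")
conn.execute(
    "INSERT INTO receipts VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("R-3", "2000-01-07", "Eva", "Lang", "invoice", 3, "sales"),
)

# Existing rows now carry an empty department field.
rows = conn.execute(
    "SELECT set_no, department FROM receipts ORDER BY set_no"
).fetchall()
print(rows)
```

In this sketch the two data sets stored before the change acquire an empty `department` field, although the new information concerns only the third data set.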
Therefore, in conventional database systems, there is a prescribed structure which is mandatory for the individual data sets, and the individual data sets then are combined into a table as shown in FIG. 2 or in FIG. 3. Thereby there exists a prescription for the individual data sets with respect to their description but also with respect to their data format by the overall structure of the table which is formed by the data sets. With respect to their description, their data format, so to speak with respect to their metadata, the contents of the individual fields of the data sets may only be amended by amending the overall structure of the table as a whole. However, such an amendment of the structure of the whole table, like the amendment from FIG. 2 into the one of FIG. 3 influences the structure of all, also the already existing data sets.
The data sets being combined into a table are called the database table. The sum of the metadata of all database tables of the database, which themselves may be individually different, form the so-called data model of the database system.
This kind of structured storage of the data means that for each storage of data sets it is required that the data sets which are to be stored with respect to their structure fit at least into a partial structure of the database.
Therefore, if a data set is to be stored whose structure is not in accordance with the existing data structures, or with parts thereof, then the database has to be extended by the new structure element (field or fields) before the data set can be stored.
The data structure resulting from the tables not only is formed in the database itself but also is reflected in the database management system as well as in the application programs. For this reason the following problems arise.
Requests for amendments with respect to the data structure are input by the users and therefore at first are forwarded to the application software. However, this then leads also to an amendment of the data model of the database. Furthermore, the amendments of the user requirements have to be defined by the user, and if they are realized they lead to amendments in the database management system, since this system has to be adapted to the structure of the database. This results in a continuing degeneration of the data structures. For example, it may happen that when inserting new columns, not all fields of the column are written into, thereby some of the fields of the whole table which is formed of the individual data sets remain empty.
A particular problem is the storage of data sets having a new data set structure. This data set structure has to be defined by the user. For this purpose amendments in the database management system as well as in the application programs are necessary.
The structured storage of the data in the form of a table therefore leads to a lot of work to be done by the system administrator as a result of the requirements or the demands set forth by the users towards the database. The structure of the database has to be adapted and amended continuously.
In conventional database systems the data sets are ordered in the form of a table. Each data set consists of several data fields. Each data field consists of a descriptor and a field value (field content). The descriptor thereby describes the field value. This means that the descriptor describes the fields with respect to the type or the meaning of their field values. For example, all data fields whose contents are “body size” have the same descriptor. Based on these structured data sets the structured database is then formed as a table. This conventional method requires confirmed or hypothetical information about the structure of the data sets to be stored. This information is required in order to define the data model which enables the storage of all data sets which potentially have to be stored based on the existing information. To store the usually inhomogeneous overall set of expected data sets, database tables are defined, with each database table storing only data sets having the same structure. The sum of all definitions of all database tables forms the abstract data model of the database system.
It is readily apparent that the data model of each database table has to take into account all data fields of all the data sets which potentially have to be stored in this database table, in particular with respect to their data format. In order to do so, the attributes are described for each descriptor which may potentially occur in a data set. Attributes thereby describe the data format and the type of use (e.g. “to be indexed” or “not to be indexed”) of the field value. The database table defined in this way is capable of storing data sets which are in accordance with the predefined structure. This data model may then be visualized by assigning a descriptor to each column of the database table. A data set then corresponds to a row in this database table. In each column the field value which corresponds to the descriptor is written if the data set contains a corresponding field value. There need not be a field value for each descriptor in each data set. This leads to data fields in the database table which remain empty.
This type of storage of data sets in a so formed database system has the disadvantage that at first a data model has to be defined which takes into account each descriptor which may potentially occur in a data set as well as its data format. However, such a data model can only be built if complete and correct information exists about the structure of the data sets which are to be expected. This precondition can be fulfilled only in very rare cases.
The information turning out when analyzing the problem usually is not capable of fully describing the data sets which actually have to be stored. By defining a data model an abstraction has to be made, which means simplifying assumptions have to be made which possibly lead to an incomplete data model. This incomplete data model then possibly has to be modified, extended, simplified, or amended in any other way, at a later stage. Furthermore, when analyzing a problem not necessarily all aspects which influence the data model are taken into account. This also leads to an incomplete data model.
Another reason for an incomplete data model relates to the data model itself being of necessarily static nature. If at a certain time complete information about the structure of the data sets to be expected is given, then a complete data model may be developed. However, the information on which the data model is based may change from time to time, which then again leads to an incomplete data model.
The system deficiencies resulting from an incomplete data model lead to requests for amendment from the users. These requests for amendment from the users then result in an improvement of the data model being necessary which has to be carried out by the system administrator/programmer. The so improved data model may then possibly be incomplete again so that a further improvement has to be carried out.
An attempt to minimize the necessary workload is the use of a development framework (e.g. 4GL-language) which is as comfortable and as powerful as possible. Amendments in the database management system then can be executed more easily.
However, the basic problem of the conventional approaches remains the same: a static data model has to be defined which as time passes by has to be adapted to the requirements of the users by continued amendments. These continued amendments in the long run lead to a degeneration of the data model. Despite the amendments being possibly carried out automatically from the point of view of the system administrator, this is not suitable to solve the problem itself.
It is therefore an object of the present invention to provide a method by which the data of a database can be stored in an optimized manner. Furthermore, fast access to the database should be possible, and it also should be possible to display and if necessary to modify the data of the database in an easy and flexible manner.
The core of the invention consists in the fact that when storing a database the formation of a structured data model for the database is dispensed with, and instead the database is stored in a manner which no longer makes the definition of a data model necessary for the purpose of storage in the database system. Data models for the database system are generated neither manually nor by an automatically executed algorithm.
This is accomplished by the fact that with storage of each data set complete information about the structure of the data set is stored together with the data set. This means that each data set additionally to the user data (or useful data) contains a complete description of the structure of the data set. Thereby the storage of the data set becomes independent of a superordinate or overall data model since the data set has not to be adapted to the structure of an existing or prescribed data model. Each data set contains its complete metadata which contain a complete description of the data set. Thereby the limitations are overcome which result from the fact that in conventional database systems data sets of the same kind are combined into tables through which a data model is described to which newly inserted data sets have to be adapted. Each data set is with respect to its structure completely independent of the other stored data sets and thereby the definition of a superordinate overall data model becomes obsolete.
The lack of a superordinate data model in which several data sets of the same structure are combined leads to a significant reduction of the necessary administrative efforts for the stored data. For example, the database management system does not have to reflect a superordinate data model and therefore does not have to be adapted to an amended superordinate data model if amendments or storage of new structures are carried out. In the database according to the present invention there is no superordinate structure with a plurality of combined data sets, but only a minimum structure which consists of the individual data set itself which at the same time contains a complete description (metadata) about itself respectively about its structure. Each data set consists of an arbitrary number of fields, whereas, however, contrary to the conventional database, each field of a data set of the database according to the present invention in addition to the possibly several fields contains a complete description (metadata) of the field contents.
Each field can contain several field contents, where then the field description fully and completely describes all field contents with respect to their format. In particular, a first field content may consist of user data, while a second field content of the same field may consist of a descriptor which describes the user data with respect to their meaning. The field description then contains the attributes for the user data as well as for the descriptor, which means for both field contents. The attributes thereby describe the individual field contents with respect to their data structure, which means for example with respect to their data type and their length in bytes. They also may define whether a field content is a descriptor or user data. Basically the number of field contents of a field is arbitrary, important is the fact that the field description contains the attributes for all field contents. Which of the possibly several fields contents finally consists of user data and which of a descriptor for one or more field contents consisting of user data may thereby also be determined in the description. However, this may also be simply determined by convention with respect to the order, for example in such a way that every second field content always consists of the descriptor for its preceding field content.
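The relation between a field description and its field contents may be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the names `Attribute`, `Field`, `make_field`, and `contents_with_role`, the role labels, and the UTF-8 encoding are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    data_type: str  # e.g. "str" (assumed type label)
    length: int     # length of the encoded field content in bytes
    role: str       # "user_data" or "descriptor" (assumed labels)


@dataclass(frozen=True)
class Field:
    description: tuple  # one Attribute per field content
    contents: tuple     # raw byte sequences, in the same order


def make_field(value: str, descriptor: str) -> Field:
    """Build a field with two contents: user data and its descriptor."""
    v = value.encode("utf-8")
    d = descriptor.encode("utf-8")
    return Field(
        description=(
            Attribute("str", len(v), "user_data"),
            Attribute("str", len(d), "descriptor"),
        ),
        contents=(v, d),
    )


def contents_with_role(field: Field, role: str) -> list:
    """Select field contents by the role recorded in the field description."""
    return [c for a, c in zip(field.description, field.contents) if a.role == role]


field = make_field("Meier", "last name")
print(contents_with_role(field, "descriptor"))
```

Because the role of each field content is read from the field description, user data and descriptors are plain byte sequences that can be handled by the same code path.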
As a matter of course, individual field contents may remain empty despite them being defined in the field description as part of the field. In principle according to the present invention the contents of the field are neither with respect to their number nor with respect to their format determined by a superordinate structure. Format and number of the field contents as well as the question whether the field contents are user data or descriptors may be chosen freely in an arbitrary manner for each field of each data set since the field description describes the field with respect to these determinations completely by means of the attributes contained in the field description. This results in a symmetry or duality between descriptors and user data in the configuration of a data field as well as in the configuration of a data set. This duality or symmetry may be continued by treating user data and descriptors in the same manner also during the further processing of the data sets or the individual data fields, which means when indexing, searching, or accessing fields or data sets, or when presenting and modifying fields or data sets. Even parts of the field description may be included in this equal treatment, the parts of the field description then being treated in a similar manner as the field contents.
In the database according to the present invention there is no superordinate or overall structure for storing similar data sets, but rather a minimum structure which cannot be further simplified for storing the fields which consist of the metadata of the field and of the field contents, which again may be user data or descriptors. Each data set consists of an arbitrary number of fields. The descriptors of the fields of a data set not necessarily have to be unique. Multiple identifications may be simply reflected in the database according to the present invention by storing multiple fields having the same descriptor in one data set.
A descriptor assigns to its corresponding field content consisting of user data a semantic meaning, which means it characterizes the field content consisting of user data with respect to its meaning. In the database according to the present invention a field content consisting of a descriptor is handled in principle in the same manner as a field content consisting of user data, which means both may consist of arbitrary bit sequences of an arbitrary format since their format or their structure is defined in the corresponding field description. For the database system according to the present invention when processing field contents it does not matter whether the field content consists of user data, like a bit sequence representing an image, or consists of a descriptor which characterizes the bit sequence representing an image with respect to its meaning as an “image”. The user data value as well as the descriptor in principle may consist of arbitrary bit sequences. Only when the data are to be displayed to the user does the application program have to distinguish between descriptor and user data value in order to enable the user to recognize whether the information is a user data value or information which characterizes a user data value (or a user data bit sequence) with respect to its meaning. It is therefore necessary that it is determined in some way whether a field content consists of user data or of a descriptor; apart from that, however, every field content is arbitrary in its structure and only has to be defined sufficiently by the field description.
When processing the data sets stored according to the invention, it therefore does not matter whether the field contents to be processed are user data or whether they are descriptors which characterize other field contents with respect to their meaning. All field contents may be processed in the database system according to the invention in substantially the same manner, and in particular they may be indexed in the same manner.
Also hierarchical data structures may be represented in a data set by incorporating into a data set fields as well as sub-data sets. The sub-data sets themselves may again consist of fields and further sub-data sets. On the lowest hierarchical level, however, a sub-data set only consists of fields. Thereby a hierarchical data set finally consists only of fields, which may be stored (in the database according to the invention) in the minimum structure which cannot be further simplified.
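The reduction of a hierarchical data set to plain fields can be sketched as below. The representation is an assumption made for illustration: a data set is a list of (descriptor, value) pairs, a sub-data set is a nested list of such pairs, and the hierarchy is recorded by joining descriptors with "/". The function name `flatten` and the sample data are hypothetical.

```python
def flatten(data_set, prefix=""):
    """Yield (descriptor, value) fields; a list value is a sub-data set."""
    for descriptor, value in data_set:
        path = f"{prefix}{descriptor}"
        if isinstance(value, list):
            # Descend into the sub-data set, extending the descriptor path.
            yield from flatten(value, prefix=path + "/")
        else:
            yield (path, value)


# Hypothetical hierarchical data set with two nesting levels.
order = [
    ("receipt number", "R-1"),
    ("customer", [
        ("first name", "Anna"),
        ("address", [("city", "Bonn")]),
    ]),
]

flat = list(flatten(order))
print(flat)
```

A list of pairs is used instead of a dictionary so that several fields with the same descriptor can occur in one data set.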
The fields of a data set which belong to the index are stored twice. First, the data set as a whole is stored in an unstructured memory (the data set region) with a unique data set number being assigned thereto. A data set stored in the data set region thereby contains the metadata as well as the field contents, which may be user data, descriptors, or descriptors of descriptors. In addition, all fields, or only selected fields, are stored in a memory which corresponds to the universal minimum structure, the index region. The minimum structure consists of the descriptor, the user data value, the data set number (by means of which the whole data set can be referred to, or by means of which the fields with the same data set number may be brought together in the index), as well as of an identifier for the access protection (called UIP).
The index region thereby serves as an access path to the data set region in which the whole data set is stored. In the index therefore not necessarily all information has to be stored, but only the information which is necessary for the access to the data set region. In particular, not all information contained in the descriptor has to be stored into the index region.
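The two-fold storage described above may be sketched as follows. This is an illustrative sketch under assumptions: the class `Store`, the tuple type `IndexEntry`, the dictionary keys `value`, `descriptor`, `indexed`, and `uip`, and the string form of the UIP identifier are all hypothetical and not prescribed by the description.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    value: str       # user data value (sort key of the index region)
    descriptor: str  # descriptor of the user data value
    set_no: int      # data set number, the path back to the whole data set
    uip: str         # access-protection identifier (assumed to be a label)


class Store:
    def __init__(self):
        self.data_set_region = {}  # set number -> whole data set incl. metadata
        self.index_region = []     # flat list of IndexEntry, kept sorted
        self._next_set_no = 1

    def put(self, fields):
        """Store a data set as a whole, then index its selected fields."""
        set_no = self._next_set_no
        self._next_set_no += 1
        self.data_set_region[set_no] = fields
        for f in fields:
            if f["indexed"]:
                self.index_region.append(
                    IndexEntry(f["value"], f["descriptor"], set_no, f["uip"])
                )
        self.index_region.sort()  # simple re-sort; adequate for a sketch
        return set_no


store = Store()
a = store.put([
    {"value": "Meier", "descriptor": "last name", "indexed": True, "uip": "public"},
    {"value": "R-1", "descriptor": "receipt number", "indexed": False, "uip": "public"},
])
b = store.put([
    {"value": "sales", "descriptor": "department", "indexed": True, "uip": "staff"},
])
print(store.index_region)
```

The two data sets have different structures, yet neither `put` call requires a prior table definition; only the fields marked for indexing appear in the index region.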
To enable a fast access to the data region, the index region is in most cases sorted according to the user data value. However, also other ways of sorting may be desirable, for example according to parts of the descriptor. The database according to the invention thereby not only enables an access to the data sets through the user data value, but also through the descriptors.
The duality of user data value and descriptor contained in the universal minimum structure of a field thereby is used also for the access to the stored data sets.
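Access through either the user data value or the descriptor can be sketched with two sort orders over the same entries. The sketch assumes index entries of the form (value, descriptor, data set number) and uses binary search from Python's `bisect` module; the helper names `build` and `lookup` and the sample entries are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical index entries: (user data value, descriptor, data set number).
entries = [
    ("Anna", "first name", 1),
    ("Meier", "last name", 1),
    ("Meier", "last name", 2),
    ("sales", "department", 2),
]


def build(entries, key_pos):
    """Sort the entries by one component and keep a parallel key list."""
    ordered = sorted(entries, key=lambda e: e[key_pos])
    keys = [e[key_pos] for e in ordered]
    return keys, ordered


def lookup(index, key):
    """Return all entries whose sort key equals `key` via binary search."""
    keys, ordered = index
    return ordered[bisect_left(keys, key):bisect_right(keys, key)]


by_value = build(entries, 0)
by_descriptor = build(entries, 1)

set_nos = sorted({e[2] for e in lookup(by_value, "Meier")})
dept = lookup(by_descriptor, "department")
print(set_nos, dept)
```

Both lookups use the same function; only the component chosen as the sort key differs, which mirrors the equal treatment of user data values and descriptors.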
In a further particular embodiment the database comprising the data set region and the index region may be subdivided into sub-data stocks, in order to simplify the administration of a possibly very large stock of data. Such a division may be carried out based on the content of special fields.
In a further particular embodiment the subdivision of the database is performed according to the date of creation of a data set. Thereby a database which is growing endlessly in time may be realized.
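A subdivision by date of creation might look like the following sketch. The monthly granularity, the key format, and the names `partition_key`, `partitions`, and `store` are assumptions made for illustration; the description does not fix the length of the period.

```python
from collections import defaultdict
from datetime import date


def partition_key(created: date) -> str:
    """Name of the sub-data stock for a creation date (assumed: one per month)."""
    return f"{created.year:04d}-{created.month:02d}"


# Sub-data stock name -> {data set number -> data set}
partitions = defaultdict(dict)


def store(set_no: int, created: date, data_set) -> None:
    """Place the data set into the sub-data stock of its creation month."""
    partitions[partition_key(created)][set_no] = data_set


store(1, date(2000, 1, 5), [("last name", "Meier")])
store(2, date(2000, 1, 20), [("department", "sales")])
store(3, date(2000, 2, 1), [("last name", "Koch")])
print(sorted(partitions))
```

New periods simply create new sub-data stocks, so earlier stocks are never restructured as the database grows.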
In a further particular embodiment the field description contains information about the data type, the length of the data, as well as about the fact whether the respective field content is user data or is a descriptor. The attributes which thereby are stored for each field relate to the field contents consisting of user data as well as to the ones consisting of descriptors. Thereby it is in particular possible that the field descriptors may be interpreted like user data. This data set then is stored as a whole with an associated data set number in a storage, the data set number being a particular field content of a field of the data set, and this field content being characterized in that the descriptor “data set number” exists only for a single field in the data set.
The field description may optionally contain further information, for example about the protective status of the user data as well as of the descriptors, which means about the access rights to this data. The user data of the individual fields of the data sets then are stored in a logical list as a tuple together with the data set number and the corresponding descriptor, possibly also together with further additional information like the protective status. This logical list is sorted according to at least one criterion and serves as overall-index for the access to the data sets of the database. This overall-index thereby is indexed over at least the field contents consisting of user data, and optionally also over the descriptors. By means of this one-fold or two-fold indexed overall-index, access to or search for individual user data or descriptors is enabled. The one-fold or two-fold indexing of the list can also be extended to further information, for example by indexing also parts of the field description (for example the data type) so that they are also available as an access criterion. The storage of information about the protective status of the individual fields (UIP, user information protection) thereby enables individual fields (user data and descriptor) to be protected, already on the level of the fields or the descriptors, from access by users to whom no access right is granted.
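Field-level protection through the UIP information can be sketched as a filter on the overall-index. The encoding of the protective status is not specified in the description; the sketch assumes a simple string label per entry and a set of labels granted to each user. The function name `search` and all sample values are hypothetical.

```python
# Hypothetical overall-index entries: (value, descriptor, data set number, UIP).
overall_index = sorted([
    ("Meier", "last name", 1, "public"),
    ("Meier", "last name", 2, "staff"),
    ("4200", "salary", 2, "hr"),
])


def search(index, value, granted):
    """Return matching entries whose UIP label is granted to the user."""
    return [e for e in index if e[0] == value and e[3] in granted]


public_hits = search(overall_index, "Meier", {"public"})
staff_hits = search(overall_index, "Meier", {"public", "staff"})
print(public_hits, staff_hits)
```

Since the check is applied per index entry, a user without the required right does not even learn that a protected field with the searched value exists.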
In a further particular embodiment the indexed overall-index may be divided into partial indices. Thereby a division of the possibly very large stock of data into sub-data stocks which can be more easily administrated and which may have assigned an external name to it becomes possible.
In a further particular embodiment the division of the overall-index into partial indices is carried out according to the period of time of the creation and/or of the use by the user, information which optionally may be stored together with the information about the protective status as a part of the field description.
By indexing not only the field contents consisting of user data but possibly also the descriptors and/or parts of the field description in the linear list the database enables the presentation of and the access to respectively the search for the field descriptions in the same manner as for the user data and the descriptors.
According to the invention every data set is stored individually together with information about its structure. Additionally particular (if desired even all) fields of the so stored data sets are indexed into a list. Therefore a partly redundant storage of the data, once as a structured data set and a second time as individual fields (partly or complete) in the indexed list is performed. During the second storage of the fields of the data sets in a list by means of the field description it may be determined which fields, which field contents, possibly also which parts of the field description, are to be indexed. These determinations thereby also are stored in the field description.
The fact that the structured data sets also have to be stored, however, does not mean that a data model has to be formed. Every structured data set is stored in its own right, independently of the data sets which are additionally contained in the database. Thereby every data set may contain data of an arbitrary format since it is independent of the other data sets.