The invention relates to a database system comprising a computer unit, a main memory and a separate peripheral memory that stores at least one multi-dimensional stock of data in the form of a UB tree. It further relates to a method for operating such a database system to read data and to carry out join operations and other relational-algebra operations.
Databases for describing, storing and retrieving large stocks of data are known. A database system typically includes a peripheral memory (the database proper) in which the stock(s) of data are deposited, and a data management system that stores the data according to predetermined rules, retrieves them, or performs further operations on them.
The so-called relational or join operations are carried out frequently, in particular in relational databases; in them, at least two stocks of data, called relations, are connected on the basis of their attributes. Most unary and binary operations carried out in databases require the operand(s) to be sorted on specific attributes. As a result, in known databases, stored stocks of data must be sorted very frequently, and the data must be moved several times between the computer's main memory and the hard disk or other peripheral memory. For multi-dimensional stocks of data in particular, this entails storage and sorting processes that are costly in memory capacity and computing time.
In light of this state of the art, the objective of the present invention is to create a database system and a method for operating a database system that allow stocks of data to be read, retrieved and joined in an arbitrary sort order with minimal memory capacity and computing time.
The invention solves this problem with a database system having the features of claim 1. By storing the multi-dimensional stock of data as a UB (universal B) tree, subdividing the UB tree into a predetermined number of sub-spaces and processing the sub-spaces consecutively, the invention avoids re-sorting with respect to a query attribute.
Organizing and storing a multi-dimensional stock of data in the form of a UB tree, to achieve improved access times, in particular for online applications, together with the ability to insert and delete data objects dynamically, is known from German patent application 196 35 429.3 by the same applicant and from "The Universal B-Tree for Multidimensional Indexing" by Rudolf Bayer, Technical University of Munich, Institute of Information Theory, TUM-I9637, November 1996. There, a multi-dimensional stock of data is indexed and stored by enclosing it in a multi-dimensional cube that is subdivided iteratively in all dimensions into sub-cubes until consecutive sub-cubes can be consolidated into regions, each containing as many data objects as fit on one memory page of the given capacity of the storage medium.
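The UB-tree literature linearizes this iterated subdivision with Z-order (Morton) addresses, obtained by interleaving the bits of the coordinates. The following Python sketch, assuming non-negative integer coordinates of `bits` bits each, computes such addresses and forms regions by cutting the Z-ordered objects into pages of a fixed capacity. The names `z_address` and `build_regions` are hypothetical, and this simple bulk-load chunking stands in for the UB tree's actual insertion and page-splitting logic.

```python
def z_address(point, bits):
    """Interleave the bits of the coordinates: bit k of dimension j
    becomes bit k * d + j of the Z-address."""
    d = len(point)
    z = 0
    for k in range(bits):
        for j, coord in enumerate(point):
            if (coord >> k) & 1:
                z |= 1 << (k * d + j)
    return z


def build_regions(points, bits, page_capacity):
    """Sort objects on their Z-address and cut the sequence into pages.
    Each region is (first Z-address, last Z-address, objects)."""
    ordered = sorted(points, key=lambda p: z_address(p, bits))
    regions = []
    for start in range(0, len(ordered), page_capacity):
        page = ordered[start:start + page_capacity]
        regions.append((z_address(page[0], bits), z_address(page[-1], bits), page))
    return regions


# Usage: a 4 x 4 grid with 4 objects per page yields one region per 2 x 2 quadrant
grid = [(x, y) for x in range(4) for y in range(4)]
regions = build_regions(grid, bits=2, page_capacity=4)
```

Because consecutive Z-addresses fill one sub-cube before moving to the next, each page here covers a contiguous run of sub-cubes, which mirrors the consolidation of sub-cubes into regions described above.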
If such a UB tree is subdivided into a predetermined number of sub-spaces for reading, retrieving and/or joining data, the invention can process the stored data by processing the sub-spaces one after another along one dimension of the multi-dimensional stock of data, without having to sort the data in the peripheral memory on a particular attribute.
The invention provides a cache storage that buffers those regions of the UB tree that intersect both the sub-space being processed and subsequent sub-spaces, i.e. the jump regions, until the jump region(s) have been completely processed in the subsequent sub-spaces. Because of this buffering, each region needs to be retrieved from the UB tree only once, and regions (jump regions) projecting beyond the sub-space being processed can be kept across one or more subsequent sub-spaces until they have been completely processed. Either the whole jump region or only the part of it that projects beyond the sub-space being processed can be buffered.
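A region's extent in the processing dimension determines whether it must be buffered. In the Python sketch below, which assumes integer keys and sub-spaces of a fixed width `width`, a region whose extent ends at `hi` is a jump region exactly when `hi // width` exceeds the index of the current sub-space; the hypothetical helper `part_to_buffer` shows the two buffering variants mentioned above.

```python
def is_jump_region(hi, current, width):
    """True if a region whose extent in the processing dimension ends at
    `hi` reaches beyond sub-space `current`."""
    return hi // width > current


def part_to_buffer(points, dim, current, width, whole_region=True):
    """What goes into the cache once sub-space `current` is done: either
    the whole jump region or only the objects projecting beyond it."""
    if whole_region:
        return list(points)
    return [p for p in points if p[dim] // width > current]


# Usage: a region spanning keys 3..25 with sub-spaces of width 10
region = [(3, "a"), (12, "b"), (25, "c")]
partial = part_to_buffer(region, dim=0, current=0, width=10, whole_region=False)
```

Buffering only the projecting part saves cache space at the cost of filtering each region once more before it is cached.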
In a further implementation of the invention, a control system specifies the number and/or the width of the sub-spaces. Such a control system can determine and specify the number and/or width of the individual sub-spaces of a given UB tree, which allows the UB tree awaiting processing to be subdivided optimally. Advantageously, the control system is part of the computer system of the database system.
In the invention, the sub-space width is specified so as to minimize the required storage capacity of the cache storage. The stock of data stored as a UB tree is therefore subdivided such that the cache storage needs minimal capacity.
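One way to make this choice concrete is to simulate the cache occupancy for candidate numbers of equal-width sub-spaces. The Python sketch below assumes that regions are described only by their integer extents `(lo, hi)` in the processing dimension, counts the jump regions still buffered after each sub-space, and picks the candidate `n` with the smallest peak cache occupancy. Because a single sub-space would need no cache at all, it also assumes a hypothetical main-memory budget, measured in regions, that each sub-space's working set must respect; this cost model and the names `cache_profile` and `choose_n` are illustrative assumptions.

```python
def cache_profile(extents, size, n):
    """Peak cache occupancy and peak working set (both counted in regions)
    when the key range [0, size) is cut into n equal-width sub-spaces."""
    width = -(-size // n)  # ceiling division
    peak_cache = peak_resident = 0
    for i in range(n):
        # regions that are in memory or in the cache while sub-space i is processed
        resident = [(lo, hi) for lo, hi in extents if lo // width <= i <= hi // width]
        # jump regions that must stay in the cache after sub-space i
        buffered = [(lo, hi) for lo, hi in resident if hi // width > i]
        peak_resident = max(peak_resident, len(resident))
        peak_cache = max(peak_cache, len(buffered))
    return peak_cache, peak_resident


def choose_n(extents, size, candidates, memory_budget):
    """Among candidates whose working set fits the (assumed) memory budget,
    return (n, peak cache) with the smallest peak cache occupancy."""
    best = None
    for n in candidates:
        cache, resident = cache_profile(extents, size, n)
        if resident <= memory_budget and (best is None or cache < best[1]):
            best = (n, cache)
    return best


# Usage: five regions over keys 0..39
extents = [(0, 9), (10, 19), (5, 14), (15, 30), (20, 39)]
choice = choose_n(extents, 40, candidates=[1, 2, 4], memory_budget=4)
```

Here two sub-spaces fit the budget with one buffered jump region, whereas four sub-spaces would need two.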
In another implementation of the invention, the UB tree is subdivided into sub-spaces of equal width. This facilitates consecutive processing of the sub-spaces and especially the processing of buffered jump regions or portions of jump regions.
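With equal widths over integer keys, finding the sub-space that contains a key, and the range of sub-spaces a region touches, reduces to integer division, as this minimal Python sketch with hypothetical helper names shows:

```python
def slice_width(size, n):
    """Width of each of n equal-width sub-spaces covering keys 0..size-1."""
    return -(-size // n)  # ceiling division, so the last key is covered


def slice_of(key, width):
    """Index of the sub-space that contains `key`."""
    return key // width


def slice_span(lo, hi, width):
    """Indices of the sub-spaces touched by a region whose extent in the
    processing dimension is [lo, hi] (inclusive)."""
    return range(lo // width, hi // width + 1)


# Usage: 100 keys in 4 sub-spaces; a region spanning keys 20..60
width = slice_width(100, 4)
span = list(slice_span(20, 60, width))
```

A region is a jump region precisely when its span extends past the sub-space currently being processed.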
The invention further proposes a method for operating a database system that comprises the following stages for reading data in an arbitrary sort order:
1. Subdivision of a multi-dimensional stock of data stored as a UB tree into a predetermined number n of sub-spaces,
2. Input of the data of the first sub-space of the UB tree into a main memory,
3. Processing (sorting and/or making available) the data of the sub-space, region by region,
4. Deleting from the main memory those regions that do not intersect any subsequent sub-space,
5. Buffering in a cache storage those regions that intersect at least one of the subsequent sub-spaces (jump regions),
6. Input of the next sub-space of the stock of data into the main memory,
7. Repeating stages 3 through 6 until the last sub-space has been processed, where for each new sub-space the jump regions buffered in the cache storage are processed first and are deleted unless they intersect further sub-spaces.
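Stages 1 through 7 can be sketched in Python as a simulation under stated assumptions: the UB tree is modeled as a list of hypothetical `Region` records carrying their extent `[lo, hi]` in the sort dimension, keys are integers in `[0, size)`, the sub-spaces have equal width, and a linear scan stands in for the UB-tree range query that would fetch the regions starting in a sub-space. The function returns the objects in sort order together with the number of pages read, so one can check that every region is read exactly once.

```python
import random
from dataclasses import dataclass


@dataclass
class Region:
    lo: int       # smallest key of the region in the sort dimension
    hi: int       # largest key of the region in the sort dimension
    points: list  # data objects (tuples) stored on the region's page


def read_in_order(ub_tree, dim, size, n):
    """Return (objects sorted on `dim`, pages read, peak cache size) by
    processing n equal-width sub-spaces of the key range [0, size)."""
    width = -(-size // n)  # stage 1: equal-width sub-spaces
    cache = []             # buffered jump regions
    output, pages_read, peak_cache = [], 0, 0
    for i in range(n):
        # stage 7: the jump regions buffered in the cache come first
        working = list(cache)
        # stages 2 and 6: fetch the regions whose extent starts in sub-space i;
        # this linear scan stands in for a UB-tree range query
        for region in ub_tree:
            if region.lo // width == i:
                working.append(region)
                pages_read += 1
        # stage 3: sort only the objects that fall into sub-space i
        in_slice = [p for r in working for p in r.points if p[dim] // width == i]
        output.extend(sorted(in_slice, key=lambda p: p[dim]))
        # stages 4 and 5: drop regions ending here, keep jump regions
        cache = [r for r in working if r.hi // width > i]
        peak_cache = max(peak_cache, len(cache))
    return output, pages_read, peak_cache


# Usage: pages grouped on attribute 1, read back sorted on attribute 0
random.seed(1)
pts = [(random.randrange(64), random.randrange(64)) for _ in range(200)]
by_y = sorted(pts, key=lambda p: p[1])
tree = [Region(min(p[0] for p in pg), max(p[0] for p in pg), pg)
        for pg in (by_y[k:k + 8] for k in range(0, len(by_y), 8))]
out, reads, peak = read_in_order(tree, dim=0, size=64, n=4)
```

Only the objects of one sub-space are sorted at a time, so the in-memory sort stays small, and `reads` equals the number of regions because a cached jump region is never fetched again.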
The method of the invention makes it possible to read data from a multi-dimensional stock of data stored as a UB tree and make them available in an arbitrary sort order without pre-sorting or re-sorting. In particular, each data object of the UB tree is read into the main memory of the database system only once.
As an especially advantageous solution to the problem stated above, the invention proposes a method for carrying out a join operation between two join partners, at least one of which is a multi-dimensional stock of data stored as a UB tree, comprising the following stages:
1. Subdividing the join partners into n sub-spaces,
2a. Input of the data of the first sub-space of a first join partner into a main memory,
2b. Input of the data of the first region of the first sub-space of a second join partner, stored in the form of a UB tree, into the main memory,
3. Processing (finding and assigning) the data of the join partners present in the main memory,
4. Deleting the region from the main memory if it does not intersect any of the subsequent sub-spaces of the UB tree, otherwise buffering the region (jump region) in a cache storage,
5. Input of the next region of the second join partner into the main memory,
6. Repeating the method stages 3 through 5 until the last region of the first sub-space of the second join partner has been processed,
7. Input of the next sub-space of the first join partner and region-by-region input of the next sub-space of the second join partner,
8. Repeating stages 3 through 7 until the last sub-space has been processed, where for each new sub-space the jump regions buffered in the cache storage are processed first and are deleted once they no longer intersect further sub-spaces.
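The join stages can be sketched in the same way. In this Python sketch, the first join partner is a plain list of tuples filtered per sub-space, the second is a list of hypothetical `Region` records standing in for a UB tree, the join is an equi-join on integer keys in `[0, size)`, and the "finding and assigning" of stage 3 uses an in-memory hash table, which is one possible choice rather than a technique prescribed by the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Region:
    lo: int       # smallest join key in the region
    hi: int       # largest join key in the region
    points: list  # tuples stored on the region's page


def ub_join(first, key_a, second, key_b, size, n):
    """Equi-join `first` (tuples) with `second` (Regions of a UB tree) on
    integer join keys in [0, size), using n equal-width sub-spaces."""
    width = -(-size // n)
    cache, result = [], []

    def probe(region, index, i):
        # stage 3: assign the region's objects that lie in sub-space i
        for p in region.points:
            if p[key_b] // width == i:
                for a in index.get(p[key_b], ()):
                    result.append((a, p))

    for i in range(n):
        # stages 2a and 7: sub-space i of the first partner, hashed on the key
        index = defaultdict(list)
        for a in first:
            if a[key_a] // width == i:
                index[a[key_a]].append(a)
        kept = []
        # stage 8: buffered jump regions are processed first
        for region in cache:
            probe(region, index, i)
            if region.hi // width > i:
                kept.append(region)
        # stages 2b, 5 and 6: regions starting in sub-space i, one at a time
        for region in second:
            if region.lo // width == i:
                probe(region, index, i)
                # stage 4: buffer jump regions, otherwise drop the region
                if region.hi // width > i:
                    kept.append(region)
        cache = kept
    return result


# Usage: orders joined with customers whose pages are grouped by name
orders = [(k % 50, f"o{k}") for k in range(120)]
customers = [(c, f"c{c}") for c in range(50)]
by_name = sorted(customers, key=lambda t: t[1])
regions = [Region(min(t[0] for t in pg), max(t[0] for t in pg), pg)
           for pg in (by_name[k:k + 5] for k in range(0, 50, 5))]
joined = ub_join(orders, 0, regions, 0, size=50, n=5)
```

Each object of the second partner is matched only in the sub-space that contains its key, so no pair is produced twice even though jump regions are visited in several sub-spaces.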
The invention thus subdivides each of the two join partners into a predetermined number n of sub-spaces and joins them by successively assigning the data in corresponding sub-spaces, without pre-sorting or re-sorting either join partner. The sub-space to be processed of the join partner stored as a UB tree is read in region by region; the data objects contained in each region are assigned to the data objects of the corresponding sub-space of the other join partner, and the region is then deleted if it does not intersect any of the subsequent sub-spaces of the UB tree. Otherwise, the region is buffered as a jump region in a cache storage. For each new sub-space in the processing sequence, the jump regions buffered in the cache storage are processed first and are then deleted unless they still intersect further sub-spaces.
The join method of the invention can be applied to more than two join partners. Each further join partner is subdivided in the same way into the predetermined number n of sub-spaces and processed consecutively in a suitable direction; the data objects of one region are then assigned not just to one join partner but consecutively to several, carrying out the respective join operations.
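For more than two join partners, a minimal Python sketch under the same assumptions (hypothetical `Region` records, equi-join on a common integer attribute in `[0, size)`, equal-width sub-spaces, linear scans in place of range queries) builds one in-memory hash index per further partner for the current sub-space and assigns each object of a UB-tree region to all of them in turn.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass
class Region:
    lo: int       # smallest join key in the region
    hi: int       # largest join key in the region
    points: list  # tuples stored on the region's page


def ub_multi_join(partners, ub_tree, key_b, size, n):
    """Join each object of `ub_tree` with matching objects of every
    (tuples, key_index) pair in `partners` on one integer attribute."""
    width = -(-size // n)
    cache, result = [], []
    for i in range(n):
        # one hash index per further join partner, for sub-space i only
        indexes = []
        for tuples, key in partners:
            idx = defaultdict(list)
            for t in tuples:
                if t[key] // width == i:
                    idx[t[key]].append(t)
            indexes.append(idx)
        # buffered jump regions first, then regions starting in sub-space i
        working = cache + [r for r in ub_tree if r.lo // width == i]
        for region in working:
            for p in region.points:
                if p[key_b] // width == i:
                    # assign p consecutively to the matches of every partner
                    matches = [idx.get(p[key_b], []) for idx in indexes]
                    for combo in product(*matches):
                        result.append((p, *combo))
        cache = [r for r in working if r.hi // width > i]
    return result


# Usage: customers (UB tree) joined with orders and payments
customers = [(c, f"c{c}") for c in range(20)]
orders = [(k % 20, f"o{k}") for k in range(40)]
payments = [(k % 10, f"p{k}") for k in range(30)]
by_name = sorted(customers, key=lambda t: t[1])
tree = [Region(min(t[0] for t in pg), max(t[0] for t in pg), pg)
        for pg in (by_name[k:k + 4] for k in range(0, 20, 4))]
rows = ub_multi_join([(orders, 0), (payments, 0)], tree, 0, size=20, n=4)
```

If any partner has no match for a key, `product` yields nothing, which gives inner-join semantics.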