1. Field of the Invention
The present invention relates generally to data storage in computer systems, and more specifically to a method for optimizing data storage and retrieval in database systems.
2. Description of the Prior Art
With the continued increase in computer system processing power and data storage capabilities, increasingly large databases are being made widely available. Simple database designs which work for small databases are generally unsuitable for very large databases, because access times tend to increase geometrically with the size of the database. Even with greatly enhanced processing power, times to access very large databases can become unmanageably long unless care is taken in design of the database.
In many large database systems, it is common to perform many more read accesses of the database than updates. For example, corporate current and historical sales information may be widely available within the company, with constant read access to the database provided. However, updating the data stored within the database can occur relatively less frequency. In systems having this type of access pattern, it is important to minimize the read access time in order to enhance overall system performance.
One database design in common use at the present time is OLAP. This database design technique can be run on several different types of underlying database engines, such as those commercially available from Hyperion, Oracle, and i2 Technologies. In addition to 1-dimenstional OLAP database structures, of particular interest to the present invention are MOLAP (Multidimensional OLAP) systems.
In such systems, some types of data, often referred to as aggregated data, can be pre-computed in order to improve read access times. Data that is suitable to be aggregated is that wherein, in a multidimensional database, data for a higher level may be generated using the data for lower level members within the structure. For example, unit sales of a product can be aggregated in a database which defines multiple sales territories within a sales region; the aggregate unit sales for the region is the sum of the sales for the individual territories.
In order to improve read access times, such aggregatable data can be precomputed, and stored in various locations within the database. In the above example, the region sales data for a product can be stored within the region entry, so that it can be directly accessed at run time by reading each of the sales territory numbers once, and storing the sum in the corresponding region entry. It is not necessary to access each of the sales territories when total sales for the region is desired.
MOLAP engines traditionally precompute all such aggregatable data. This leads to fast read access, but requires a relatively large amount of storage space. In general, the number of storage locations needed is equal to the number of elements in the cross-product of all of the members for each dimension. For example, if a two-dimensional database has 1,000 members in a region dimension, and 1,000 members in a products dimension, 1,000×1,000=106 intersections that must be precomputed. Storage of aggregateable data into every one of these intersections is expensive in terms of space.
In addition, whenever updates are performed to the database, precomputing all intersections affected by the update is relatively expensive in terms of processing time. Although the principal goal of the design is to minimize read access, doing so at the expense of extremely long write access times for updates lowers overall system performance.
Performing run time calculations minimizes the time required to update the database, but can be very slow at read time because of the large number of read accesses which are needed to calculate the number.
In order to optimize system operation, a number of techniques have been implemented in MOLAP systems which provide that only selected intersections are precomputed, while others are computed at run time. One approach is to store only a single data value if a parent and child intersection, or node, have a one-to-one relationship. Another approach is to have a user define, in advance, which nodes should be precomputed, and which should be computed on the fly at run time. This second approach provides the possibility of optimizing a database based upon the particulars of an application, but requires a certain amount of knowledge by an administrative user. It also becomes very difficult or impossible to implement when multiple dimensions, each having a large number of members, are incorporated in the database. In addition, when changes are made to the database, a user is required to again make a series of optimizing determinations.
It would be desirable to provide a multi-dimensional database system which had the capability to provide a reasonable optimization of a space-time trade off for a multi-dimensional database. It would be further desirable for such system and method to be relatively easy to define on the part of a user, and to adapt automatically to changes made to the database.