This invention relates generally to computer databases. More particularly, this invention relates to a technique for organizing data in databases using compound structures and On-Line Analytical Processing.
On-Line Analytical Processing (OLAP) generally refers to a technique of providing fast analysis of shared multi-dimensional information stored in a database. OLAP systems provide a multi-dimensional conceptual view of data, including full support for hierarchies and multiple hierarchies. This framework is used because it is the most logical way to analyze businesses and organizations.
Unfortunately, it is difficult to handle large volumes of multi-dimensional information in a computer. The first problem is one of size. Multi-dimensional information is typically very large, and while small volumes of multi-dimensional data can be handled in Random Access Memory (RAM), this technique does not work for large problems. Because OLAP is used for interactive analysis, it must nevertheless respond very rapidly to queries, even as data volumes grow.
The second problem is that multi-dimensional data is almost always sparse. In fact, in large multi-dimensional applications, it is not unusual to have only one cell populated for every million cells that are defined.
The sparsity problem discourages the storage of data in simple, uncompressed arrays, except in certain special cases. The simplest way of dealing with sparse data might seem to be to store only cells containing data in some indexed form. However, this approach has two problems. First, the index and keys are likely to take much more space than the data. Moreover, it is relatively time consuming to search them. Second, access will be inefficient because multi-dimensional data is often clustered, consisting of regions of relatively dense data separated by large, extremely sparse or totally empty sections. As a result, related data cells are unlikely to be placed physically close to each other.
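The indexed approach described above can be illustrated with a minimal sketch. The dimension names, member labels, and the `read_cell` helper are hypothetical and chosen only for illustration. Populated cells are held in a dictionary keyed by their coordinates, so each stored value carries a full multi-part key, which is the space overhead noted above.

```python
# Hypothetical dimensions with 100 members each (illustrative only).
products = [f"P{i}" for i in range(100)]
regions = [f"R{i}" for i in range(100)]
months = [f"M{i}" for i in range(100)]

# Number of cells that are defined by the three dimensions.
defined_cells = len(products) * len(regions) * len(months)

# Indexed form: only populated cells are stored, keyed by coordinates.
# Each value drags along a three-part key, so the keys can easily
# occupy more space than the data they identify.
sparse_cells = {("P3", "R7", "M11"): 125.0}


def read_cell(cells, key):
    """Return the stored value, or None when the cell is empty."""
    return cells.get(key)


# Fraction of defined cells that actually hold data.
density = len(sparse_cells) / defined_cells
```

With one populated cell out of one million defined cells, this sketch reproduces the degree of sparsity mentioned above. Note that a plain dictionary gives no guarantee that related cells are stored near one another.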
A typical prior art approach to these problems is to break the data into smaller, denser multi-dimensional objects. Some techniques do this implicitly, presenting all the data to the user in what is known as a “hyper-cube” format in which all the data in the application appears to be in a single multi-dimensional structure. Other techniques do it explicitly in what is known as the “multi-cube” approach, in which the multi-dimensional database consists of a number of separate objects, usually with different dimensions. That is, the database is segmented into a set of multi-dimensional structures, each of which is composed of a subset of the overall number of dimensions in the database. Each segmented structure might be, for example, a set of variables or accounts, each dimensioned by just the dimensions that apply to that variable. It is also possible to identify two main types of multi-cubes. Block multi-cubes use orthogonal dimensions so there are no special dimensions at the data level. A cube may consist of any number of the defined dimensions, and both measures and time are treated as ordinary dimensions, just like any other. Series multi-cubes treat each variable as a separate cube (often a time series), with its own set of distinct dimensions.
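By way of illustration only, a series multi-cube of the kind just described might be sketched as follows. The variable names, dimension names, and the `lookup` helper are assumptions made for this example and do not come from any particular product. Each variable is a separate cube that is dimensioned only by the dimensions that apply to it.

```python
# Hypothetical series multi-cube: one cube per variable, each with its
# own subset of the dimensions defined in the database.
multi_cube = {
    "sales": {
        "dims": ("product", "region", "month"),
        "cells": {("widget", "east", "jan"): 100.0},
    },
    "headcount": {
        # Headcount does not vary by product, so that dimension is omitted.
        "dims": ("region", "month"),
        "cells": {("east", "jan"): 12.0},
    },
}


def lookup(database, variable, **coords):
    """Read one cell, using only the dimensions the variable's cube has."""
    cube = database[variable]
    key = tuple(coords[dim] for dim in cube["dims"])
    return cube["cells"].get(key)


sales = lookup(multi_cube, "sales", product="widget", region="east", month="jan")
staff = lookup(multi_cube, "headcount", region="east", month="jan")
```

Because the headcount cube omits the product dimension, it defines far fewer empty cells than a single hyper-cube spanning every dimension would.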
In general, multi-cubes are more versatile, but hyper-cubes are easier to understand. End users relate better to hyper-cubes because of their higher-level view, whereas multi-cubes provide greater tunability and flexibility. In addition, multi-cubes are a more efficient way of storing very sparse data.
It would be highly desirable to develop an OLAP technique that combines the conceptual benefits of hyper-cubes with the processing benefits (e.g., versatility, tunability, and flexibility) of multi-cubes.
A method executed by a computer under the control of a program includes the step of establishing a compound structure in the form of a virtual unit of multi-dimensional storage. The compound structure includes a rack with a horizontal arrangement of target structures linked by an alias backbone representing a dimension of information. The horizontal arrangement of target structures selectively includes further compound structures and base structures containing data, in any combination. The compound structure may also include a stack with a vertical arrangement of racks linked by the alias backbone. The top level rack of the stack has read and write functionality, while the bottom level of the stack has read-only functionality. The compound structure is referenced to obtain information.
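The rack and stack arrangement summarized above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the claimed implementation: the class names, the dictionary-based storage, and in particular the rule that a read falls through from the top rack to lower racks when the top rack holds no value are all assumptions made for illustration.

```python
class BaseStructure:
    """A base structure containing data, stored as sparse cells."""

    def __init__(self, dims):
        self.dims = tuple(dims)
        self.cells = {}

    def _key(self, coords):
        return tuple(coords[dim] for dim in self.dims)

    def read(self, coords):
        return self.cells.get(self._key(coords))

    def write(self, coords, value):
        self.cells[self._key(coords)] = value


class Rack:
    """Horizontal arrangement of targets linked by an alias backbone.

    Each member of the backbone dimension selects one target, which may
    be a base structure or another compound structure.
    """

    def __init__(self, backbone_dim, targets):
        self.backbone_dim = backbone_dim
        self.targets = dict(targets)

    def read(self, coords):
        target = self.targets.get(coords[self.backbone_dim])
        return None if target is None else target.read(coords)

    def write(self, coords, value):
        # Raises KeyError when the backbone member has no target.
        self.targets[coords[self.backbone_dim]].write(coords, value)


class Stack:
    """Vertical arrangement of racks; racks[0] is the top level."""

    def __init__(self, racks):
        self.racks = list(racks)

    def read(self, coords):
        # Assumption: reads fall through from the top rack downward.
        for rack in self.racks:
            value = rack.read(coords)
            if value is not None:
                return value
        return None

    def write(self, coords, value):
        # Only the top rack is writable; lower racks stay read-only.
        self.racks[0].write(coords, value)


# Hypothetical usage: a read-only history rack under a writable top rack.
y1999 = BaseStructure(("product", "month"))
y1999.write({"product": "widget", "month": "jan"}, 90.0)
history = Rack("year", {"1999": y1999})
scratch = Rack("year", {"1999": BaseStructure(("product", "month"))})
stack = Stack([scratch, history])

coords = {"year": "1999", "product": "widget", "month": "jan"}
before = stack.read(coords)      # value comes from the lower rack
stack.write(coords, 95.0)        # lands in the top rack only
after = stack.read(coords)
untouched = y1999.read(coords)   # lower-level data is unchanged
```

In this sketch the caller addresses a single compound structure with one set of coordinates, while the backbone member (here the hypothetical `year`) decides which underlying structure actually serves the request.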
The invention provides the benefits of both multi-cubes and hyper-cubes. That is, the invention allows versatile, tunable, flexible and space efficient multi-cubes to be joined together into a compound structure that can be easily comprehended and manipulated by application developers and end users. The invention allows the underlying design of the structures to be changed without disturbing the view of the data which is accessed at the higher level. Another advantage of this architecture is that it provides a completely scalable and adaptable infrastructure. The complete scalability arises from the freedom to divide a multi-dimensional problem into manageable base structures, and then to efficiently process those structures in series or in parallel. That is, the processing of the base structures can be spread over many processors, either inside a single multi-processor system or across a loosely clustered network of machines. Because each of the underlying physical structures can be accessed and manipulated independently, it is possible for each of them to be sent to a different CPU for processing before recombination through the compound structure.
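The independent processing of base structures described above might look like the following sketch. A thread pool from the Python standard library stands in for the multiple processors or clustered machines. The structure names, the sample data, and the `total` and `parallel_totals` helpers are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical base structures, each holding its own sparse cells.
base_structures = {
    "east": {("widget", "jan"): 100.0, ("gadget", "jan"): 40.0},
    "west": {("widget", "jan"): 70.0},
}


def total(cells):
    """Aggregate one base structure independently of the others."""
    return sum(cells.values())


def parallel_totals(structures, max_workers=2):
    """Process every base structure on its own worker, then recombine."""
    names = list(structures)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(total, (structures[name] for name in names)))
    return dict(zip(names, results))


totals = parallel_totals(base_structures)
grand_total = sum(totals.values())
```

Because no base structure depends on another during the aggregation step, the same pattern would apply if each structure were sent to a separate process or machine, with the recombination performed at the level of the compound structure.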
A further advantage of this architecture is that compound structures act as high-level indices into the underlying data, which relieves some of the pressure on the indices inside the target structures and improves performance. More generally, the better management of sparsity and the more logical organization of multi-dimensional storage made possible by the invention lead to improved performance.