Business decisions and corporate-client relationships have evolved over the past decades. New tools for processing the wealth of data and information have been deployed to exploit business data. Knowledge-based decision support systems have become highly specialized. In addition to relational databases, business managers and decision makers now look to decision support systems (DSS) and other advanced analytical tools in the hope of obtaining a competitive edge.
In a DSS, the basic capabilities of querying and reporting functions are extended by On-line Analytical Processing (OLAP), which allows a robust understanding of the data from a variety of perspectives and hierarchies in a multidimensional database. OLAP operations such as drill-down, roll-up and pivot provide insights into business growth, spending, and sales patterns that would be difficult to obtain otherwise. Other OLAP functionality includes operations for ranking, moving averages, growth rates, statistical analysis, and “what if” scenarios. This discovery process may be further automated in data mining applications, so that trends and patterns can be retrieved with minimal user input. The patterns, for example, may consist of subtle regularities that cross hierarchical and/or dimension boundaries and, as such, would be less likely to be discovered otherwise.
Multidimensional databases used in DSS typically view data as a multidimensional structure called a cube. A multidimensional database comprises a collection of related cubes.
Dimensions, as an essential and distinguishing concept in multidimensional databases, are used for selecting and aggregating data at the desired level of detail.
However, the data to be analyzed often have up to 20 or more dimensions, making computations extremely complex and costly. As the number of dimensions increases, and the number of members of each dimension increases, the number of cells increases dramatically, because the cell count of a cube is the product of the member counts of its dimensions. The number of cells in many cubes representing a business process in a medium or large company is often too large to allow fast and efficient calculation.
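To make the growth concrete, the following minimal sketch computes the number of cells in a dense cube as the product of the member counts of its dimensions. The function name `cell_count` and the choice of ten members per dimension are illustrative assumptions, not figures from any particular system.

```python
from math import prod


def cell_count(dimension_sizes):
    """Return the number of cells in a dense cube.

    dimension_sizes is a list giving the number of members in each
    dimension (hypothetical values chosen for illustration).
    """
    return prod(dimension_sizes)


# Three dimensions of ten members each: a thousand cells.
small_cube = cell_count([10] * 3)

# Twenty dimensions of ten members each: 10**20 cells,
# far more than could be held in main memory.
large_cube = cell_count([10] * 20)
```

Even with a modest ten members per dimension, going from three to twenty dimensions takes the cube from a thousand cells to 10^20 cells.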
Many of the cells in a cube can be interconnected by formulas. Cells representing profit, for example, are calculated as the difference between corresponding cells representing revenue and corresponding cells representing expenses. Cells representing a year are computed as the sum of corresponding cells representing months, which in turn are computed as the sum of corresponding cells representing days. When a change is made to the value of an existing cell of a cube or a new cell is created, the values of many other dependent cells need to be recalculated. For example, recording the sale of a specific product by a specific seller on a specific day to a specific customer will cause a change in the values of a plurality of dependent cells. It is rarely true that the values of all cells of a cube are needed at any time. Many cells are simply empty. For example, no sale may have been made on a given day of a given product by a given salesperson to a given customer through a given sales channel. Additionally, any specific change, while affecting some number of other cells, will not affect all of the cells.
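The way a single change propagates to only some cells can be sketched as a graph traversal. The example below is a minimal illustration under stated assumptions, not the method of any particular system: the cell names, the `precedents` table and the helper functions are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical formulas: each computed cell maps to the cells it reads.
precedents = {
    "profit_jan": ["revenue_jan", "expenses_jan"],
    "profit_feb": ["revenue_feb", "expenses_feb"],
    "revenue_q": ["revenue_jan", "revenue_feb"],
    "profit_q": ["profit_jan", "profit_feb"],
}


def build_dependents(precedents):
    """Invert the formula table: input cell -> cells computed from it."""
    dependents = defaultdict(set)
    for cell, inputs in precedents.items():
        for source in inputs:
            dependents[source].add(cell)
    return dependents


def affected_cells(changed, dependents):
    """Return every cell that must be recalculated when `changed` changes.

    Breadth-first walk over the dependents graph; cells that do not
    depend on `changed`, directly or indirectly, are never visited.
    """
    seen = set()
    queue = deque([changed])
    while queue:
        cell = queue.popleft()
        for dependent in dependents.get(cell, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


dependents = build_dependents(precedents)
to_recalculate = affected_cells("revenue_jan", dependents)
```

Here a change to `revenue_jan` touches `profit_jan`, `revenue_q` and `profit_q`, while `profit_feb` is left alone.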
Hence, there are two challenges with multidimensional databases. One is the size of the storage space that needs to be allocated. If the number of existing input and computed values is large, it is not desirable to load all existing values into the computer's main memory to compute the new state of the system caused by a small number of additions or changes to cells. The second problem is the speed of read and write access. In addition to performing the required calculations, accessing the desired cells in the vast array of data in a database can add significantly to the time taken to process a query. The number of cells is often larger than can be accommodated by the main memory, and it is expensive to read cell values into main memory. Only a subset of these cells is required for any query or update action. It is therefore desirable to be able to quickly determine the minimum subset of cells required to answer a query or update the database.
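As a rough illustration of determining which stored cells a query actually needs, the sketch below walks a formula table backwards from the requested cell and collects only the non-empty input cells it reaches. The sparse `stored` dictionary, the cell names and the function `required_inputs` are hypothetical, and the traversal is one simple approach assumed here for illustration.

```python
# Hypothetical formulas: the year is the sum of months, months of days.
precedents = {
    "year": ["jan", "feb"],
    "jan": ["jan_01", "jan_02"],
    "feb": ["feb_01", "feb_02"],
}

# Sparse storage: only non-empty input cells are kept at all.
stored = {"jan_01": 5.0, "feb_02": 3.0}


def required_inputs(target, precedents, stored):
    """Return the stored input cells needed to compute `target`.

    Computed cells are expanded into their precedents; empty input
    cells (absent from `stored`) are skipped, so nothing is read
    for them.
    """
    needed = set()
    visited = set()
    stack = [target]
    while stack:
        cell = stack.pop()
        if cell in visited:
            continue
        visited.add(cell)
        if cell in precedents:
            stack.extend(precedents[cell])
        elif cell in stored:
            needed.add(cell)
    return needed


cells_for_jan = required_inputs("jan", precedents, stored)
cells_for_year = required_inputs("year", precedents, stored)
```

A query for `jan` reads the single stored cell `jan_01`, and a query for `year` reads only `jan_01` and `feb_02`; the empty day cells are never fetched.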
Therefore, there is an unmet need to provide systems and methods which determine the minimum number of existing cell values which must be retrieved or changed to recompute the new state of the multidimensional database.