1. Field of Invention
The present invention relates to a method of and system for aggregating data elements in a multi-dimensional database (MDDB) supported upon a computing platform and also to provide an improved method of and system for managing data elements within a MDDB during on-line analytical processing (OLAP) operations.
2. Brief Description of the State of the Art
The ability to act quickly and decisively in today""s increasingly competitive marketplace is critical to the success of organizations. The volume of information that is available to corporations is rapidly increasing and frequently overwhelming. Those organizations that will effectively and efficiently manage these tremendous volumes of data, and use the information to make business decisions, will realize a significant competitive advantage in the marketplace.
Data warehousing, the creation of an enterprise-wide data store, is the first step towards managing these volumes of data. The Data Warehouse is becoming an integral part of many information delivery systems because it provides a single, central location where a reconciled version of data extracted from a wide variety of operational systems is stored. Over the last few years, improvements in price, performance, scalability, and robustness of open computing systems have made data warehousing a central component of Information Technology CIT strategies. Details on methods of data integration and constructing data warehouses can be found in the white paper entitled xe2x80x9cData Integration: The Warehouse Foundationxe2x80x9d by Louis Rolleigh and Joe Thomas, published at http:/www.acxiom.com.whitepapers/wp-11.asp.
Building a Data Warehouse has its own special challenges (e.g. using common data model, common business dictionary, etc.) and is a complex endeavor. However, just having a Data Warehouse does not provide organizations with the often-heralded business benefits of data warehousing. To complete the supply chain from transactional systems to decision maker, organizations need to deliver systems that allow knowledge workers to make strategic and tactical decisions based on the information stored in these warehouses. These decision support systems are referred to as On-Line Analytical Processing (OLAP) systems. OLAP systems allow knowledge workers to intuitively, quickly, and flexibly manipulate operational data using familiar business terms, in order to provide analytical insight into a particular problem or line of inquiry. For example, by using an OLAP system, decision makers can xe2x80x9cslice and dicexe2x80x9d information along a customer (or business) dimension, and view business metrics by product and through time. Reports can be defined from multiple perspectives that provide a high-level or detailed view of the performance of any aspect of the business. Decision makers can navigate throughout their database by drilling down on a report to view elements at finer levels of detail, or by pivoting to view reports from different perspectives. To enable such full-functioned business analyses, OLAP systems need to (1) support sophisticated analyses, (2) scale to large numbers of dimensions, and (3) support analyses against large atomic data sets. These three key requirements are discussed further below. Decision makers use key performance metrics to evaluate the operations within their domain, and OLAP systems need to be capable of delivering these metrics in a user-customizable format. These metrics may be obtained from the transactional databases precalculated and stored in the database, or generated on demand during the query process. Commonly used metrics include:
(1) Multidimensional Ratios (e.g. Percent to Total) xe2x80x9cShow me the contribution to weekly sales and category profit made by all items sold in the Northwest stores between July 1 and July 14.xe2x80x9d
(2) Comparisons (e.g. Actual vs. Plan, This Period vs. Last Period) xe2x80x9cShow me the sales to plan percentage variation for this year and compare it to that of the previous year to identify planning discrepancies.xe2x80x9d
(3) Ranking and Statistical Profiles (e.g. Top N/Bottom N, 70/30, Quartiles) xe2x80x9cShow me sales, profit and average call volume per day for my 20 most profitable salespeople, who are in the top 30% of the worldwide sales.xe2x80x9d
(4) Custom Consolidations xe2x80x9cShow me an abbreviated income statement by quarter for the last two quarters for my Western Region operations.xe2x80x9d
Knowledge workers analyze data from a number of different business perspectives or dimensions. As used hereinafter, a dimension is any element or hierarchical combination of elements in a data model that can be displayed orthogonally with respect to other combinations of elements in the data model. For example, if a report lists sales by week, promotion, store, and department, then the report would be a slice of data taken from a four-dimensional data model.
Target marketing and market segmentation applications involve extracting highly qualified result sets from large volumes of data. For example, a direct marketing organization might want to generate a targeted mailing list based on dozens of characteristics, including purchase frequency, size of the last purchase, past buying trends, customer location, age of customer, and gender of customer. These applications rapidly increase the dimensionality requirements for analysis.
The number of dimensions in OLAP systems range from a few orthogonal dimensions to hundreds of orthogonal dimensions. Orthogonal dimensions in an exemplary OLAP application might include Geography, Time, and Products.
Atomic data refers to the lowest level of data granularity required for effective decision making. In the case of a retail merchandising manager, xe2x80x9catomic dataxe2x80x9d may refer to information by store, by day, and by item. For a banker, atomic data may be information by account, by transaction, and by branch. Most organizations implementing OLAP systems find themselves needing systems that can scale to tens, hundreds, and even thousands of gigabytes of atomic information.
As OLAP systems become more pervasive and are used by the majority of the enterprise, more data over longer time frames will be included in the data store (i.e. data warehouse), and the size of the database will increase by at least an order of magnitude. Thus, OLAP systems need to be able to scale from present to near-future volumes of data.
In general, OLAP systems need to (1) support the complex analysis requirements of decision-makers, (2) analyze the data from a number of different perspectives (i.e. business dimensions), and (3) support complex analyses against large input (atomic-level) data sets from a Data Warehouse maintained by the organization using a relational database management system (RDBMS).
Vendors of OLAP systems classify OLAP Systems as either Relational OLAP (ROLAP) or Multidimensional OLAP (MOLAP) based on the underlying architecture thereof. Thus, there are two basic architectures for On-Line Analytical Processing systems: The ROLAP Architecture, and the MOLAP architecture.
Overview of The Relational OLAP (ROLAP) System Architecture
The Relational OLAP (ROLAP) system accesses data stored in a Data Warehouse to provide OLAP analyses. The premise of ROLAP is that OLAP capabilities are best provided directly against the relational database, i.e. the Data Warehouse.
The ROLAP architecture was invented to enable direct access of data from Data Warehouses, and therefore support optimization techniques to meet batch window requirements and provide fast response times. Typically, these optimization techniques include application-level table partitioning, pre-aggregate inferencing, denormalization support, and the joining of multiple fact tables.
As shown in FIG. 1A, a typical prior art ROLAP system has a three-tier or layer client/server architecture. The xe2x80x9cdatabase layerxe2x80x9d utilizes relational databases for data storage, access, and retrieval processes. The xe2x80x9capplication logic layerxe2x80x9d is the ROLAP engine which executes the multidimensional reports from multiple users. The ROLAP engine integrates with a variety of xe2x80x9cpresentation layers,xe2x80x9d through which users perform OLAP analyses.
After the data model for the data warehouse is defined, data from on-line transaction-processing (OLTP) systems is loaded into the relational database management system (RDBMS). If required by the data model, database routines are run to pre-aggregate the data within the RDBMS. Indices are then created to optimize query access times. End users submit multidimensional analyses to the ROLAP engine, which then dynamically transforms the requests into SQL execution plans. The SQLexecution plans are submitted to the relational database for processing, the relational query results are cross-tabulated, and a multidimensional result data set is returned to the end user. ROLAP is a fully dynamic architecture capable of utilizing precalculated results when they are available, or dynamically generating results from atomic information when necessary.
Overview of MOLAP System Architecture
Multidimensional OLAP (MOLAP) systems utilize a proprietary multidimensional database (MDDB) to provide OLAP analyses. The main premise of this architecture is that data must be stored multidimensionally to be accessed and viewed multidimensionally.
As shown in FIG. 1B, prior art MOLAP systems have an Aggregation, Access and Retrieval module which is responsible for all data storage, access, and retrieval processes, including data aggregration (i.e. preaggregation) in the MDDB. As shown in FIG. 1B, the base data loader is fed with base data, in the most detailed level, from the Data Warehouse, into the Multi-Dimensional Data Base (MDDB). On top of the base data, layers of aggregated data are built-up by the Aggregation program, which is part of the Aggregation, Access and Retrieval module. As indicated in this figure, the application logic module is responsible for the execution of all OLAP requests/queries (e.g. ratios, ranks, forecasts, exception scanning, and slicing and dicing) of data within the MDDB. The presentation module integrates with the application logic module and provides an interface, through which the end users view and request OLAP analyses on their client machines which may be web-enabled through the infrastructure of the Internet. The client/server architecture of a MOLAP system allows multiple users to access the same multidimensional database (MDDB).
Information (i.e. basic data) from a variety of operational systems within an enterprise, comprising the Data Warehouse, is loaded into a prior art multidimensional database (MDDB) through a series of batch routines. The Express(trademark) server by the Oracle Corporation is exemplary of a popular server which can be used to carry out the data loading process in prior art MOLAP systems. As shown in FIG. 2B an exemplary 3-D MDDB is schematically depicted, showing geography, time and products as the xe2x80x9cdimensionsxe2x80x9d of the database. The multidimensional data of the MDDB is organized in an array structure, as shown in FIG. 2C. Physically, the Express(trademark) server stores data in pages (or records) of an information file. Pages contain 512, or 2048, or 4096 bytes of data, depending on the platform and release of the Express(trademark) server. In order to look up the physical record address from the database file recorded on a disk or other mass storage device, the Express(trademark) server generates a data structure referred to as a xe2x80x9cPage Allocation Table (PAT)xe2x80x9d. As shown in FIG. 2D, the PAT tells the Express(trademark) server the physical record number that contains the page of data. Typically, the PAT is organized in pages. The simplest way to access a data element in the MDDB is by calculating the xe2x80x9coffsetxe2x80x9d using the additions and multiplications expressed by a simple formula:
Offset=Months+Product*(# of_Months)+City*(# of_Months*# of_Products)
During an OLAP session, the response time of a multidimensional query on a prior art MDDB depends on how many cells in the MDDB have to be added xe2x80x9con the flyxe2x80x9d. As the number of dimensions in the MDDB increases linearly, the number of the cells in the MDDB increases exponentially. However, it is known that the majority of multidimensional queries deal with summarized high level data. Thus, as shown in FIGS. 3A and 3B, once the atomic data (i.e. xe2x80x9cbasic dataxe2x80x9d) has been loaded into the MDDB, the general approach is to perform a series of calculations in batch in order to aggregate (i.e. pre-aggregate) the data elements along the orthogonal dimensions of the MDDB and fill the array structures thereof. For example, revenue figures for all retail stores in a particular state (i.e. New York) would be added together to fill the state level cells in the MDDB. After the array structure in the database has been filled, integer-based indices are created and hashing algorithms are used to improve query access times. Pre-aggregation of dimension D0 is always performed along the cross-section of the MDDB along the D0 dimension.
As shown in FIG. 3C2, the primarily loaded data in the MDDB is organized at its lowest dimensional hierarchy. As shown in FIGS. 3C1 and 3C3, the results of the pre-aggregations are stored in the neighboring parts of the MDDB.
As shown in FIG. 3C2, along the TIME dimension, weeks are the aggregation results of days, months are the aggregation results of weeks, and quarters are the aggregation results of inontlis. While not shown in the figures, along the GEOGRAPHY dimension, states are the aggregation results of cities, countries are the aggregation results of states, and continents are the aggregation results of countries. By pre-aggregating (i.e. consolidating or compiling) all logical subtotals and totals along all dimensions of the MDDB, it is possible to carry out real-time MOLAP operations using a multidimensional database (MDDB) containing both basic (i.e. atomic) and pre-aggregated data. Once this compilation process has been completed, the MDDB is ready for use. Users request OLAP reports by submitting queries through the OLAP Application interface (e.g. using web-enabled client machines), and the application logic layer responds to the submitted queries by retrieving the stored data from the MDDB for display on the client machine.
Typically, in MDDB systems, the aggregated data is very sparse, tending to explode as the number of dimension grows and dramatically slowing down the retrieval process (as described in the report entitled xe2x80x9cDatabase Explosion: The OLAP Reportxe2x80x9d, htt://www.olapreport.com/DatabaseExplosion.htm, incorporated herein by reference). Quick and on line retrieval of queried data is critical in delivering on-line response for OLAP queries. Therefore, the data structure of the MDDB, and methods of its storing, indexing and handling are dictated mainly by the need of fast retrieval of massive and sparse data.
Different solutions for this problem are disclosed in the following U.S. Patents, each of which is incorporated herein by reference in its entirety:
U.S. Pat. No. 5,822,751 xe2x80x9cEfficient Multidimensional Data Aggregation Operator Implementationxe2x80x9d
U.S. Pat. No. 5,805,885 xe2x80x9cMethod And System For Aggregation Objectsxe2x80x9d
U.S. Pat. No. 5,781,896 xe2x80x9cMethod And System For Efficiently Performing Database Table Aggregation Using An Aggregation Indexxe2x80x9d
U.S. Pat. No. 5,745,764 xe2x80x9cMethod And System For Aggregation Objectsxe2x80x9d
In all the prior art OLAP servers, the process of storing, indexing and handling MDDB utilize complex data structures to largely improve the retrieval speed, as part of the querying process, at the cost of slowing down the storing and aggregation. The query-bounded structure, that must support fast retrieval of queries in a restricting environment of high sparcity and multi-hierarchies, is not the optimal one for fast aggregation.
In addition to the aggregation process, the Aggregation, Access and Retrieval module is responsible for all data storage, retrieval and access processes. The Logic module is responsible for the execution of OLAP queries. The Presentation module intermediates between the user and the logic module and provides an interface through which the end users view and request OLAP analyses. The client/server architecture allows multiple users to simultaneously access the multidimensional database.
In summary, general system requirements of OLAP systems include: (1) supporting sophisticated analysis, (2) scaling to large number of dimensions, and (3) supporting analysis against large atomic data sets.
MOLAP system architecture is capable of providing analytically sophisticated reports and analysis functionality. However, requirements (2) and (3) fundamentally limit MOLAP""s capability, because to be effective and to meet end-user requirements, MOLAP databases need a high degree of aggregation.
By contrast, the ROLAP system architecture allows the construction of systems requiring a low degree of aggregation, but such systems are significantly slower than systems based on MOLAP system architecture principles. The resulting long aggregation times of ROLAP systems impose severe limitations on its volumes and dimensional capabilities.
The graphs plotted in FIG. 5 clearly indicate the computational demands that are created when searching an MDDB during an OLAP session, where answers to queries are presented to the MOLAP system, and answers thereto are solicited often under real-time constraints. However, prior art MOLAP systems have limited capabilities to dynamically create data aggregations or to calculate business metrics that have not been precalculated and stored in the MDDB.
The large volumes of data and the high dimensionality of certain market segmentation applications are orders of magnitude beyond the limits of current multidimensional databases.
ROLAP is capable of higher data volumes. However, the ROLAP architecture, despite its high volume and dimensionality superiority, suffers from several significant drawbacks as compared to MOLAP:
Full aggregation of large data volumes are very time consuming, otherwise, partial aggregation severely degrades the query response.
It has a slower query response
It requires developers and end users to know SQL
SQL is less capable of the sophisticated analytical functionality necessary for OLAP
ROLAP provides limited application functionality
Thus, improved techniques for data aggregation within MOLAP systems would appear to allow the number of dimensions of and the size of atomic (i.e. basic) data sets in the MDDB to be significantly increased, and thus increase the usage of the MOLAP system architecture.
Also, improved techniques for data aggregation within ROLAP systems would appear to allow for maximized query performance on large data volumes, and reduce the time of partial aggregations that degrades query response, and thus generally benefit ROLAP system architectures.
Thus, there is a great need in the art for an improved way of and means for aggregating data elements within a multi-dimensional database (MDDB), while avoiding the shortcomings and drawbacks of prior art systems and methodologies.
Accordingly, it is a further object of the present invention to provide an improved method of and system for managing data elements within a multidimensional database (MDDB) using a novel stand-alone (i.e. external) data aggregation server, achieving a significant increase in system performance (e.g. deceased access/search time) using a stand-alone scalable data aggregation server.
Another object of the present invention is to provide such system, wherein the stand-alone aggregation server includes an aggregation engine that is integrated with an MDDB, to provide a cartridge-style plug-in accelerator which can communicate with virtually any conventional OLAP server.
Another object of the present invention is to provide such a stand-alone data aggregration server whose computational tasks are restricted to data aggregation, leaving all other OLAP functions to the MOLAP server and therefore complementing OLAP server""s functionality.
Another object of the present invention is to provide such a system, wherein the stand-alone aggregation server carries out an improved method of data aggregation within the MDDB which enables the dimensions of the MDDB to be scaled up to large numbers and large atomic (i.e. base) data sets to be handled within the MDDB.
Another object of the present invention is to provide such a stand-alone aggregration server, wherein the aggregation engine supports high-performance aggregation (i.e. data roll-up) processes to maximize query performance of large data volumes, and to reduce the time of partial aggregations that degrades the query response.
Another object of the present invention is to provide such a stand-alone, external scalable aggregation server, wherein its integrated data aggregation (i.e. roll-up) engine speeds up the aggregation process by orders of magnitude, enabling larger database analysis by lowering the aggregation times.
Another object of the present invention is to provide such a novel stand-alone scalable aggregation server for use in OLAP operations, wherein the scalability of the aggregation server enables (i) the speed of the aggregation process carried out therewithin to be substantially increased by distributing the computationally intensive tasks associated with data aggregation among multiple processors, and (ii) the large data sets contained within the MDDB of the aggregation server to be subdivided among multiple processors thus allowing the size of atomic (i.e. basic) data sets within the MDDB to be substantially increased.
Another object of the present invention is to provide such a novel stand-alone scalable aggregation server, which provides for uniform load balancing among processors for high efficiency and best performance, and linear scalability for extending the limits by adding processors.
Another object of the present invention is to provide a stand-alone, external scalable aggregation server, which is suitable for MOLAP as well as for ROLAP system architectures.
Another object of the present invention is to provide a novel stand-alone scalable aggregation server, wherein an MDDB and aggregation engine are integrated and the aggregation engine carries out a high-performance aggregation algorithm and novel storing and searching methods within the MDDB.
Another object of the present invention is to provide a novel stand-alone scalable aggregation server which can be supported on single-processor (i.e. sequential or serial) computing platforms, as well as on multi-processor (i.e. parallel) computing platforms.
Another object of the present invention is to provide a novel stand-alone scalable aggregation server which can be used as a complementary aggregation plug-in to existing MOLAP and ROLAP databases.
Another object of the present invention is to provide a novel stand-alone scalable aggregation server which carries out an novel rollup (i.e. down-up) and spread down (i.e. top-down) aggregation algorithms.
Another object of the present invention is to provide a novel stand-alone scalable aggregation server which includes an integrated MDDB and aggregation engine which carries out full pre-aggregation and/or xe2x80x9con-the-flyxe2x80x9d aggregation processes within the MDDB.
Another object of the present invention is to provide such a novel stand-alone scalable aggregation server which is capable of supporting MDDB having a multi-hierarchy dimensionality.
Another object of the present invention is to provide a novel method of aggregating multidimensional data of atomic data sets originating from a RDBMS Data Warehouse.
Another object of the present invention is to provide a novel method of aggregating multidimensional data of atomic data sets originating from other sources, such as external ASCII files, MOLAP server, or other end user applications.
Another object of the present invention is to provide a novel stand-alone scalable data aggregation server which can communicate with any MOLAP server via standard ODBC, OLE DB or DLL interface, in a completely transparent manner with respect to the (client) user, without any time delays in queries, equivalent to storage in MOLAP server""s cache.
Another object of the present invention is to provide a novel xe2x80x9ccartridge-stylexe2x80x9d (stand-alone) scalable data aggregation engine which dramatically expands the boundaries of MOLAP into large-scale applications including Banking, Insurance, Retail and Promotion Analysis.
Another object of the present invention is to provide a novel xe2x80x9ccartridge-stylexe2x80x9d (stand-alone) scalable data aggregation engine which dramatically expands the boundaries of high-volatility type ROLAP applications such as, for example, the precalculation of data to maximize query performance.
Another object of the present invention is to provide a generic plug-in cartridge-type data aggregation component, suitable for all MOLAP systems of different vendors, dramatically reducing their aggregation burdens.
Another object of the present invention is to provide a novel high performance cartridge-type data aggregration server which, having standardized interfaces, can be plugged-into the OLAP system of virtually any user or vendor.
Another object of the present invention is to provide a novel xe2x80x9ccartridge-stylexe2x80x9d (stand-alone) scalable data aggregation engine which has the capacity to convert long batch-type data aggregations into interactive sessions.
These and other object of the present invention will become apparent hereinafter and in the claims to Invention set forth herein.