1. Field of Invention
The present invention relates to an improved method and a system for managing data elements in a multi-dimensional database (MDB) supported upon a parallel computing platform using improved address data mapping (i.e. translation) processes, and more particularly, to an improved method of and a system for managing data elements within a MDB during on-line analytical processing (OLAP) operations.
2. Brief Description of the State of the Art
The ability to act quickly and decisively in today's increasingly competitive marketplace is critical to the success of organizations. The volume of information that is available to corporations is rapidly increasing and frequently overwhelming. Those organizations that will effectively and efficiently manage these tremendous volumes of data, and use the information to make business decisions, will realize a significant competitive advantage in the marketplace.
Data warehousing, the creation of an enterprise-wide data store, is the first step towards managing these volumes of data. The Data Warehouse is becoming an integral part of many information delivery systems because it provides a single, central location where a reconciled version of data extracted from a wide variety of operational systems is stored. Over the last few years, improvements in price, performance, scalability, and robustness of open computing systems have made data warehousing a central component of Information Technology (IT) strategies. Details on methods of data integration and constructing data warehouses can be found in the white paper entitled "Data Integration: The Warehouse Foundation" by Louis Rollleigh and Joe Thomas, published at http://www.acxiom.com/whitepapers/wp-11.asp.
Building a Data Warehouse has its own special challenges (e.g. using common data model, common business dictionary, etc.) and is a complex endeavor. However, just having a Data Warehouse does not provide organizations with the often-heralded business benefits of data warehousing. To complete the supply chain from transactional systems to decision maker, organizations need to deliver systems that allow knowledge workers to make strategic and tactical decisions based on the information stored in these warehouses. These decision support systems are referred to as On-Line Analytical Processing (OLAP) systems. OLAP systems allow knowledge workers to intuitively, quickly, and flexibly manipulate operational data using familiar business terms, in order to provide analytical insight into a particular problem or line of inquiry. For example, by using an OLAP system, decision makers can slice and dice information along a customer (or business) dimension, and view business metrics by product and through time. Reports can be defined from multiple perspectives that provide a high-level or detailed view of the performance of any aspect of the business. Decision makers can navigate throughout their database by drilling down on a report to view elements at finer levels of detail, or by pivoting to view reports from different perspectives. To enable such full-functioned business analyses, OLAP systems need to (1) support sophisticated analyses, (2) scale to large numbers of dimensions, and (3) support analyses against large atomic data sets. These three key requirements are discussed further below.
Decision makers use key performance metrics to evaluate the operations within their domain, and OLAP systems need to be capable of delivering these metrics in a user-customizable format. These metrics may be obtained from the transactional databases, precalculated and stored in the database, or generated on demand during the query process. Commonly used metrics include:
(1) Multidimensional Ratios (e.g. Percent to Total)
Show me the contribution to weekly sales and category profit made by all items sold in the Northwest stores between July 1 and July 14.
(2) Comparisons (e.g. Actual vs. Plan, This Period vs. Last Period)
Show me the sales to plan percentage variation for this year and compare it to that of the previous year to identify planning discrepancies.
(3) Ranking and Statistical Profiles (e.g. Top N/Bottom N, 70/30, Quartiles)
Show me sales, profit and average call volume per day for my 20 most profitable salespeople, who are in the top 30% of the worldwide sales.
(4) Custom Consolidations (e.g. Financial Consolidations, Market Segments, Ad Hoc Groups)
Show me an abbreviated income statement by quarter for the last two quarters for my Western Region operations.
Knowledge workers analyze data from a number of different business perspectives or dimensions. As used hereinafter, a dimension is any element or hierarchical combination of elements in a data model that can be displayed orthogonally with respect to other combinations of elements in the data model. For example, if a report lists sales by week, promotion, store, and department, then the report would be a slice of data taken from a four-dimensional data model.
Target marketing and market segmentation applications involve extracting highly qualified result sets from large volumes of data. For example, a direct marketing organization might want to generate a targeted mailing list based on dozens of characteristics, including purchase frequency, purchase recency, size of the last purchase, past buying trends, customer location, age of customer, and gender of customer. These applications rapidly increase the dimensionality requirements for analysis.
The number of dimensions in OLAP systems ranges from a few orthogonal dimensions to hundreds of orthogonal dimensions. Orthogonal dimensions in an exemplary OLAP application might include Geography, Time, and Products.
Atomic data refers to the lowest level of data granularity required for effective decision making. In the case of a retail merchandising manager, "atomic data" may refer to information by store, by day, and by item. For a banker, atomic data may be information by account, by transaction, and by branch. Most organizations implementing OLAP systems find themselves needing systems that can scale to tens, hundreds, and even thousands of gigabytes of atomic information.
As OLAP systems become more pervasive and are used by the majority of the enterprise, more data over longer time frames will be included in the data store (i.e. data warehouse), and the size of the database will increase by at least an order of magnitude. Thus, OLAP systems need to be able to scale from present to near-future volumes of data.
In general, OLAP systems need to (1) support the complex analysis requirements of decision-makers, (2) analyze the data from a number of different perspectives (i.e. business dimensions), and (3) support complex analyses against large input (atomic-level) data sets from a Data Warehouse maintained by the organization using a relational database management system (RDBMS).
Vendors of OLAP systems classify OLAP systems as either Relational OLAP (ROLAP) or Multidimensional OLAP (MOLAP) based on the underlying architecture thereof. Thus, there are two basic architectures for On-Line Analytical Processing systems: the ROLAP architecture and the MOLAP architecture.
The Relational OLAP (ROLAP) system accesses data stored in a Data Warehouse to provide OLAP analyses. The premise of ROLAP is that OLAP capabilities are best provided directly against the relational database, i.e. the Data Warehouse. An overview of the ROLAP architecture is provided in FIG. 1A.
The ROLAP architecture was invented to enable direct access of data from Data Warehouses, and therefore supports optimization techniques to meet batch window requirements and provide fast response times. These optimization techniques typically include application-level table partitioning, pre-aggregate inferencing, denormalization support, and the joining of multiple fact tables.
As shown in FIG. 1A, a typical prior art ROLAP system has a three-tier or layer client/server architecture. The "database layer" utilizes relational databases for data storage, access, and retrieval processes. The "application logic layer" is the ROLAP engine which executes the multidimensional reports from multiple users. The ROLAP engine integrates with a variety of "presentation layers," through which users perform OLAP analyses.
As shown in FIG. 1A, after the data model for the data warehouse is defined, data from on-line transaction-processing (OLTP) systems is loaded into the relational database management system (RDBMS). If required by the data model, database routines are run to pre-aggregate the data within the RDBMS. Indices are then created to optimize query access times. End users submit multidimensional analyses to the ROLAP engine, which then dynamically transforms the requests into SQL execution plans. The SQL execution plans are submitted to the relational database for processing, the relational query results are cross-tabulated, and a multidimensional result data set is returned to the end user. ROLAP is a fully dynamic architecture capable of utilizing precalculated results when they are available, or dynamically generating results from atomic information when necessary.
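The request-to-SQL flow described above can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module as a stand-in RDBMS; the `sales` table, its columns, and the helper `build_sql` are hypothetical and are not taken from any particular ROLAP product.

```python
import sqlite3

def build_sql(measure, dims, table="sales"):
    """Turn a (measure, dimensions) request into a GROUP BY query.

    Hypothetical helper: identifiers are interpolated directly, so they
    must come from a trusted data model, never from end-user input.
    """
    cols = ", ".join(dims)
    return (f"SELECT {cols}, SUM({measure}) AS total "
            f"FROM {table} GROUP BY {cols} ORDER BY {cols}")

# Stand-in relational store loaded with a few atomic rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, week INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", 1, 10.0), ("A", 1, 5.0), ("A", 2, 7.0), ("B", 1, 3.0)],
)

# The relational result comes back as flat rows ...
rows = conn.execute(build_sql("amount", ["store", "week"])).fetchall()

# ... and is cross-tabulated into a store-by-week view for the user.
crosstab = {}
for store, week, total in rows:
    crosstab.setdefault(store, {})[week] = total

print(crosstab)
```

Here the aggregation is computed on demand from atomic rows; a system with precalculated summary tables would simply point the generated query at those tables instead.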
Multidimensional OLAP (MOLAP) systems utilize a proprietary multidimensional database (MDB) to provide OLAP analyses. The main premise of this architecture is that data must be stored multidimensionally to be accessed and viewed multi-dimensionally.
As shown in FIG. 1B, a typical prior art MOLAP system has a two-tier or layer client/server architecture. In this architecture, the MDB serves as both the database layer and the application logic layer. In the database layer, the MDB system is responsible for all data storage, access, and retrieval processes. In the application logic layer, the MDB is responsible for the execution of all OLAP requests. The presentation layer integrates with the application logic layer and provides an interface, through which the end users view and request OLAP analyses on their client machines which may be web-enabled through the infrastructure of the Internet. The client/server architecture of a MOLAP system allows multiple users to access the same multidimensional database (MDB).
As shown in FIG. 2A, information (i.e. basic data) from a variety of operational systems within an enterprise, comprising the Data Warehouse, is loaded into a prior art multidimensional database (MDB) through a series of batch routines. The Express™ server by the Oracle Corporation is exemplary of a popular server which can be used to carry out the data loading process in prior art MOLAP systems. As shown in FIG. 2B, an exemplary 3-D MDB is schematically depicted, showing geography, time and products as the "dimensions" of the database. The multidimensional data of the MDB is organized in an array structure, as shown in FIG. 2C. Physically, the Express™ server stores data in pages (or records) of an information file. Pages contain 512, 2048, or 4096 bytes of data, depending on the platform and release of the Express™ server. In order to look up the physical record address from the database file recorded on a disk or other mass storage device, the Express™ server generates a data structure referred to as a Page Allocation Table (PAT). As shown in FIG. 2D, the PAT tells the Express™ server the physical record number that contains the page of data. Typically, the PAT is organized in pages. The simplest way to access a data element in the MDB is by calculating the "offset" using the additions and multiplications expressed by a simple formula:
Offset = Months + Product*(#of_Months) + City*(#of_Months*#of_Products)
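As a worked illustration of the offset formula, the following sketch computes the linear position of a cell in a small three-dimensional array. The cube sizes (12 months, 5 products, 3 cities) and the function name `linear_offset` are assumptions chosen for the example; all coordinates are zero-based integers.

```python
def linear_offset(month, product, city, n_months, n_products):
    """Offset of cell (month, product, city), with Months varying fastest."""
    return month + product * n_months + city * (n_months * n_products)

# Hypothetical cube: 12 months x 5 products x 3 cities.
N_MONTHS, N_PRODUCTS, N_CITIES = 12, 5, 3

# Month 3, product 2, city 1  ->  3 + 2*12 + 1*(12*5) = 87
example = linear_offset(3, 2, 1, N_MONTHS, N_PRODUCTS)
print(example)

# Enumerating every cell with Months innermost visits offsets 0, 1, 2, ...
# exactly once, which confirms the mapping is a dense one-to-one layout.
offsets = [
    linear_offset(m, p, c, N_MONTHS, N_PRODUCTS)
    for c in range(N_CITIES)
    for p in range(N_PRODUCTS)
    for m in range(N_MONTHS)
]
assert offsets == list(range(N_MONTHS * N_PRODUCTS * N_CITIES))
```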
During an OLAP session, the response time of a multidimensional query on a prior art MDB depends on how many cells in the MDB have to be added on the fly. As the number of dimensions in the MDB increases linearly, the number of the cells in the MDB increases exponentially. However, it is known that the majority of multidimensional queries deal with summarized high level data. Thus, as shown in FIGS. 3A and 3B, once the atomic data (i.e. basic data) has been loaded into the MDB, the general approach is to perform a series of calculations in batch in order to aggregate (i.e. pre-aggregate) the data elements along the orthogonal dimensions of the MDB and fill the array structures thereof.
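The growth in cell count can be illustrated with a few lines of arithmetic. The sketch below assumes, purely for illustration, that every dimension has 100 members; the number of cells is then the product of the dimension sizes.

```python
from math import prod

def cell_count(cardinalities):
    """Number of cells in a dense array with the given dimension sizes."""
    return prod(cardinalities)

# Assumed uniform size of 100 members per dimension.
growth = {n: cell_count([100] * n) for n in range(1, 6)}

for n_dims, cells in growth.items():
    print(f"{n_dims} dimension(s): {cells:,} cells")
```

Each added dimension multiplies the cell count by the size of that dimension, which is why summing cells on the fly quickly becomes impractical and pre-aggregation is used instead.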
For example, revenue figures for all retail stores in a particular state (e.g. New York) would be added together to fill the state level cells in the MDB. After the array structure in the database has been filled, integer-based indices are created and hashing algorithms are used to improve query access times. Pre-aggregation of dimension D0 is always performed along the cross-section of the MDB along the D0 dimension.
As shown in FIG. 3C3, the primarily loaded data in the MDB is organized at its lowest dimensional hierarchy. As shown in FIGS. 3C1 and 3C3, the results of the pre-aggregations are stored in the neighboring parts of the MDB. As shown in FIG. 3C2, along the TIME dimension, weeks are the aggregation results of days, months are the aggregation results of weeks, and quarters are the aggregation results of months. While not shown in the figures, along the GEOGRAPHY dimension, states are the aggregation results of cities, countries are the aggregation results of states, and continents are the aggregation results of countries. By pre-aggregating (i.e. consolidating or compiling) all logical subtotals and totals along all dimensions of the MDB, it is possible to carry out realtime MOLAP operations using a multidimensional database (MDB) containing both basic (i.e. atomic) and pre-aggregated data.
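The roll-up along dimension hierarchies described above can be sketched as follows. This is a minimal illustration, not the storage layout of any actual MDB: the atomic facts, the city-to-state table, and the helper `roll_up` are hypothetical, and days are grouped into weeks by integer division.

```python
from collections import defaultdict

# Hypothetical atomic facts keyed by (city, day), holding a sales value.
atomic = {
    ("Albany", 0): 10, ("Albany", 1): 20, ("Albany", 7): 5,
    ("Buffalo", 0): 7, ("Buffalo", 8): 3,
}
CITY_TO_STATE = {"Albany": "NY", "Buffalo": "NY"}

def roll_up(cells, axis, parent_of):
    """Aggregate cells along one axis by mapping each member to its parent."""
    out = defaultdict(int)
    for key, value in cells.items():
        parent_key = list(key)
        parent_key[axis] = parent_of(key[axis])
        out[tuple(parent_key)] += value
    return dict(out)

# TIME dimension: days -> weeks.
by_city_week = roll_up(atomic, axis=1, parent_of=lambda day: day // 7)
# GEOGRAPHY dimension: cities -> states, reusing the weekly subtotals.
by_state_week = roll_up(by_city_week, axis=0, parent_of=CITY_TO_STATE.get)

print(by_city_week)
print(by_state_week)
```

Each level is computed from the level below it, so higher-level totals reuse earlier subtotals rather than rescanning the atomic data.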
Once this compilation process has been completed, the MDB is ready for use. Users request OLAP reports by submitting queries through the OLAP Application interface (e.g. using web-enabled client machines), and the application logic layer responds to the submitted queries by retrieving the stored data from the MDB for display on the client machine. Each data retrieval operation carried out on the MDB involves searching through the Page Allocation Tables (e.g. search trees) maintained therefor in order to determine the addresses of the data elements needed to answer the query. Because the Page Allocation Tables (PATs) typically contain billions of entries, paging of the tables from mass storage memory is often required as schematically depicted in FIG. 4. This increases the time required to search the Page Allocation Tables, find the n-dimensional Cartesian addresses for the sought after data elements, convert the n-dimensional Cartesian addresses into physical record addresses, and physically access the corresponding data records stored within the storage volumes of the MDB.
Thus, each time the basic or atomic data in the MDB requires updating in any significant manner, for any reason, the MOLAP system must carry out computationally intensive data compilation operations in order to precompile (i.e. pre-aggregate) data within the MDB. The graphs plotted in FIG. 5 clearly indicate the computational demands that are created when searching an MDB during an OLAP session, where queries are presented to the MOLAP system, and answers thereto are solicited, often under real-time constraints. However, prior art MOLAP systems have limited capabilities to dynamically create data aggregations or to calculate business metrics that have not been precalculated and stored in the MDB. Thus, there is a great need in the art for an improved way of and means for accessing data elements within a multi-dimensional database (MDB) containing precompiled or pre-aggregated data and supported on a parallel computing platform during OLAP or like operations, while avoiding the shortcomings and drawbacks of prior art systems and methodologies.
In view of the computational demands of such prior art MOLAP systems, Applicants have recognized that the performance of such systems might be significantly improved, and thus made more competitive with and superior to prior art ROLAP systems, if parallel processing techniques are used to implement prior art MOLAP processes.
In FIG. 6, Applicants disclose a novel type of parallel computing machine (i.e. platform) 1 for implementing MOLAP systems. As shown therein, the multi-dimensional database (MDB) 2 is supported on the parallel machine using a plurality of processors 3 denoted P0, P1, ..., Pp−1, each having DRAM 4 for address data storage during system operation, and one or more storage volumes 5 for storing application data and address data. An OLAP server 6 (e.g. the Express™ Server from the Oracle Corporation) is provided between the Data Warehouse (e.g. RDBMS) 7 and the parallel machine 2. The processor(s) 8 within the OLAP server 6, denoted by P(s), and DRAM 9 and local storage volumes 10 associated therewith, are in communication with the array of processors 3 in the parallel computing machine 2. Also, as shown, each processor 3 in the parallel computing machine 2 has direct access to the mass storage volumes within the Data Warehouse 7. For illustration purposes, the processor(s) used in the Data Warehouse 7 are indicated by reference numeral 11, whereas its DRAM is indicated by reference numeral 12, and its mass storage volumes are indicated by reference numeral 13.
In principle, the use of parallel processing machines as taught by Applicants in FIG. 6 should enable quick and direct access to an array of answers to the submitted queries, as well as speed up the pre-aggregation process and the execution of multidimensional queries and drill-down processes. However, effective parallel processing can be expected only by ensuring that the data is evenly distributed among the processors in the parallel computing system, and that all loads are balanced.
In an effort to apply parallel processing techniques to prior art MOLAP systems, Applicants have developed two novel methods of data element address assignment (i.e. address data translation), each based on partitioning the array of multidimensional data. The first method seeks to partition a conventional array of data by dividing it by the lowest dimension of the corresponding MDB, as schematically illustrated in FIG. 7A. The second method seeks to partition multidimensional data by dividing it by the highest dimension of the corresponding MDB, as schematically illustrated in FIG. 7B.
As indicated in FIG. 7C, the first method of data element address assignment attempts to carry out data address assignment by partitioning multidimensional data according to the lowest dimension of the corresponding MDB. As illustrated in FIG. 7A, this method results in unbalanced data processing among the processors of the parallel computing machine, and in sequential, as opposed to parallel, access to data.
As indicated in FIG. 7C, the second method of data element address assignment attempts to carry out data address assignment by partitioning multidimensional data according to the highest dimension of the corresponding MDB. As illustrated in FIG. 7B, this method likewise results in unbalanced data processing among the processors of the parallel computing machine, and in sequential access to data.
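The imbalance described for both partitioning methods can be illustrated with a small sketch. It assumes, as one plausible reading since FIGS. 7A through 7C are not reproduced here, that "dividing by a dimension" means giving each processor a contiguous range of coordinates along that single dimension. The cube shape, processor count, and function names are hypothetical.

```python
from collections import Counter
from itertools import product

SHAPE = (8, 4, 6)   # hypothetical sizes of dimensions D0, D1, D2
P = 4               # hypothetical number of processors

def block_owner(coords, axis, shape=SHAPE, p=P):
    """Assumed scheme: contiguous coordinate ranges of one axis per processor."""
    return coords[axis] * p // shape[axis]

def load(cells, owner):
    """Count how many of the given cells each processor must handle."""
    return Counter(owner(c) for c in cells)

all_cells = list(product(*(range(n) for n in SHAPE)))

# A query that fixes D0 = 0 (for example, one member of the lowest dimension).
slice_cells = [c for c in all_cells if c[0] == 0]

# Partitioned along D0, the whole slice lands on a single processor,
# so the query is served sequentially while the other processors idle.
slice_load_d0 = load(slice_cells, lambda c: block_owner(c, 0))
print(slice_load_d0)

# Partitioned along D2 (size 6, not a multiple of 4 processors),
# even the static data distribution is uneven.
full_load_d2 = load(all_cells, lambda c: block_owner(c, 2))
print(full_load_d2)
```

Under these assumptions, whichever single dimension is chosen, some queries concentrate on one processor, which is consistent with the sequential access noted above.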
Surprisingly, Applicants have discovered that implementing a MOLAP system on a parallel computing platform, using the data structure of conventional Page Allocation Tables, does not provide increases in system performance (e.g. decreased access/search time) which might be expected when parallelizing a serial computing application.
Accordingly, it is a further object of the present invention to provide an improved method of and apparatus for accessing data elements within a multidimensional database (MDB) using a parallel computing platform, achieving a significant increase in system performance (e.g. decreased access/search time) using parallel computing techniques.
Another object of the present invention is to provide such apparatus in the form of an improved MOLAP system, wherein the MDB contains precompiled or pre-aggregated data and parallel data loading operations are carried out between the Data Warehouse and the MDB of the system using a novel modular arithmetic based data element address assignment scheme which involves mapping (i) integer-encoded MDB dimensions associated with the raw data elements accessed from the Data Warehouse, into (ii) integer-encoded data storage addresses within the storage volumes associated with the MDB.
Another object of the present invention is to provide such apparatus in the form of an improved MOLAP system, wherein parallel data aggregation operations are carried out within the MDB of the system using a novel modular arithmetic based data element address assignment scheme which involves mapping (i) integer-encoded MDB dimensions associated with the raw or previously pre-aggregated data elements to be stored within the MDB, into (ii) integer-encoded data storage addresses within the storage volumes thereof at which the pre-aggregated data elements are to be stored.
Another object of the present invention is to provide such apparatus in the form of an improved MOLAP system, wherein OLAP operations are carried out within the MDB of the system using a novel modular arithmetic based data element address assignment scheme which involves mapping (i) integer-encoded MDB dimensions associated with pre-aggregated data elements to be accessed from the MDB, into (ii) integer-encoded data storage addresses within the storage volumes thereof, from which the pre-aggregated data elements are to be accessed.
Another object of the present invention is to provide such an improved MOLAP system, wherein data processing tasks are evenly distributed among processors on the parallel computing platform of the system.
Another object of the present invention is to provide such an improved MOLAP system, wherein data elements within the MDB of the system are evenly distributed among the processors on the parallel computing platform thereof.
Another object of the present invention is to provide such an improved MOLAP system, wherein each processor on the parallel computing platform handles data elements assigned thereto during data address assignment operations carried out during parallel data loading operations and parallel data aggregation operations within the system.
Another object of the present invention is to provide such an improved MOLAP system, wherein there is no need to exchange data among processors on the parallel computing platform.
Another object of the present invention is to provide such an improved MOLAP system, wherein the need for interprocessor communication among the parallel processors is minimized.
Another object of the present invention is to provide an improved MOLAP method, wherein parallel data loading operations are carried out between the Data Warehouse and MDB of the system using a data element address assignment scheme that employs mapping of MDB dimensions using modular arithmetic.
Another object of the present invention is to provide such an improved MOLAP method, wherein parallel data aggregation operations are carried out between the Data Warehouse and MDB of the system using a data element address assignment scheme that employs mapping of MDB dimensions using modular arithmetic.
Another object of the present invention is to provide such an improved MOLAP method, wherein data processing tasks are evenly distributed among processors on the parallel computing platform of the system.
Another object of the present invention is to provide such an improved MOLAP method, wherein data elements within the MDB of the system are evenly distributed among the processors on the parallel computing platform thereof.
Another object of the present invention is to provide such an improved MOLAP method, wherein each processor on the parallel computing platform handles data elements assigned thereto during data address assignment operations carried out during parallel data loading operations and parallel data aggregation operations within the system.
Another object of the present invention is to provide such an improved MOLAP method, wherein there is no need to exchange data among processors on the parallel computing platform.
Another object of the present invention is to provide such an improved MOLAP method, wherein the need for interprocessor communication among the parallel processors is minimized.
Another object of the present invention is to provide a new method of generating an information directory or index for a multidimensional database (MDB) used in a MOLAP system. Another object of the present invention is to provide such a method of generating an information directory or index for an MDB, wherein data element addresses to data storage elements therewithin are generated using (i) modular arithmetic functions, (ii) dimensions of the MDB and its dimensional hierarchy, and (iii) data variables from the relational database management system (RDBMS) of the Data Warehouse associated with the MDB.
Another object of the present invention is to provide an improved decision support system which allows knowledge workers to intuitively, quickly, and flexibly manipulate operational data using familiar business terms in order to provide analytical insight into a business domain of interest.
Another object of the present invention is to provide a novel method of using a MDB to support OLAP systems.
Another object of the present invention is to provide an improved system and method of searching and updating a MDB containing an index of information resource locators (URLs) on the Internet, referred to as an MDB-based URL-Index or Directory.
Another object of the present invention is to provide such an improved system and method of searching and updating a MDB-based URL-Index or Directory, wherein data storage, retrieval, updating and shifting operations are carried out within the MDB of the system using a novel modular arithmetic based data element address assignment scheme which involves mapping (i) integer-encoded MDB dimensions associated with data elements to be stored in, retrieved from or shifted within the MDB, into (ii) integer-encoded data storage addresses within the storage volumes thereof.
Another object of the present invention is to provide a novel method of data mapping and storage for use in the parallel access of multidimensional databases, as well as in parallel data loading and aggregation operations, and on-the-fly multidimensional queries, while ensuring balanced processing and minimizing interprocessor communication among a plurality of processors.
Another object of the present invention is to provide a method of decomposing, or partitioning, an n-dimensional database into p modules, where p represents the number of processors (i.e. processing modules) in the multiprocessing array, D0, D1, ..., Dn−1 represent the n dimensions, and k represents the k-th out of p processing modules, the method being based on the following address data translation (i.e. mapping) formula:
k = (Dn−1 + ... + D1 + D0) mod p
Another object of the present invention is to provide such a method, wherein each data element is specified by index k, and the entire data domain is decomposed and assigned to the Processor (Memory) Space of p processing modules.
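The modular mapping can be sketched as follows, reading the formula above as the sum of all n integer-encoded dimension coordinates of a data element, taken modulo p. The cube shape, the processor count, and the name `module_index` are assumptions for illustration only; this is a sketch of the stated formula, not of the full address assignment scheme.

```python
from collections import Counter
from itertools import product

def module_index(coords, p):
    """k = (sum of the integer-encoded dimension coordinates) mod p."""
    return sum(coords) % p

SHAPE = (8, 4, 6)   # hypothetical sizes of dimensions D0, D1, D2
P = 4               # hypothetical number of processing modules

all_cells = list(product(*(range(n) for n in SHAPE)))

# Distribution of the whole data domain over the p modules.
full_load = Counter(module_index(c, P) for c in all_cells)
print(full_load)

# Distribution of a single cross-section (D0 fixed at 0).
slice_load = Counter(module_index(c, P) for c in all_cells if c[0] == 0)
print(slice_load)
```

In this example both the full domain and the cross-section spread evenly over the four modules, so a query on the slice can be served by all modules in parallel. The exact evenness here relies on the chosen dimension sizes; for other shapes the distribution is typically close to even rather than perfectly even.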
Another object of the present invention is to provide a novel MDB-based Internet URL Directory system for supporting on-line information searching operations by Web-enabled client machines.
Another object of the present invention is to provide a novel personalized electronic commerce (i.e. on-line) shopping system, in which consumer shopping profile information is collected on individual consumers during e-commerce and other transactions, and stored in an MDB for quick access and use in creating Web-enabled personalized shopping environments (e.g. personalized Web-stores) in a real-time manner which reflect the interests, tastes, desires and/or expectations of the individual customers engaged in online shopping activities supported by electronic-commerce servers over the Internet.
Another object of the present invention is to provide a novel MDB-based system for providing fast, affordable and easy access to customer intelligence, enabling companies to more effectively market products and services over the Internet.
Another object of the present invention is to provide a novel MDB-based system that enables value-added services to customers running e-commerce enabled Web sites.
Another object of the present invention is to provide a novel MDB-based system that enables improved levels of strategic business analysis and data mining on the Internet.
Another object of the present invention is to provide a novel MDB-based system that enables a company to leverage strategic information on its customers and competitors by quickly uncovering hidden patterns and more accurately predicting customer behavior.
Another object of the present invention is to provide a novel MDB-based system that enables fast knowledge discovery and accurate predictive business modeling for applications such as database marketing, financial/risk analysis, fraud management, bioinformatics, return-on-investment (ROI) justification, business intelligence applications (e.g. Balanced Scorecard, Activity-Based Costing), customer relations management (CRM), enterprise information portals and the like.
Another object of the present invention is to provide a novel Internet-enabled MDB-based system for supporting real-time control of processes in response to complex states of information reflected in the MDB.
These and other objects of the present invention will become apparent hereinafter and in the Claims to Invention set forth herein.