The modem world can be characterized in terms of ever-increasing complexity. In the business world in particular, most companies face intense competition, and constantly changing consumer preferences. These companies must deal with the impact of new laws and regulations in some areas while experiencing uncertainty from the effects of deregulation in other areas. A business can expect competition not only from other local businesses but also from national and international companies as well. The broad scope of worldwide competition further increases the number of variables to be considered. In an effort to deal more effectively with this volatile environment, many businesses have turned to computer modeling to more accurately calculate the effect of changing variables on business performance.
Computer modeling allows great quantities of data to be utilized in the construction of data models. A data model is a collection of conceptual tools for describing real-world processes to be modeled based on data from the database, and the relationships among these processes. There are several variations of the database organization methods that are used in the construction of data models. A relational database uses a method in which files are linked together as required. In a non-relational system, such as hierarchical or network type, records in one file contain embedded pointers to the locations of records in another. These are fixed links which are set up in advance to speed up processing. In a relational database, relationships between files are created by comparing data, such as accounts and names. A relational system is more flexible, and can take two or more files and generate a new file that meets matching criteria, such as the names of customers who bought a certain product.
In order to use relational databases, a Structured Query Language (SQL) was developed to interrogate and process data. SQL can be used as a fill-blown programming language, or SQL commands can be embedded within a programming language to interface with a database.
The DataBase Management System (DBMS) is the software that manages the database. It allows for creation of tables and the processing done in relation to these tables. Most all DBMSs today are actually Relational DBMSs (RDBMS), which manage the relationships between tables. The DBMS creates tables by defining individual elements of data that fill them. These data fields are called "attributes" in a relational database. When data is entered, updated, queried, or when reports are generated, the application communicates with the DBMS in SQL.
A Decision Support System (DSS) is an information and planning system that allows a user to analyze information and predict the impact of decisions before they are implemented. A DSS is an integrated set of programs that share data. These programs might also retrieve from external sources industry data that can be compared and used for statistical purposes.
Large companies may have millions of customers and thousands of products. When a trend in the marketing of a certain product or product line is to be tracked, multiple years worth of data may be required, resulting in a huge amount of data to be manipulated. Managers and executives may want views of several aspects of the data, such as summaries by region, product type and year. This kind of search is not easily handled by a relational database.
In response to business needs of this sort, a database architecture called the On Line Analytical Processing database (OLAP) has been developed. This data model is conceptualized as a multidimensional cube or block composed of smaller "cells". A simple three-dimensional model can be visualized as a cube in which data resides at the intersections of elements which comprise the dimensions. A simple example would have divisions of a company "A", "B", and "C" along an "X" axis dimension, time in years, such as "1996", "1997", and "1998" along a "Y" axis dimension, and a financial dimension including the "Sales", "Units", and "Profits" along a "Z" axis. The individual components of the dimensions are called "elements", and in this example include A, B and C from the "company" dimension, Sales, Units and Profits from the "financial" dimension, etc. Once this model is constructed, a user can specify data to be extracted from a particular cell of the model, for example the projected profits expected from B company in the year 1998. The contents of the cells may be accessed from database records for known quantities or they may be calculated by applying known business or mathematical principles to data from the database.
Within the larger classification of OLAP there are two main variations, MOLAP and ROLAP.
Multidimensional OLAP (MOLAP) allows the user to load data into a proprietary format database and precalculate all the cells in a cube. Users can then enter queries against that cube. Since the entire cube is calculated in advance, response to queries can be very fast. However, loading and precalculating the entire cube can take many hours or days, and changes to the data or model are not easily implemented. MOLAP may result in the unnecessary expenditure of effort and expense, since all cells are precomputed and stored, regardless of whether anyone ever requires their data. Because of the necessity to construct each cell, MOLAP is limited to relatively small datasets, and generally MOLAPs are poorly integrated with underlying relational databases. MOLAPs also have the disadvantage of being complex to administer. Additionally, the MOLAP cubes cannot share data, or metadata (data that describes the included data) with other cubes. Introduction of hypothetical information to produce "what-if" type analysis is allowed on only a limited basis.
MOLAP is especially used in budgeting and planning applications. Examples of MOLAP software are Oracle Express.TM. and Arbor Essbase.TM..
Relational OLAP (ROLAP) allows the data to stay in a data warehouse until a user submits a query. The ROLAP server receives the query, and translates it into SQL queries that are sent to the data warehouse. The server then receives the data, performs further calculations if required and returns the results to the user. In contrast to MOLAP, cells in the ROLAP cube are calculated as required "on the fly" in a virtual cube, and the cells are discarded after the query is complete.
Traditional ROLAP also has its own disadvantages. Business modeling is generally very limited, and the database typically needs much tuning and administrative attention. Performance may be marginal, and processing of hypothetical information for "what-if" analysis is usually not possible. Also ROLAP products generally support only a small number of interactive users.
ROLAP is especially used in sales and marketing analysis. Examples of ROLAP software are MicroStrategy Metacube.TM. and Information Advantage.TM..
In both types of traditional OLAP, there is a database infrastructure, which is usually developed and maintained by an Information Technology (IT) department. Development of OLAP models usually follows a cycle of model building, analysis based on the model, validation of the analysis and publication of results, which leads to further refinements and additions to the model, and so on. In traditional OLAP, the IT department is usually involved with construction of the models, validation and publication of results, while the end user is involved only with the analysis phase of the cycle. This procedure is something of a necessity for most OLAP programs, since they are largely designed to be implemented by personnel with at least some knowledge of computer programming, such as a company's IT department. This leads to a certain isolation of the end user (who may not be deeply computer literate) from the models which he may be manipulating. The end user may not be aware of certain unshared assumptions built into the model. Additionally, the development cycle may be constantly interrupted, sometimes for weeks or months at a time, because the user must involve the IT department for even minor modifications of the models, and then for publication of the results.
A better type of OLAP would provide intuitive user interfaces, which non-programmers could easily utilize to build and modify models, and to publish results. This would return control of the modeling and publishing portions of the cycle to the same end user who would perform the analysis, while the IT department could be concerned with maintaining the database infrastructure. A user could therefore have an intimate knowledge of how a result is obtained, because he would have built the model himself and could determine the assumptions made in the model construction. Better yet would be a system in which the formulas and rules utilized in calculating a value could be accessed and modified as needed to make the model more accurate, or a system in which hypothetical values or assumptions could be introduced to observe how the outcome would be affected. This kind of OLAP modeling in which control is distributed to the end-users, rather than being centrally controlled by an IT department, shall be called a "Distributed OLAP" or DOLAP for short.
Consequently, there is great need for a computer on-line analytical processing and modeling system that can utilize large databases in a relational manner, and a system that at the same time offers large-scale modeling, requires less administrative attention to maintain, supports many interactive users and allows easy insertion of hypothetical values to produce "what-if" analysis.