The field of business applications of computer technology has seen many important changes over the last few years. With steadily growing computational power and data storage capacities of computer systems used for business data processing, the interest of the business community has shifted from transactional data management systems known as on-line transaction processing (OLTP) systems, mostly supporting day-to-day business operations, and from relatively simple business data processing systems, towards sophisticated business management systems, such as online analytical processing (OLAP) and business intelligence (BI) systems, integrating at the enterprise level many facets and functions of the business, including planning, manufacturing, sales and marketing. Collectively, these are often termed decision support systems.
Among various alternative approaches to business data management and analysis developed over the last few years, many are related to data warehousing. A data warehouse can be defined broadly as a subject-oriented collection of business data identified with a particular period of time, as opposed to a transactional database which is dedicated to ongoing business activities. Data in a warehouse is normally gathered from a variety of sources (mostly various OLTP and legacy systems) and merged into a coherent whole. Data in a warehouse is usually stable, with data being added to the warehouse periodically but usually not removed. However, in some cases the oldest data may be unloaded when the newer data is added.
As opposed to data stored in OLTP systems intended to support day-to-day operations and optimized for the speed and reliability of transaction updating, data stored in a data warehouse is intended to provide higher-level, aggregated views of the data in support of business decision making. This might include total sales by product line or region over a selectable period of time. To provide consistently fast responses to such aggregating queries, data in a data warehouse must be structured in a manner facilitating its synthesis, analysis, and consolidation.
The most characteristic feature of warehoused business data is its multidimensional view of a business - so-called because the business data is organized according to major aspects and measures of the business called dimensions. The dimensions for a business may include its products, markets, profits, or time periods involved. This is in contrast to a data dependencies model of the business data used to keep track of all logical relationships among all the possible data elements relevant to the business and its day-to-day operations.
A dimension may include several hierarchical levels of categories. For example, the market dimension may contain, in descending order, such categories as country, region, state, and city, each category having its own number of specific instances. A hierarchical dimension reduces the total number of dimensions necessary to describe and organize the data, as compared with the situation where each category is represented by a separate dimension.
The action of viewing data in greater detail by moving down the hierarchy of categories, i.e., by moving from parent to child category, is sometimes referred to as “drilling down” through the dam Quite naturally, the action of moving in the opposite direction, i.e., up the hierarchy of categories, to produce a more consolidated, higher-level view of data, is known as “drilling up” through the data. Another aspect of this ‘drilling’ is the ability to ‘drill through’ from one form of data to another where the data are collected in reports having different meta-data models and/or data sources, but where the actual data are compatible (or able to be convened to a compatible form) for use in an application. Drill-through is sometimes defined as the action of navigating from one report to another report and applying the context of the source to the target. Drilling is further discussed below.
The vast amounts of warehoused or otherwise collected business data would be useless without software tools for its analysis. Such tools are known under the collective name of business intelligence (BI) applications, an example of which is a suite of BI applications from Cognos Inc. (Canada). BI applications provide, among other features, data warehouse construction tools, as well as database querying, navigation and exploration tools, and also include reporting, modeling, and visualization tools. Some of these tools combine techniques of data analysis and presentation will) methods for discovering hidden patterns and previously unrecognized relationships among data, an approach sometimes known as data mining.
Among BI applications previously made available to the business community, tools for visualizing business data are of particular importance. By interacting with the user graphically or pictorially, these tools have become the means of choice for consolidating and presenting vast amounts of complex information, in a manner which facilitates pattern recognition in the presented data and invites data exploration.
For example when invoking a drill-though operation from a row of the source report to a target report, a filter is constructed based on the source report row and the filter is then applied to the target report. This filter is then used by an application in requesting data from another application.
Using earlier technology it is necessary to apply some very strict rules in order to effectively drill from one application to another. The drill-through filter generated from the drill-through source is required to match up with the column names used in the target report. This column name matching is typically done through a set of intermediate files called query definitions (QD). A QD file is essentially the SQL query that can be used by a transformation program to generate a new data cube. The QD file contains the column name mapping used to generate the context filter that the target application can understand. In general, a context filter helps restrict the target report data based on the context (or some content) from the source report, For example, a context filter might specify “color=red” because that was the value contained in a cell selected in the source report. Then the target report will only show the sales data for appliances that are red. The process is duplicated for each source/target application pair, although in each case the intermediate files may have slightly different formats, but they still served as a column name map.
For convenience, the following definitions are given for terms that are used throughout this application:
Business Intelligence Tools: Software that enables business users to see and use large amounts of complex data. The following three types of tools are referred to as Business Intelligence Tools: Multidimensional Analysis Software—also known as Online Analytical Processing (OLAP)—Software that gives the user the opportunity to look at the data from a variety of different dimensions. Query Tools—Software that allows the user to ask questions about patterns or details in the data. Data Mining Tools—Software that automatically (and sometimes interactively) searches for significant patterns or correlations in the data.
Cube (also known as Multidimensional Cube): The fundamental structure for data in a multidimensional (OLAP) system. A cube contains dimensions, hierarchies, levels, and measures. Each individual point in a cube is referred to as a cell.
Data-Based Knowledge: Knowledge derived from data through the use of Business Intelligence Tools and the process of Data Warehousing that give us a new kind of knowledge based on data. Data-based knowledge can have several advantages compared to experience/intuition-based knowledge: Improved accuracy—because it is based on so many detailed facts. More current—because the data warehousing and business intelligence tools can so quickly analyze new data. More comprehensive—because so many different perspectives are available through the rapid recombination of elements from different dimensions and different levels of the data hierarchy. New insights are possible—because there are complex patterns in the data tat can be discovered by data mining that would never be detected by human analysis. Less subjective—because conclusions are tied directly to the physical data.
Dimension: One of the perspectives that can be used to analyze the data in an OLAP cube. When browsing the data in a cube, it is possible to view the data from the perspective of different combinations of dimensions. For a Sales database, the dimensions might include Product_Name, Time_of_Sale, Store_Name, and Promotion_Name. Dimensions contain one or more hierarchies, which have levels for drilling up and drilling down in the cube. When a dimension has just one hierarchy, which is quite common, people often refer to the dimension itself as having levels.
Hierarchy: Organization of data into a logical tree structure. Dimensions can have one or more hierarchies. A Time dimension, for example, might have a Calendar hierarchy and a Fiscal hierarchy. Hierarchies contain levels, which organize data in a logical structure. It is the combination of multidimensional data with a hierarchical view given by Business Intelligence Software that allows users to grapple successfully with large amounts of data If each member in a level has 5 to 10 children that are members at the next lower level, the user has a better chance of understanding the significance of the data. Moving between the levels of a hierarchy is called drilling up and drilling down.
Level; The hierarchies in dimensions have levels which can be used to view data at various levels of detail. A Time dimension might have levels for Year, Quarter, Month, and Day. A Product dimension might have levels for Product Family, Product Category, Product Sub-category, and Product Name. A Customer Geography dimension might have levels for Region, Country, District, State, City, and Neighborhood.
Member: One of the data points for a level of a hierarchy of a dimension. Some of the members of the Month level of the Time dimension are January, February, March, and April.
Measure; A numeric value stored in a fact table and in an OLAP cube. Sales Count, Sales Price, Cost, Discount, and Profit could all be measures in an OLAP cube.
Multidimensional Analysis—also known as On-Line Analytical Processing (OLAP): A process of analysis that involves organizing and summarizing data in a multiple number of dimensions. People can comprehend a far greater amount of information if that information is organized into dimensions and into hierarchies. The wide use of spreadsheets and graphs illustrates the need for people to have their information organized. A spreadsheet is a two-dimensional analysis tool. If a person could comprehend ten individual facts, they might possibly comprehend 100 facts if they were ranged in a spreadsheet. Simplistically, if three or four or five dimensions were displayed, the amount of information that could be comprehended might increase exponentially—to 1000 facts, 10,000 facts, and 100,000 facts respectively. Multidimensional data is also organized hierarchically, allowing users to “drill down” for more detailed information, “drill up” to see a broader, more summarized view, and “slice and dice” to dynamically change the combinations of dimensions that are being viewed.