The successful operation and use of a data warehouse depends heavily on the effective management of a large amount of metadata. A data warehouse system has two types of users: technical users and business users. Technical users, such as data warehouse administrators, are mainly interested in metadata concerning the technical implementation, while business users, who are not familiar with technologies such as structured query language (SQL), are interested in understanding the business meaning of data and therefore need business-oriented representations of the structure and contents of the data in the data warehouse. Accordingly, the metadata in a data warehouse can be divided into two categories based on its target users:
Business Metadata: Business metadata is intended to provide a business-oriented description of the data and processes. In a data warehouse environment, the important business metadata includes: the Business Concept Model, i.e. a concept model that is used for organizing business knowledge in a semantic way and that represents how a business is run via business concepts, concept attributes and the relationships among concepts; and the Multidimensional Model, i.e. a concept model that is used for defining the complete requirements of a business intelligence (BI) application and that represents how a business is measured in terms of measures, dimensions and the hierarchies of dimensions.
Technical Metadata: Technical metadata provides a description of data within an IT infrastructure, for instance, the locations of data, names of data, methods for accessing servers, data storage types, and other attributes. Examples of technical metadata are the schema of the data warehouse and the schemas of the operational data sources.
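To make the two categories concrete, the following minimal sketch represents a multidimensional model (business metadata) and a fragment of a warehouse schema (technical metadata) as plain Python data structures. All class, table and column names are hypothetical and chosen only for illustration; they are not taken from any particular product or schema.

```python
from dataclasses import dataclass, field
from typing import List

# Business metadata: how a business is measured.
@dataclass
class Dimension:
    name: str
    hierarchy: List[str]  # hypothetical levels, ordered from coarsest to finest

@dataclass
class MultidimensionalModel:
    measures: List[str]
    dimensions: List[Dimension] = field(default_factory=list)

# Technical metadata: where the data physically lives.
@dataclass
class Table:
    name: str
    columns: List[str]

# A business user's view of a claim analysis (names are illustrative).
claim_analysis = MultidimensionalModel(
    measures=["claim amount"],
    dimensions=[Dimension("policy holder's income", ["income band", "income"])],
)

# A technical user's view of the same data (names are illustrative).
warehouse_schema = [
    Table("FACT_CLAIM", ["CLAIM_AMT", "HOLDER_ID"]),
    Table("DIM_HOLDER", ["HOLDER_ID", "INCOME", "EDU_LEVEL"]),
]
```

Nothing in either structure refers to the other, which is why a separate mapping between the two kinds of metadata is needed.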
The schematic mapping from the business metadata to the technical metadata is the key to designing a business-friendly data warehouse. This mapping can advantageously support: business-oriented navigation of the data collected in the data warehouse or the data marts; ad-hoc querying at the level of business concepts without having to know the technical details of query languages (such as SQL); and the automatic deployment of data marts based on the on-line analytical processing (OLAP) requirements represented by the business metadata.
In the prior art, a multidimensional model is defined by business users from a business perspective, while a data warehouse (DW) schema (star schema) is developed by one or more groups of technical users. The mapping from the multidimensional model to the data warehouse schema, i.e. the mapping from path expressions of the multidimensional model to path expressions of the data warehouse schema, has to be created manually. Here, a path expression may be a direct property of the class concerned or an indirect property that connects two classes through a chain of properties.
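As an illustration of what such a manually created mapping looks like, the sketch below models a path expression as a tuple that starts with a class (or table) name and continues with a chain of properties, and models the mapping as a hand-written dictionary from model paths to schema paths. The representation and every name in it are assumptions made for this example only.

```python
from typing import Dict, Tuple

# A path expression: a starting class followed by a chain of properties.
Path = Tuple[str, ...]

# Hand-written mapping from multidimensional-model paths to schema paths.
# Every entry must be authored individually (all names are hypothetical).
manual_mapping: Dict[Path, Path] = {
    ("Claim", "amount"): ("FACT_CLAIM", "CLAIM_AMT"),
    ("Claim", "policy", "holder", "income"): (
        "FACT_CLAIM", "HOLDER_ID", "DIM_HOLDER", "INCOME",
    ),
}

def is_direct(path: Path) -> bool:
    """True if the path is a single property of its starting class."""
    return len(path) == 2

def to_schema_path(path: Path) -> Path:
    """Look up a model path; raises KeyError if no one wrote a mapping for it."""
    return manual_mapping[path]
```

In this sketch, a model path that has no hand-written entry simply cannot be resolved, which reflects the manual nature of the prior-art approach.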
Creating a mapping from a path expression of the multidimensional model to a path expression of the data warehouse schema is a very complex, time-consuming and error-prone task, since measures and dimensions in the multidimensional model are always correlated and their semantics are only implicit in the data warehouse schema. In addition, previously created mappings are difficult to reuse for similar dimensions. For example, suppose a claim analysis of an insurance company involves three dimensions, namely the policy holder's education level, the policy holder's income and the insurance participant's income. The mapping for one dimension is difficult to reuse for any other dimension because there are no separate mappings for the concepts “policy holder” and “insurance participant” and no separate mapping for the property “income.”
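The reuse problem in the insurance example can be sketched as follows. Each dimension is mapped as one whole path, so the segments that several dimensions share have to be written out again for each of them. The table and column names are hypothetical and serve only to show the repetition.

```python
from collections import Counter

# One hand-written, whole-path mapping per dimension (names are hypothetical).
whole_path_mappings = {
    "policy holder's education level": ["FACT_CLAIM", "DIM_HOLDER", "EDU_LEVEL"],
    "policy holder's income": ["FACT_CLAIM", "DIM_HOLDER", "INCOME"],
    "insurance participant's income": ["FACT_CLAIM", "DIM_PARTICIPANT", "INCOME"],
}

# Count how often each schema segment had to be written by hand.
segment_counts = Counter(
    segment for path in whole_path_mappings.values() for segment in path
)

# Segments that appear in more than one mapping were duplicated, not reused.
repeated = {seg: n for seg, n in segment_counts.items() if n > 1}
```

Here the segment standing for "policy holder" and the segment standing for "income" each appear in two mappings, yet neither exists as a separate mapping that another dimension could reference.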
Clearly, creating the mapping from a multidimensional model to a data warehouse schema by means of the existing solutions is very complex, and specialists who are familiar with both the business and the IT infrastructure are required to do a large amount of work in order to complete it accurately. Business users can hardly create the various mappings that a specific application demands independently and conveniently while also ensuring accuracy. As a result, the efficiency of managing and using the data warehouse is degraded dramatically.