Business data is typically stored within physical tables of a database. The database may comprise a relational database, such as Oracle, Microsoft SQL Server, IBM DB2, Teradata and the like. Alternatively, the database could be a multi-dimensional database, an eXtendable Markup Language document, or any other structured data storage system.
The structures and relationships of the physical database tables are complex. A typical end user is therefore unable to locate or extract desired information from the physical database tables. Business Intelligence tools (e.g., BusinessObjects Universe Designer®) may be used to build an abstraction layer that shields end users from the complexity of the physical tables. More specifically, the abstraction layer allows the end users to query a database using intuitive terms rather than references to specific physical entities of the database.
U.S. Pat. No. 5,555,403 describes such an abstraction layer, referred to therein as a semantic layer. Briefly, a semantic layer defines a set of “business objects” that represent business entities, such as customers, time, financial figures, etc. Business objects may be classified as dimensions (along which one may want to perform an analysis or report), details (e.g., additional information on dimensions), and measures (e.g., indicators, most often numeric, whose value can be determined for a given combination of dimension values).
The creation of a semantic layer corresponding to physical database tables typically consists of two main steps. First, a logical database schema is created to describe the structure of the physical tables. The logical database schema may be populated with a subset of the physical database tables, and is often augmented with additional entities, such as logical views and joins. Logical views include logical tables defined by Structured Query Language (SQL) expressions referencing the physical tables. Joins relate logical tables and are expressed using SQL predicates relating these tables.
Next, business objects of the semantic layer are created and mapped to the logical schema. The mapping describes how to compute a given business object using the logical entities exposed by the logical schema. For instance, a business object “Customer” could be mapped to an ID field in a Customer table entity, through an expression such as “Customer.ID”. Consequently, values of the ID field in the Customer table must be accessed in order to access values of the business object “Customer”.
Using the definition of a logical database schema in terms of the underlying physical schema, and the mapping of the business objects in terms of entities in the logical database schema, it is possible to receive symbolic queries that only reference business objects, and to generate physical queries against the physical database based on the symbolic queries.
FIG. 1 is a block diagram illustrating the foregoing scenario. Database 110, which may comprise any structured data storage, includes physical tables 115. Logical database schema 120 includes entities associated with some or all of physical tables 115, as well as additional entities, such as logical views and joins. Abstraction layer 130 includes business objects mapped to entities of logical database schema 120.
Consumer 140, which may comprise a reporting tool or any other system requiring access to the data of physical tables 110, views and interacts with the business objects of abstraction layer 130. Consumer 140 may transmit a symbolic query including one or more of the business objects to abstraction layer 130. Query generator 150 then generates a database query based on the symbolic query and the mapping between logical database schema 120 and abstraction layer 130.
Conventional systems require significant processing in order to eliminate possible ambiguities in the symbolic queries created by end users. For example, FIG. 2 illustrates one example of database schema 200 including a table associated with several many-to-one relationships. Accordingly, database schema 200 presents a conventional “fan trap” problem.
Schema 200 describes a database which stores information about customers who spend their holidays in resorts. Both the customers and the resorts are located in cities. Information about cities is centralized in dimension table 210, which itself has a many-to-one relationship to dimension table 220 of countries.
Conventional Business Intelligence systems do not support Business Intelligence queries using a schema such as schema 200. For example, a business object name may be mapped to the “City” table 210, and a user may use the business object to request “sales per city”. The query would be considered ambiguous, because the query would not specify whether the user is requesting the amount of sales per city of customers, or the amount of sales per city of resorts.
To address the foregoing, a designer of a conventional system creates a logical alias of each logical table that is a “fan trap”, and a logical alias of all tables to which the fan trap relates. Then, different user-friendly names will be associated to each of the aliased tables. For example, in view of schema 200, a designer using Business Objects' “Universe Designer” tool may create logical database schema 300 of FIG. 3, which includes logical aliases (310, 315) of City table 210 and logical aliases (320, 325) of Country table 220.
An abstraction layer can then simply associate a business object with each of the aliased tables, allowing a user to query, for instance, “sales by country of customer” without any ambiguity. Creation of a “logical alias” does not require creation of corresponding physical tables or duplicate data. In the case of logical database schema 300, only one physical table remains for each of the “City” and “Country” entities of the database schema, and each of the aliased tables is a logical view of one of these tables. The relationships from one aliased table to the other are properties of the alias, not of the physical table itself.
However, the creation and maintenance of aliases entails significant costs. For each entity in a database schema having several many-to-one relationships converging to it, the logical database schema must include aliases of not only this entity but also all the entities to which it has a foreign key. In the abstraction layer, business objects must also be created for each of the aliases. Human supervision of aliases is necessary, because the need for an alias sometimes cannot be deduced automatically by an authoring environment.
Moreover, if the physical database schema changes, aliases of the logical database schema need to be updated. For example, if a database designer adds a new “Region” table between City table 210 and Country table 220 to database schema 200, one distinct alias of the new table must be added between entities 310 and 320 of logical schema 300 and another alias of the new table must be added between entities 315 and 325 of logical schema 300.
Conventional aliasing in a logical database schema also limits the expressiveness of queries in arbitrary ways. For example, physical schema 400 of FIG. 4 describes a database which captures information about goods being shipped from one country to another. It may be desirable to compute the difference between the value of goods exported from a country and the value of goods imported by the country, but the conventional static aliasing mechanism makes such computation difficult.
FIG. 5 illustrates logical database schema 500 corresponding to physical schema 400 according to conventional systems. Logical database schema 500 includes two aliases, and a user may use the corresponding business objects to determine a matrix summing sales from each country of origin to each country of destination. However, since the two aliases are mapped to different business objects that have nothing in common, no efficient method exists for representing the fact that a given country can be both a country of origin and a country of destination.
Yet another issue arises if different measures from different fact tables can be used to relate the same dimensions. For example, a Reservations fact table may be added to schema 200. Like the Sales fact table, the Reservations fact table may include data related to customers and resorts. If a query only specifies dimensions such as customer and resorts, but no measure, then an ambiguity exists as to which of the fact tables should be used to relate the dimensions. More particularly, the user may want to know which customers are actually staying in which resort, or which customers are booking stays in which resort. In order to solve such ambiguities, some systems associate a context—a list of authorized joins—to a fact table, and ask the user to specify which context should be used. Other systems, in which all dimensions are considered orthogonal, may simply use a default measure.
What is needed is an efficient system to create and fulfill unambiguous database queries. Such a system preferably avoids the cost of maintaining complex static aliases or contexts, and provides greater expressive power than current systems.