The subject matter described herein relates to the combination of multi-dimensional data sources using database operations. Within relational databases, an important tool for linking information is a join. A join links the contents of two or more database tables (referred to hereinafter simply as “tables”). The result of the linking is displayed in the form of another table.
The combining of tables is described using one or more ON conditions within a select statement. An ON condition describes a condition between two tables, whereby one field of each table has to be contained in this condition. Equal join conditions are usually used. They have the format: Field1(Table1)=Field2(Table2). Database systems typically permit any ON conditions. A result set is generated when a join is processed. The following instances are based on the assumption that all fields of all tables are contained in the result set. This is possible without restrictions, so long as the process of obtaining the result set is of most interest.
A table is referred to as Ti, a record of this table is referred to as tiεTi. A record which contains initial values only is referred to as ε. Linking two tables with a join is referred to with T1 join T2 or T1 left outer join T2. The result set is referred to as TE. A record in the result set tEεTE has the structure tE=(t1, . . . , tn). Each subrecord tiεTi is a record of table Ti that is contained in the join. The fields of a record in a table Ti are called field(ti). Within system query language (SQL) statements, tables are for example called T1 and their fields are referred to as T1.FIELD.
First the result set is determined on the basis of the ON conditions. The result set is then restricted on the basis of a WHERE condition. The result set fulfills the following conditions: every record contains, for every table affected, an element whose structure corresponds to the structure of the table; all ON conditions within a record are fulfilled; all WHERE conditions within a record are fulfilled; and there are no combinations of data records in the database which fulfill the ON and WHERE conditions but which are not contained in the result set.
The following method describes the first step in calculating the result set for two tables T1 and T2, where only the ON conditions are taken into consideration.
Method 1: T1 join T2
1. Examine each record t1εT1 in accordance with the second step.
2. Compare each record t2εT2 with record t1. If the ON conditions are fulfilled then insert a record tE=(t1, t2) into the result set.
This method only determines the way in which the result set is generated. Most database systems use other, more effective methods. If more than two tables are involved, method 1 can be used, but must be applied in several steps. The sequence of the tables is arbitrary. With regard to method 1, each record t1εT1 or t2εT2 can appear as a subrecord in several records of the result set, and a record t1εT1 or t2εT2 may not appear as a subrecord in the result set at all.
In many evaluations, the second case is not desirable. Users often want a record from one of the tables in the result set, even if no suitable record exists in the corresponding table. Therefore, in the SQL standard, so-called outerjoins are defined. In this document, only left outer joins are discussed. With a left outer join, method 1 has to be enhanced slightly as follows.
Method 2: T1 left outer join T2
1. Examine each record t1εT1 in accordance with the second step
2. Compare each record t2εT2 with record t1. If the ON conditions are fulfilled, insert a record tE=(t1, t2) into the result set. If at least one data record is inserted into the result set in this way, return to step 1. Otherwise go to step 3.
3. Insert a record tE=(t1, ε) into the result set where the values of subrecord t2 are initial.
This method can also be used if more than two tables are involved. It then has to be applied in several steps. In this case the sequence of the tables is no longer arbitrary.
The join methods described above are best suited for flat structures, such as database tables. In the context of a data warehouse, however, multidimensional data structures (InfoCubes) are usually used for evaluations, where key figures are characterized by a multiplicity of characteristics that are arranged in different dimensions. Additionally, data structures of this type often need to be combined with other multidimensional or one-dimensional data structures using a type of join. Therefore, joins for multidimensional data sources must have effective access paths.
An Infoprovider represents a view of data that can be used to define queries. At runtime, the query instructs the InfoProvider to supply data. The query defines the data that is needed and the selection criteria that are used.
InfoSets are a type of infoProvider. InfoSets allow several data sources to be linked using a join. The linking of the data sources is defined during definition of the Infoset. Later queries executing the joins are run.
Because the definition of joins (as described above) always refers to tables, only data sources can be linked that possess a flat (tabular) structure; only data sources can be linked that are represented in the database.