1. Field of the Invention
This invention relates in general to data management systems performed by computers, and in particular, to the processing of heterogeneous relations in systems that support multidimensional data processing.
2. Description of Related Art
Multidimensional data processing or the OLAP category of software tools is used to identify tools that provide users with multidimensional conceptual view of data, operations on dimensions, aggregation, intuitive data manipulation and reporting. The term OLAP (Online analytic processing) was coined by Codd et al in 1993 (see Codd, E.F. et al., xe2x80x9cProviding OLAP to User-Analysts: An IT Mandatexe2x80x9d, E.F. Codd Associates, 1993). The paper by Codd et al also defines the OLAP category further. An overview of OLAP and other data warehousing technologies and terms is contained in Singh, H.S.,xe2x80x9cData Warehousing. Concepts, Technologies, Implementations, and Management,xe2x80x9d Prentice Hall PTR, 1998. The text by Ramakrishnan et al. in Ramakrishnan, R. and Gehrke, J., xe2x80x9cDatabase Management Systemsxe2x80x9d, second edition, McGraw-Hill, 1999, describes basic multidimensionalxe2x80x94and relational database techniques, many of which are referred to herein.
OLAP systems are sometimes implemented by moving data into specialized databases, which are optimized for providing OLAP functionality. In many cases, the receiving data storage is multidimensional in design. Another approach is to directly query data in relational databases in order to facilitate OLAP. U.S. Pat. No. 5,926,818 by Malloy and U.S. Pat. No. 5,905,985 by Malloy et al describe techniques for combining the two approaches. The relational model is described in Codd, E.F., xe2x80x9cA Relational Model of Data for Large Shared Data Banks, Communications of the ACM 13(6):377-387 (1970).
OLAP systems are used to define multidimensional cubes, each with several dimensions, i.e., hypercubes, and should support operations on the hypercubes. The operations include for example: slicing, grouping of values, drill-down, roll-up and the viewing of different hyperplanes or even projections in the cube. The research report by Agrawal et al (Agrawal et al., xe2x80x9cModeling Multidimensional Databases,xe2x80x9d Research Report, IBM Almaden Research Center) describes algebraic operations useful in a hypercube based data model for multidimensional databases. Aggregate-type operations are described in several U.S. patents by Agrawal et al. and Gray et al. (i.e., U.S. Pat. Nos. 5,832,475; 5,890,151; 5,799,300; 5,926,820 by Agrawal et al. and 5,822,751 by Gray et al.
Measurements from various institutions and research entities are by nature heterogeneous. Synthesizing measurements into longer strings of information is a complex process requiring nonstandard operations. This is especially true when dealing with measurements lacking the accountant type structure of business related data. As, for example, health related information about individuals, genotype readings, genealogy records and environmental readings. The shortcomings of current OLAP tools in dealing with these types of non-associative measurements is evident, for example, by realizing the emphasis placed on aggregation operators such as max, min, average and sum in current tools and research. Most often, these operators are rendered useless by the lack of a quantifying domain such as xe2x80x9cmoneyxe2x80x9d. On the other hand, when carefully synthesized and analyzed, these and other similar sets of measurements do contain valuable knowledge that may be brought to light using multidimensional analysis.
In order to overcome some of the limitation in the prior art, the present invention discloses methods and embodiments supporting multidimensional analysis in data management systems.
An object of the present invention is to enable online tuning of relations in multidimensional analysis. According to the invention, relations are modified by a depth-of-field operator that can be applied to any collection of dimensions and relations supported by the dimensions. In effect, the online depth-of-field operator varies the density of points or facts in a representation of a multidimensional cube. It allows one to experiment online with the definition of relations, thereby controlling the output of the synthesizing process.
It is also an object of the present invention to facilitate online definitions of multidimensional cubes fit for being populated with data from various measurements and other cubes. According to the invention an axes matrix is used to specify axes structures related to each dimension or domain. An operator, called blowup operator herein, possibly associated with the axes matrix is implemented. These techniques create a connection between measurements and domains, and a user defined multidimensional view containing knowledge that is acquired through complex multidimensional processing.
It is another object of the present invention to implement a syntheses process for multidimensional analysis. The process dynamically eliminates ambiguities, observed in combined measurements used to populate a hypercube. This is achieved by introducing additional relations reflecting dependencies between dimensions in the hypercube and by confirming combined measurements against selected realistic observations.
It is yet another object of the present invention to implement a system that enables OLAP for a wider variety of data and structures than current relational implementation schemas, such as the star or snowflake schema and related techniques. In some cases, this is done by forcing the structures into current schemas, but in other cases, new and more dynamic schemas are introduced. Among the structures is a grouping operator for multidimensional analysis, applicable, among other things, to measurements about domains with variable level of granularity. The operator does not force the measurements into using the same level of granularity or hierarchy and it is generic with respect to any domain and hierarchical structure.
The main processes introduced are reversible and therefore may be made to be well-behaved with respect to adding, updating or deleting measurements from the original system of relations. Thus, the processes, when combined, define a continuously updateable/editable OLAP system for heterogeneous relations. The heterogeneous relations and dimension structures may include, but are by no way limited to, measurements relating to health data for individuals (e.g., biomarkers), ecological data, genotype readings (e.g., location of markers in individuals), genealogical records, geographical data and so on.