1. Field of the Invention
This invention relates in general to data management systems performed by computers, and in particular, to the processing and generation of queries, named table variables and other relations in connection with database systems that support the SQL standard.
2. Description of Related Art
Graphical query tools are used to simplify the process of creating and executing queries against a database. The tools are often end-user oriented and in many cases provide the user with a simplified graphical view of a query by modeling parts of, or restricting, the xe2x80x9cselectxe2x80x9d statement used in SQL implementations. U.S. Pat. No. 6,247,008 by Cambot et al discloses methods for allowing users to work with business words, dynamically manipulate objects and to use non-technical terminology when querying a relational database. U.S. Pat. No. 6,160,549 by Touna et al describes methods for allowing users to specify data and computations to be performed on the data using a graphical layout metaphor. U.S. Pat. No. 5,812,840 by Shwartz describes methods for interactively creating valid queries by using an expert system.
Multidimensional data processing or the OLAP category of software tools is used to identify tools that provide users with a multidimensional conceptual view of data, operations on dimensions, aggregation, intuitive data manipulation and reporting. The term OLAP (Online analytic processing) was coined by Codd et al in 1993. The paper (Codd, E. F. et al, xe2x80x9cProviding OLAP to User-Analysts: An IT Mandatexe2x80x9d, E.F. Codd Associates, 1993) by Codd et al also defines the OLAP category further. An overview of OLAP and other data warehousing technologies and terms is contained in the text by Singh (Singh, H. S., xe2x80x9cData Warehousing. Concepts, Technologies, Implementations, and Managementxe2x80x9d, Prentice Hall PTR, 1998). The text by Ramakrishnan et al (Raghu Ramakrishnan/Johannes Gehrke, xe2x80x9cDatabase Management Systemsxe2x80x9d Second Edition, McGraw-Hill, 1999) describes basic multidimensional- and relational database techniques, many of which are referred to herein.
OLAP systems are sometimes implemented by moving data into specialized databases, which are optimized for providing OLAP functionality. In many cases, the receiving data storage is multidimensional in design. Another approach is to directly query data in relational databases in order to facilitate OLAP. The patents (U.S. Pat. No. 5,926,818 and U.S. Pat. No. 5,905,985) by Malloy et al describe techniques for combining the two approaches. The relational model is described in a paper (Codd, E. F. xe2x80x9cA Relational Model of Data for Large Shared Data Banksxe2x80x9d Communications of the ACM, 13(6): 377-387, 1970) by Codd from 1970.
OLAP systems are used to define multidimensional data cubes, each with several dimensions, i.e., hypercubes, and should support operations on the hypercubes. The operations include for example: slicing, grouping of values, drill-down, roll-up and the viewing of different hyperplanes or even projections in the cube. The research report by Agrawal et al (Agrawal et al, xe2x80x9cModeling Multidimensional Databasesxe2x80x9d, Research Report, IBM Almaden Research Center) describes algebraic operations useful in a hypercube based data model for multidimensional databases. Aggregate-type operations are described in the patents by Agrawal et al (U.S. Pat. Nos. 5,832,475; 5,890,151; 5,799,300 and 5,926,820) and by Gray et al (U.S. Pat. No. 5,822,751). In particular, U.S. Pat. No. 5,799,300 by Agrawal et al describes efficient methods for performing range-sum queries on a d-dimensional data cube.
An object of the current invention is to introduce methods for building and transforming SQL based queries, named table variables, such as table views, and other relations including xe2x80x9cvirtualxe2x80x9d or calculated relations. The methods include defining and associating with each relation one or more dependency matrixes that are used to hide or display details about the relation upon request. The dependency matrices trace the structure of a relation to its source data and thereby make the buildup of complex queries transparent to the end-user. This is especially important in data warehousing where the buildup of fact tables in star or other schemas needs to be transparent in order for the fact tables to be reusable in future schemas. The methods are used to implement a more transparent and yet powerful relational algebra on top of the standard SQL relational algebra used in most SQL databases. The methods may be implemented through a simple graphical user interface, also disclosed.
Measurements from various institutions and research entities are by nature heterogeneous. Synthesizing measurements into longer strings of information is a complex process requiring nonstandard operations. This is especially true when dealing with measurements lacking the accountant type structure of business related data. As for example, health related information about individuals, genotype readings, genealogy records and environmental readings. The shortcomings of current OLAP tools in dealing with these types of non-associative measurements is evident, for example, by realizing the emphasis placed on aggregation operators such as max, min, average and sum in current tools and research. Most often, these operators are rendered useless by the lack of a quantifying domain such as xe2x80x9cmoneyxe2x80x9d. On the other hand, when carefully synthesized and analyzed, these and other similar sets of measurements do contain valuable knowledge that may be brought to light using multidimensional analysis.
In order to overcome some of the limitations in the prior art, the present invention discloses methods and embodiments supporting multidimensional analysis in data management systems.
It is moreover an object of the present invention to enable online tuning of relations in multidimensional analysis. According to the invention, relations are modified by a depth-of-field operator, through a graphical user interface, that can be applied to any collection of dimensions and relations supported by the dimensions. In effect, the online depth-of-field operator varies the density of points or facts in a representation of a multidimensional cube. It allows one to experiment online with the definition of relations, thereby controlling the output of the synthesizing process.
It is also an object of the present invention to facilitate online definitions of multidimensional cubes fit for being populated with data from various measurements and other cubes or relations. According to the invention an axes matrix is used to specify axes structures related to each dimension or domain. An operator, called blowup operator herein, possibly associated with the axes matrix is described. These techniques create a connection between measurements and domains, and a user defined multidimensional view containing knowledge that is acquired through complex multidimensional processing.
It is another object of the present invention to implement a syntheses process for multidimensional analysis. The process dynamically eliminates ambiguities, observed in combined measurements used to populate a hypercube or a new relation. This is achieved by introducing additional relations reflecting dependencies between dimensions in the hypercube or equivalently between columns in a relation and by confirming combined measurements/relations against selected realistic observations. These additional calculated or virtual relations are used in the definition of a new relation in the same way that a regular database table (relation) would herein. One of the advantages of calculated relations over tables and views in database systems is that the calculated relations may be reused independent of all table relations. Another advantage of calculated relations is that it allows real life observations or theory to be modeled by formulas and thereby filling in gaps in the data. This prevents the gaps from extending to the new relations being constructed.
It is yet another object of the present invention to implement a system that enables OLAP for a wider variety of data and structures than current relational implementation schemas, such as the star or snowflake schema and related techniques. In some cases, this is done by forcing the structures into current schemas, but in other cases, new and more dynamic schemas are introduced. Among the structures is a grouping operator for multidimensional analysis, applicable, among other things, to measurements about domains with variable level of granularity. The operator does not force the measurements into using the same level of granularity or hierarchy and it is generic with respect to any domain and hierarchical structure.
The main processes introduced are reversible and therefore well behaved with respect to adding, updating or deleting measurements, i.e., rows, from the original system of relations. Thus, the processes, when combined, define a continuously updatable/editable OLAP system for heterogeneous relations. The heterogeneous relations and dimension structures may include, but are in no way limited to, relations containing health data for individuals (e.g., biomarkers), ecological data, genotype readings (e.g., location of markers in individuals), genealogical records, geographical data and so on.