1. Field of the Invention
The present invention is directed to the field of relational database browsing and information mining. More particularly, it is directed to computer-implemented discovery of metadata and generation of multidimensional models in a relational database.
2. Description of the Background Art
A computer-implemented database is a collection of data organized in the form of tables. A table typically consists of columns, each representing data of the same nature, and records, each representing a specific instance of data associated with the table. A relational database is a database that may comprise a set of tables containing information that is manipulated in accordance with the relational model associated with the data. For example, the product marketed under the trademarks IBM DB2 stores the data associated with the database in tables, and each table has a name. Other vendors also provide relational databases.
On-Line Analytical Processing (OLAP) is a computing technique for summarizing, consolidating, viewing, analyzing, applying formulae to, and synthesizing data according to multiple dimensions. OLAP software enables users, such as analysts, managers, and executives, to gain insight into the performance of an enterprise through rapid access to a wide variety of data dimensions that are organized to reflect the multidimensional nature of the enterprise performance data, typically by means of hypotheses about possible trends in the data. More particularly, OLAP may be used to analyze corporate data from different viewpoints by identifying interesting associations in the information in a relational database.
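To make "summarizing ... according to multiple dimensions" concrete, the following minimal Python sketch computes a small data cube: the sum of a measure for every combination of dimensions, from the grand total down to individual records. The record layout, the dimension names (region, product, year), and the `cube` helper are hypothetical illustrations, not part of any OLAP product's API. A production OLAP engine would normally push this work into the database (for example with `GROUP BY CUBE` or `ROLLUP` where the database supports them) rather than iterate in application code.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sales records: three dimensions and one measure ("sales").
RECORDS = [
    {"region": "East", "product": "Widget", "year": 2002, "sales": 100},
    {"region": "East", "product": "Gadget", "year": 2002, "sales": 50},
    {"region": "West", "product": "Widget", "year": 2002, "sales": 70},
    {"region": "West", "product": "Widget", "year": 2003, "sales": 30},
]
DIMENSIONS = ("region", "product", "year")


def cube(records, dimensions, measure):
    """Sum `measure` for every subset of `dimensions` (a full data cube).

    Each key is a tuple of (dimension, value) pairs; the empty tuple is the
    grand total. Including the dimension name keeps keys from different
    groupings distinct.
    """
    result = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            totals = defaultdict(int)
            for rec in records:
                key = tuple((d, rec[d]) for d in dims)
                totals[key] += rec[measure]
            result.update(totals)
    return result


if __name__ == "__main__":
    c = cube(RECORDS, DIMENSIONS, "sales")
    print(c[()])                                       # grand total: 250
    print(c[(("region", "East"),)])                    # 150
    print(c[(("product", "Widget"), ("year", 2002))])  # 170
```

Each level of the cube corresponds to one "viewpoint" on the same data, which is what lets an analyst move between summary and detail.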
Data mining employs computer-based techniques to enable users to query structured data stored in forms such as multidimensional relational databases, conventional relational databases, or flat computer files. More particularly, data mining involves extracting computer-based information and enables a user to discover trends in that information. An increasingly popular data model for OLAP applications is the multidimensional database (MDDB). A data analyst often uses an MDDB to explore data, such as performance data, interactively by techniques such as data mining.
Metadata is information that describes the characteristics of stored data. For instance, data in a relational database may be described by metadata such as the names of the associated relational database tables and columns. More particularly, each relational database typically has a set of tables, such as system catalog tables, that are automatically maintained by the computer system and contain information about the tables and other objects stored in the relational database. Information about the relational database can be retrieved from the system catalog tables using structured query language (SQL) queries.
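Retrieving table and column metadata from a system catalog can be sketched with Python's built-in `sqlite3` module, used here only as a convenient stand-in for DB2. SQLite keeps its catalog in the `sqlite_master` table and reports column details through `PRAGMA table_info`. On DB2 for Linux, UNIX, and Windows, comparable information is exposed through catalog views such as `SYSCAT.TABLES` and `SYSCAT.COLUMNS`, whose exact columns should be checked against the DB2 documentation. The `describe_schema` helper is hypothetical.

```python
import sqlite3


def describe_schema(conn):
    """Return {table_name: [(column_name, declared_type), ...]} from the catalog."""
    schema = {}
    # sqlite_master is SQLite's system catalog; it is maintained automatically.
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (name,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
        schema[name] = [(col[1], col[2]) for col in cols]
    return schema


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DEPARTMENT (DEPTNO INTEGER PRIMARY KEY, DEPTNAME TEXT, BUDGET REAL)"
    )
    print(describe_schema(conn))
```

The point of the sketch is that the metadata is itself queryable data: no inspection of the user tables' contents is needed to learn their names and column types.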
SQL is a standardized language for defining and manipulating data in a relational database and may be used during data mining. A query may be an expression whose result is a table, and may be embodied in software structures such as a query statement or a query object. A query searches the records stored in specified tables to find the answer to a question. A query is a request for information from the relational database based on specific conditions, such as which subset of the data should be retrieved and how the data is to be presented. For example, a request for a list of all departments in a DEPARTMENT table whose budget is greater than $10,000 is a query. Further, the SQL query may require analysis of the metadata associated with a relational database.
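The DEPARTMENT example can be written as a parameterized SQL query. This sketch again uses `sqlite3` as a stand-in for a production relational database; the `DEPTNAME` and `BUDGET` column names, the sample rows, and both helper functions are assumptions made for illustration.

```python
import sqlite3


def make_sample_db():
    """Build a hypothetical in-memory DEPARTMENT table with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DEPARTMENT (DEPTNO INTEGER PRIMARY KEY, DEPTNAME TEXT, BUDGET REAL)"
    )
    conn.executemany(
        "INSERT INTO DEPARTMENT (DEPTNAME, BUDGET) VALUES (?, ?)",
        [("Planning", 25000.0), ("Support", 8000.0), ("Research", 12000.0)],
    )
    return conn


def departments_over_budget(conn, threshold):
    """Return names of departments whose budget exceeds `threshold`."""
    # The WHERE clause states which subset to retrieve; ORDER BY states
    # how the result is presented.
    rows = conn.execute(
        "SELECT DEPTNAME FROM DEPARTMENT WHERE BUDGET > ? ORDER BY DEPTNAME",
        (threshold,),
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = make_sample_db()
    print(departments_over_budget(conn, 10000))  # ['Planning', 'Research']
```

Passing the threshold as a bound parameter rather than formatting it into the SQL text keeps the query reusable and avoids quoting problems.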
A browser may be considered a text extender function that enables a user to display text on a computer monitor. Browsing is typically used to examine records in a file, such as a relational database. By way of example, a browser may operate on one computer, such as a client computer, and initiate requests to a second computer, such as a server computer, so that information from the second computer may be displayed via the first computer. When a user attempts to browse information during OLAP processing, the amount of information may be so large that it is difficult to determine which information is useful. For example, if a user browses a relational database using SQL queries, it may be difficult to discover OLAP trends, such as OLAP cube models, that could be used to facilitate OLAP analysis.
The product marketed under the trademarks IBM DB2 Query Management Facility (QMF) is a multipurpose query program for reporting, data sharing, server resource protection, powerful application development, and native connectivity to DB2 platforms. QMF provides an interface to build queries and business reports by accessing DB2 information, such as information provided in a DB2 catalog. QMF may operate with a browser.
The creation of MDDBs typically requires a large volume of metadata objects that are used to generate OLAP cube multidimensional models when OLAP queries are initiated. A multidimensional model may be a set of rules or a formula for predicting the most likely data-structure outcome based on existing data. An OLAP cube multidimensional model typically comprises a set of tables that represent facts and dimensions associated with a database, providing an optimized, structured presentation of metadata associated with a relational database and thereby enabling efficient mining. In the past, creating OLAP cube multidimensional models via the creation of metadata objects required manual user intervention, which in turn required the user to be highly knowledgeable about OLAP structures such as metadata. A QMF Query Object is an example of such a metadata object and is typically used to generate a software query, such as an SQL query. Further, user-created metadata objects may be incorrect, having errors such as structures that are malformed or do not conform to a particular relational database structure. In the past, to overcome the problems inherent in using possibly incorrect user-created metadata objects, the underlying relational database tables were examined during creation of an OLAP cube multidimensional model. This, in turn, required that referential integrity constraints be defined for the underlying relational database tables. Referential integrity constraints may operate to ensure that one-to-many and many-to-many relationships between multidimensional metadata and relational database structures are enforced within the operation of a relational database schema. Enforcing such referential integrity constraints requires considerable computer resources.
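A fact table whose rows reference dimension tables, together with the referential integrity constraints that tie them together, can be illustrated in SQLite. The table and column names (`SALES`, `PRODUCT`, `STORE`) and the helper functions are hypothetical. The sketch shows two things: the database rejects a fact row that points at a missing dimension row, and the declared foreign keys can be read back from the catalog, which is the kind of information available when the underlying tables are examined. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued on the connection; other databases enforce declared constraints by default.

```python
import sqlite3


def make_star_schema():
    """Create a hypothetical fact table (SALES) and two dimension tables."""
    conn = sqlite3.connect(":memory:")
    # Must run before any transaction starts; SQLite then checks every FK.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE PRODUCT (PRODUCT_ID INTEGER PRIMARY KEY, NAME TEXT)")
    conn.execute("CREATE TABLE STORE (STORE_ID INTEGER PRIMARY KEY, CITY TEXT)")
    conn.execute(
        """CREATE TABLE SALES (
               PRODUCT_ID INTEGER NOT NULL REFERENCES PRODUCT(PRODUCT_ID),
               STORE_ID   INTEGER NOT NULL REFERENCES STORE(STORE_ID),
               AMOUNT     REAL)"""
    )
    conn.execute("INSERT INTO PRODUCT VALUES (1, 'Widget')")
    conn.execute("INSERT INTO STORE VALUES (10, 'Austin')")
    return conn


def try_insert_sale(conn, product_id, store_id, amount):
    """Insert a fact row; return False if a referential integrity check rejects it."""
    try:
        conn.execute(
            "INSERT INTO SALES VALUES (?, ?, ?)", (product_id, store_id, amount)
        )
        return True
    except sqlite3.IntegrityError:
        return False


def foreign_keys(conn, table):
    """Return sorted (column, referenced_table) pairs declared on `table`."""
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    rows = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
    return sorted((row[3], row[2]) for row in rows)


if __name__ == "__main__":
    conn = make_star_schema()
    print(try_insert_sale(conn, 1, 10, 9.99))  # True: both dimension rows exist
    print(try_insert_sale(conn, 2, 10, 5.00))  # False: no PRODUCT row with id 2
    print(foreign_keys(conn, "SALES"))
```

Every insert into the fact table triggers a lookup in each referenced dimension table, which is one reason enforcing such constraints on large fact tables carries a real cost.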
To overcome such problems of the past, it would be advantageous to automatically discover metadata objects during query mining and query analysis. Further, it would be useful for such automatically discovered metadata objects to conform to the relational database structure.