(1) Field of the Invention
This invention relates to a method for performing metric data analysis and, more particularly, to an object-oriented methodology, implemented in a web-based computing environment, for selecting and defining both the measures and the dimensions used in metric analysis.
(2) Description of the Related Art
Data mining and data warehousing allow users to analyze large databases in order to solve business decision problems. Data mining is, in some ways, an extension of statistics, with the addition of artificial intelligence and machine learning.
Data warehousing involves retrieving a subset of a business's data from an operational relational database and subsequently storing that data in a staging relational database. Typically, this retrieval and storage occurs through the execution of a scheduled background process. This background process, or batch process, is scheduled to run at a time when large retrievals from the operational database will cause the least interference with business activities.
The subset of data chosen for retrieval is selected according to business analysis criteria. Specifically, the data retrieved for analysis either represents meaningful measures of business activity or is data from which such measures may be derived. Examples might include the row entries for all suppliers that filled orders for goods, together with the number of orders each supplier filled. Additional information might include a numeric region code identifying the region from which each supplier provided the goods for a particular order.
Once extracted, the data can be queried in a number of useful ways. For example, the user might query this data to spot trends in the number of orders filled by individual suppliers. Alternatively, the user might wish to determine the distribution of filled orders by region, regardless of the identity of the supplier. As yet another alternative, the user may wish to derive information that is not explicitly stored in the database. For example, the user may wish to determine the amount of time a typical customer's order requires for processing. If the times of order initiation and completion are recorded, the duration of the order process may be determined as the difference between them.
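The two example queries above, the regional distribution of orders and the derived order duration, can be illustrated with a minimal Python sketch. The record layout (`supplier`, `region`, `started`, `completed`) and the sample values are hypothetical, chosen only to show how a measure that is never stored (duration) is computed from two stored timestamps, and how orders are counted by region irrespective of supplier.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Order:
    # Hypothetical extracted order record; field names are illustrative only.
    supplier: str
    region: int          # numeric region code
    started: datetime    # time of order initiation
    completed: datetime  # time of order completion


def duration(order: Order) -> timedelta:
    # Derived measure: not stored in the database, computed from two stored times.
    return order.completed - order.started


def orders_by_region(orders: list[Order]) -> Counter:
    # Distribution of filled orders by region, ignoring supplier identity.
    return Counter(o.region for o in orders)


def mean_duration(orders: list[Order]) -> timedelta:
    # Average processing time of an order across the sample.
    total = sum((duration(o) for o in orders), timedelta())
    return total / len(orders)


orders = [
    Order("Acme", 1, datetime(2001, 3, 1, 9), datetime(2001, 3, 1, 17)),
    Order("Acme", 2, datetime(2001, 3, 2, 9), datetime(2001, 3, 3, 9)),
    Order("Birch", 1, datetime(2001, 3, 2, 8), datetime(2001, 3, 2, 14)),
]
print(orders_by_region(orders))  # Counter({1: 2, 2: 1})
print(mean_duration(orders))     # 12:40:00
```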
It is possible to perform data mining queries directly on the operational database. Such queries may be performed through any of a number of relational database interfaces that accept SQL queries, such as Enterprise Manager by Microsoft Corporation of Redmond, Wash. However, this methodology suffers from several drawbacks. One drawback is the potential for a data bottleneck to form at the interface between a data analysis tool and the operational relational database. Because the primary responsibility of the operational database is to provide real-time, business-critical data access, a large number of queries generated for analysis purposes can impose a potentially crippling data access overload. An additional drawback of issuing analysis queries directly against the operational database is that its structure is less than optimal for analysis. Because the operational database is typically designed to support all the business needs of an enterprise, it likely contains a large volume of data on which data miners have no need to perform metric analysis. In addition, because the data needed for data mining analysis is usually only a small portion of the data contained in the operational database, the data of interest could be analyzed much more quickly if it were separated from the operational data that is not relevant to the analysis. This follows from the greater resources required to search a larger database than a smaller one.
In order to avoid bottlenecks and to increase the speed at which queries may be performed against metric data, data warehousing is commonly employed. As noted, typical data warehousing involves the execution of a batch process to extract data from the operational database and store it in a staging relational database. In one presently known embodiment, the batch process comprises a series of structured query language (SQL) statements. When these SQL statements are executed, the desired data is retrieved from the operational database and stored in the staging database.
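A batch process of the kind described above can be sketched in Python with the standard `sqlite3` module. This is a minimal illustration under stated assumptions: an in-memory SQLite database stands in for the operational database, a second attached in-memory database stands in for the staging database, and the table and column names are hypothetical. The batch itself is simply an ordered list of SQL statements.

```python
import sqlite3

# Hypothetical batch: an ordered series of SQL statements that copies the subset
# of operational data needed for analysis into the staging database.
BATCH = [
    "DROP TABLE IF EXISTS staging.order_facts",
    """CREATE TABLE staging.order_facts AS
       SELECT supplier, region, started, completed
       FROM orders
       WHERE status = 'filled'""",
]


def run_batch(conn: sqlite3.Connection, statements: list[str]) -> None:
    # Execute each statement in order, then commit the results.
    for sql in statements:
        conn.execute(sql)
    conn.commit()


# Stand-in operational database; it holds columns the analysis does not need.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS staging")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, supplier TEXT, region INTEGER, "
    "status TEXT, started TEXT, completed TEXT, notes TEXT)"
)
conn.executemany(
    "INSERT INTO orders (supplier, region, status, started, completed, notes) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Acme", 1, "filled", "2001-03-01 09:00", "2001-03-01 17:00", "rush"),
        ("Acme", 2, "open", "2001-03-02 09:00", None, ""),
        ("Birch", 1, "filled", "2001-03-02 08:00", "2001-03-02 14:00", ""),
    ],
)
conn.commit()

run_batch(conn, BATCH)
print(conn.execute("SELECT COUNT(*) FROM staging.order_facts").fetchone()[0])  # 2
```

Only filled orders and only the four analysis columns reach the staging table, which is the separation of relevant from non-relevant data that motivates warehousing.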
In addition to data warehousing, software may be used to aid in the analysis of the staging database. Analytical reporting features are often provided through OLAP (On-Line Analytical Processing) technologies. OLAP engines and reporting tools provide a multi-dimensional view of data and are optimized for fast aggregation. OLAP tools support commonly used methods of analysis such as drilling down into summary data, pivoting and rotating data in spreadsheets, and filtering data on one or more dimensions. Such functions are broadly referred to as data mining. Reports generated from data in an OLAP format can be more interactive than those generated from relational database tables. Examples of OLAP technologies include OLAP Services from Microsoft of Redmond, Wash. The OLAP engine functions as a buffer between the staging relational database and any analysis tool capable of accessing and displaying the output of the OLAP engine. Examples of such analysis tools include Impromptu from Cognos of Ottawa, Canada.
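The aggregation, drill-down, filtering, and pivoting operations attributed to OLAP tools above can be illustrated, in greatly simplified form, with plain Python over a list of fact rows. This is a toy in-memory sketch, not a description of how OLAP Services or any other OLAP engine is implemented; the dimensions (`region`, `supplier`, `quarter`) and the measure (`orders`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one additive measure.
FACTS = [
    {"region": 1, "supplier": "Acme", "quarter": "Q1", "orders": 10},
    {"region": 1, "supplier": "Birch", "quarter": "Q1", "orders": 4},
    {"region": 1, "supplier": "Acme", "quarter": "Q2", "orders": 6},
    {"region": 2, "supplier": "Acme", "quarter": "Q2", "orders": 7},
]


def aggregate(facts, dims, measure="orders", **filters):
    # Sum the measure grouped by the given dimensions, keeping only rows
    # whose dimension values match any keyword filters.
    totals = defaultdict(int)
    for row in facts:
        if all(row[d] == v for d, v in filters.items()):
            totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


def pivot(facts, row_dim, col_dim, measure="orders"):
    # Rotate the aggregate so one dimension forms rows and another forms columns.
    table = defaultdict(dict)
    for (r, c), total in aggregate(facts, [row_dim, col_dim], measure).items():
        table[r][c] = total
    return dict(table)


print(aggregate(FACTS, ["region"]))                        # {(1,): 20, (2,): 7}
print(aggregate(FACTS, ["region", "supplier"], region=1))  # {(1, 'Acme'): 16, (1, 'Birch'): 4}
print(pivot(FACTS, "supplier", "quarter"))                 # {'Acme': {'Q1': 10, 'Q2': 13}, 'Birch': {'Q1': 4}}
```

The first call is the summary view, the second drills down into region 1 by adding a dimension and a filter, and the third pivots suppliers against quarters.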
While the general use of a batch process to populate a staging database, through which an OLAP interface provides metric analysis, is well known, the present art suffers from three drawbacks that diminish the utility of such a configuration. First, there is no explicit integration between the business model of an organization and the data warehouse and data mining functions. A business may employ a variety of third-party and proprietary software components to carry out its business functions. Many of these components write information to, and retrieve information from, one or more operational databases. Because there is no unifying relationship between these components, constructing updated batch processes capable of retrieving and storing the desired metric data requires a great deal of labor and resources.
Second, the identification of data inside the operational database that should optimally be transferred to the staging database is often not integrated with the process of system design and implementation. Third-party software is routinely configured to perform portions of a business's processes. There is no formal connection between different software components, and the internal data objects that make up the source code of each component are typically not accessible to users of the software. As a result, once a system is configured, identifying the data objects that require metric analysis is laborious and painstaking. In addition, there is no opportunity, while the operational system is being configured or developed, to identify attributes or processes for later analysis.
Lastly, as an architected or integrated computer-based system for carrying out business processes is changed to incorporate evolving business practices, maintenance of the batch process becomes increasingly difficult. Specifically, over time, business users create and implement new applications. These new applications will likely create, access, and edit new data entries in the operational database. Many of these new data entries will have previously undefined, complex relationships with other data entries and will require metric analysis. As a result, human intervention is required to recode, test, and implement updated batch processing software that extracts data from the operational database and updates the staging relational database.
Therefore, there exists a need for an integrated method of defining business models in which a high-level business model is explicitly tied to the definitions of the attributes and processes requiring metric analysis. In addition, there is a need for a method by which these attributes and processes may be flagged, during the development phase of the operational system, as requiring metric analysis. It would be of further utility if, in addition to individual attributes and processes, other related attributes and processes could easily be identified and flagged as well. Lastly, there is a need for an automated system for generating the executable code that constitutes the batch process. Ideally, such code could be regenerated after any change to the operational system and would reflect the metric analysis needs arising from that change.
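As a purely hypothetical illustration of the kind of automated generation called for above, and not the method of the invention (which this section does not describe), the Python sketch below models business objects whose attributes carry a flag marking them for metric analysis and emits the SQL statements of a batch process from those flags. The classes `Attribute` and `BusinessObject`, the `metric` flag, and the `staging.` naming scheme are all assumptions; table and column names are assumed to be trusted identifiers, since they are interpolated directly into the SQL.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    metric: bool = False  # hypothetical flag: attribute requires metric analysis


@dataclass
class BusinessObject:
    table: str  # operational table assumed to back this object
    attributes: list[Attribute] = field(default_factory=list)


def generate_batch(model: list[BusinessObject]) -> list[str]:
    # Emit a DROP/CREATE pair for each object with at least one flagged
    # attribute, copying only the flagged columns into the staging database.
    # Identifiers are assumed trusted; they are not quoted or escaped.
    statements = []
    for obj in model:
        cols = [a.name for a in obj.attributes if a.metric]
        if not cols:
            continue  # nothing flagged, so the object is not extracted
        statements.append(f"DROP TABLE IF EXISTS staging.{obj.table}")
        statements.append(
            f"CREATE TABLE staging.{obj.table} AS "
            f"SELECT {', '.join(cols)} FROM {obj.table}"
        )
    return statements


model = [
    BusinessObject("orders", [Attribute("supplier", True), Attribute("region", True),
                              Attribute("notes")]),
    BusinessObject("employees", [Attribute("name")]),
]
for sql in generate_batch(model):
    print(sql)
```

Because the statements are derived from the flags, changing the model (for example, flagging a new attribute) and rerunning `generate_batch` yields an updated batch without hand recoding.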