1. Field of the Invention
This invention relates generally to systems and methods for analyzing and extracting data from Enterprise Resource Planning (ERP) or other, business software systems, and replicating that data over computer networks into other systems, specifically data warehouses and/or data marts and on-line analytical processing (OLAP) systems, for the purpose of effectively analyzing and creating reports using that data.
Specifically, one implementation of the present invention is a system and method for automatically analyzing the form and structure of the data within an ERP system and determining the appropriate metadata and data to be replicated and aggregated from that system to a separate data server, in order to construct a data warehouse and an associated OLAP cube without significant human intervention. The structure of the resulting data warehouse and OLAP cube permits cross ledger analysis and reporting against both summary and detail transactional data from the source ERP system simultaneously.
2. Description of Related Art
Almost any organization of any substantial size uses a commercial ERP (Enterprise Resource Planning) system, in which their day-to-day operational transactions are collected, stored, processed and managed. ERP systems are multi-module software applications which allow management of the important functions within an organization, including parts purchasing, maintaining inventories, interacting with suppliers, providing customer service, and tracking orders, among others. Almost all such systems incorporate a series of commonly-used financial accounting ledgers, i.e. a General Ledger (GL) and subledgers (for example, accounts receivable, accounts payable, inventory, fixed assets, and so on). A ledger is a classification and summarization of financial transactions, and includes the general ledger, legally required for audits. The term ‘ledger’ used herein includes subledgers. ERP systems often (although not always) use a relational database system to store their data.
In general, an ERP system is mainly a transactional system, i.e. one that is designed to perform transactions; for example, when a purchase order is received from a customer. Transactional systems are structured and organized so that these types of transactions can be performed as rapidly and reliably as possible with the large volumes of data that they require. However, the essential structure and organization of transaction systems make them poorly suited for analyzing the data they hold and reporting against it in certain ways, such as are typically required to effectively and efficiently manage a business. In addition, ERP systems are “mission-critical”, in that they are vital to running the organization's business and so their performance cannot be impacted by also using them for other tasks (such as analysis and reporting). To circumvent these issues, many organizations separate their analysis/reporting systems from their transaction systems by replicating key transaction data from their ERP systems (and possibly also other databases) and storing them in a data warehouse or data mart.
A data warehouse typically contains a summary snapshot of an organization's transactional data at a single point in time, and this data is organized in such a way as to make it well suited for management analysis and decision making. This snapshot of data in the data warehouse is created by manual or automatic replication of a subset of the data in one or more external transaction systems (such as an ERP system). In addition, the data is often summarized across certain dimensions (operational facets) of the data, such as time, product, customer, location, etc., to provide summary information in a form which more accurately reflects the structure of the organization than the data in an ERP system.
Data marts are essentially structurally identical to data warehouses, except they are usually smaller and only contain data relevant to a narrow segment or specific functional area of the organization. Both data warehouses and data marts typically use a relational database system for data storage. Data marts are most useful for “stovepipe” analysis (i.e. for a specific subject area not linked to other business functions), and are ideal in such situations because they are faster and easier to create than a data warehouse; however as a result they are not as well suited for cross-functional analysis and reporting.
Despite their advantages stated above, data warehouses are still not ideally suitable for easily creating queries to answer the kind of real business questions used in management decision making (often called business intelligence). This is because a significant requirement of the end users asking such business questions is the ability to create their own ad-hoc analytical queries and reports using common business terms; however data warehouses are built on relational databases and so utilize very technical concepts such as tables, fields and views. In addition, data warehouses usually require the use of a complex query language, e.g. SQL (Structured Query Language) to access the data. None of these concepts are at all intuitive to non-technical end users.
So for example, a business may sell a wide variety of products in different regions over some period of time. In a simple transactional system, sales data might be stored as individual data records, each representing a sale to a customer on a particular date (time period), and linked to records in other tables containing the customer's address (geographic location), the details of the products sold (product), and other information. In a data warehouse the same data would be stored in a large central fact table, containing a series of records containing the aggregated value of all sales to each specific customer for each specific product in each specific region over each time period (for example individual months), together with summary totals for each grouping of customer, product, region, and time period. Each record in the fact table also contains a set of key values, each of which references a separate dimension table containing the descriptive business terms associated with the key values (e.g. the product names or geographic regions). These descriptive business terms are sometimes referred to as members or member names. The fields in the fact table which contain numerical values to be aggregated are referred to as measures. Measures are the central values of a cube that can be analyzed by the dimensions and hold the figures that the end user is primarily interested in. Some common measures are sales amount, cost amount, sales quantity. The structure of a data mart with a central fact table joined to separate dimension tables is referred to as a star or snowflake schema.
In the above example, joining the fact tables to one or more dimension tables to get a meaningful result is a non-trivial process, and as a result is difficult for the average end user, as well as hard to perform programmatically in a generalized way. Performing even more complex joins may even result in a query which takes an inordinately long time to complete and so significantly degrades system performance. Further, the process of aggregating data from potentially millions of transaction records derive sub-totals and totals for every combination of product, regions and customer, etc is difficult and inefficient in a relational database. Consequently, an additional type of data storage system is often created which represents the warehouse data in the form of the business terms rather than the underlying database format. Such systems are called on-line analytical processing (OLAP) systems.
In an OLAP system, each dimension is represented in the OLAP system's data structures (often referred to as a cube), together with the elements of that dimension. In addition, each dimension may contain a series of hierarchical levels; for example, in a time dimension that might be years, quarters, and months. All these dimensions, member names and hierarchies are collectively known as “metadata”, and are typically stored within the OLAP system so that they can be referenced much more intuitively for business analysis and reporting purposes by non-technical users.
OLAP systems are therefore commonly used to facilitate the efficient retrieval and analysis of the information in data warehouses and data marts. These OLAP systems permit the rapid and effective creation of ad-hoc management analyses and complex summary reports from a number of different perspectives, which would be impossible, inefficient and/or too cumbersome with transactional systems.
A side benefit of OLAP systems is that they make it impossible for a naïve end user to accidentally (or maliciously) create a “runaway query”, i.e. one which generates a complex, multi-table join that takes an enormous amount of time and system resources to complete (degrading the performance of the entire system).
Because data warehouses and data marts (and sometimes OLAP systems) typically use relational databases as their storage mechanisms, the principal way in which their data is usually accessed is by using SQL (Structured Query Language). SQL is a sophisticated and complex query language, which can be used to define the tables and fields of a database; create, modify or delete the data records; or extract data via queries. In addition, some relational databases permit the creation of stored procedures (essentially predefined SQL programs which can be saved in the database). Just as with procedure calls in any other programming language, stored procedures can be passed arguments and return values, so they can be used to perform complex programmatic tasks.
Those familiar with the theory and practice of this art will recognize that constructing such data warehouses and OLAP systems is a complex and difficult process, requiring much experience and skill; both with the technical issues in using a relational database, but also with the business knowledge to selecting the appropriate business terms, hierarchies and aggregations to facilitate the kind of reporting required. Automating such a process is often complex and error-prone, but can potentially yield significant organizational benefits by increased efficiency, reduced cost and eliminating human error.
However, there are also some limitations to the kinds of analysis that can be performed using traditional data warehouses and OLAP systems. These include:
Because of size, space and complexity constraints within the database, it is often very difficult to construct a data warehouse which combines all the data which may be required from separate ERP modules into a single data warehouse and OLAP cube. The usual solution is to use separate data marts and/or cubes; however, this approach makes certain kinds of cross-functional analysis and reporting very difficult to perform. For example, with separate data marts and/or cubes it would be very difficult to produce a report which enables users to drill from GL sales numbers to a breakdown of those sales by customer and then further by location and/or by item within that customer. The user may also want to drill to customer orders not yet invoiced or fulfilled and then to review the customer's outstanding account balance. Supporting this kind of drillthrough requires access to summary and detail data from a range of ERP modules, including the GL, Accounts Receivable, order entry and inventory control.Data warehouses and OLAP cubes typically contain only summary and aggregate data, making certain kinds of detail level analysis and reporting impossible. To circumvent this, many systems include the ability to drillthrough from the summary data values into the individual transaction records (or some subset of them) which make up those values in order to facilitate detailed level analysis; however, such functionality is usually very inefficient (requiring linking across different systems) and is very hard to provide in a generalized and useful way across all the data in the warehouse without making the data set unmanageably large.To avoid displaying incorrect data values, many data warehouses only reference data within the ERP ledgers that has already been posted to the General Ledger (GL), making it difficult, for example, to track information about unshipped orders or unpaid invoices. Again, this would require access to the detailed transactional data.When updating a data warehouse or an OLAP cube, typically a significant amount of human intervention is required to make decisions about the placement and structure in the dimension hierarchies of new or changed metadata items in the ERP system.
There is therefore a clear requirement for an invention that will help to address the above limitations within existing data warehouses and OLAP systems within an organization in an efficient and practical way.