The system integration landscape is more fluid and dynamic than ever before. Wide acceptance of the internet in day-to-day business has accelerated access to information. This increase in information-access efficiency has set a higher standard for the decision-making and analytical capabilities of individuals and organizations alike. New specialty fields have arisen to accommodate this need for faster, more accurate analytical and predictive capabilities. One field in particular has evolved into what is widely accepted as Business Intelligence (BI) or Business Performance Management (BPM). The underpinning and enabler of this evolution is a technology called OLAP (on-line analytical processing). These fields have driven a wave of new, more powerful analytical and decision-making software products and technologies. These applications have become a key part of modern-day business and handle such core business processes as budgeting, forecasting, sales analysis, strategic planning, trend analysis, salary planning, capital planning, and headcount planning, to name a few. Businesses use these decision and analytical tools to gain competitive advantage, report financials, and make long-term decisions about the direction of their business.
The data required to satisfy a holistic analytical or predictive process is spread across a wide array of databases, platforms, and systems throughout an organization. These systems include ERP (enterprise resource planning), CRM (customer relationship management), trade promotion, distribution, and others. Data warehousing methods attempt to solve this by consolidating the disparate data into a standardized central repository, where other systems (such as BI/BPM reporting tools) can access the data in a logical, easy-to-use fashion, insulated from the complexity of the source systems.
The ability to consolidate and integrate data from system to system is based on a logical framework known as ETL: extraction, transformation, and load. Many integration tools and methodologies have been designed within the scope of the ETL framework. These processes optimize and accelerate the throughput of data between source and destination systems, complete with the required controls (logging, validation, transformation, aggregation). BI/BPM tools directly utilize ETL to import data into their own databases and metadata stores, proprietary or otherwise. Most BI/BPM tools provide some kind of ETL mechanism.
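The three ETL phases can be sketched as follows. This is a minimal illustrative example only; the source rows, the validation rule, and the in-memory destination are hypothetical stand-ins for the real source systems, business rules, and destination stores discussed above.

```python
# Minimal sketch of the three ETL phases. All data and rules here are
# hypothetical examples, not part of any particular product.

def extract(source_rows):
    """Extraction: pull raw records from a source system, dropping
    records that fail a basic validation control."""
    return [row for row in source_rows if row.get("amount") is not None]

def transform(rows):
    """Transformation: apply standardization and business rules."""
    out = []
    for row in rows:
        out.append({
            "account": row["account"].strip().upper(),  # standardize codes
            "amount": round(float(row["amount"]), 2),   # normalize precision
        })
    return out

def load(rows, destination):
    """Load: write the transformed rows to the destination store."""
    destination.extend(rows)
    return len(rows)

source = [{"account": " sales ", "amount": "100.456"},
          {"account": "cogs", "amount": None}]  # invalid row, filtered out
warehouse = []
loaded = load(transform(extract(source)), warehouse)
# One valid row is standardized and loaded; the invalid row is rejected.
```

In a real pipeline each phase would also emit the logging and aggregation controls mentioned above; the sketch shows only the data path.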
Despite these sophisticated ETL mechanisms, there is still demand in the BPM marketplace for more effective data exchange between BPM systems and corporate systems. This demand stems from the fact that analytical processes have been evolving faster than integration processes. The shortcomings of today's integration tools and techniques lie in the fact that modern ETL concepts were developed around the needs of a standardized data-warehouse-type environment rather than a fluid BPM-type environment. Data warehouse environments provide a highly standardized, generic, enterprise-wide framework designed to accommodate a wide array of data access protocols, business rules, and, in many cases, exceptionally large data volumes, independent of platform or business scenario. ETL was designed to move data in and out of these types of environments; therefore, ETL-related concepts are also very generic, a one-size-fits-all approach.
This provides little in the way of reusable, optimized, scenario-based integration processes. The integration patterns between source and destination are constantly re-engineered across the extraction, transformation, and load phases, and applied scenarios are left to the individual developer or implementation team. In the extraction phase, for example, the source may be an ERP system, a CRM system, a lead-tracking system, or some kind of custom data store. The technology platforms, protocols, and schemas of these sources can vary tremendously; even within the ERP realm there is wide variation in platform and database schema design among vendors. In the transformation phase, the application of transformational techniques may depend on the type of industry, the business rules that industry applies, and the integrity of the inbound data. The load phase shares the same problem as the extraction phase: the destination databases may have radically different schemas, platforms, and protocols, or a unique set of requirements. The combination of variables within each individual phase, let alone across all three, can be dizzying. Because of this variability, there has historically been little in the way of reusable or standardized processes for applied scenarios. Essentially, custom ETL processes and custom patterns must be employed for each scenario, increasing the cost and time of development. This spawns a further impediment innate in all custom software solutions: the difficulty of effectively transferring knowledge between individuals and organizations. This impediment arises from the lack of standardization and is driven by the coding techniques, skills, languages, and even the style of the individual developer. Each solution has to be relearned, which presents challenges for ongoing support and maintenance of the application. The present invention addresses this challenge in the form of reusable Integration Templates.
For BI/BPM to achieve maximum effectiveness within an organization, individualized, highly targeted, and specialized integration processes are required. These processes demand more specific reusable techniques and more clearly defined processes to accommodate efficient integration between targeted sources and destinations than a generic ETL framework provides today.
Most packaged BI/BPM software solutions on the market today embrace a multiphase approach, primarily for two reasons: (A) no single apparatus on the market today effectively provides single-point BPM capability, and (B) application integration is not the BI/BPM software vendors' area of expertise, so they have few resources with which to build this capability into their applications effectively.
BPM destination systems can capture data through a variety of methods, such as system-fed data via a data load mechanism or end-user input via an application interface. As data is entered into the OLAP destination system, regardless of the method, unique data intersection points can become fragmented; the larger the volume of input, the higher the degree of fragmentation that tends to occur. Two types of fragmentation can occur in BPM destination systems: horizontal fragmentation and vertical fragmentation. Vertical fragmentation is defined as the accumulation of intersection differences across database rows; horizontal fragmentation is defined as the accumulation of intersection differences across columns or tables. Data fragmentation, if not addressed, can cause exponential performance degradation of the BPM destination system and consume unnecessary system resources.
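Vertical fragmentation, as defined above, can be illustrated with a small sketch. The dimension names, rows, and consolidation rule below are hypothetical examples; they show only the general idea of the same data intersection accumulating across multiple rows and being collapsed back to one row per intersection.

```python
# Hypothetical sketch of vertical fragmentation: the same dimension
# intersection (account, period) has accumulated across several rows,
# each carrying a partial value.

from collections import defaultdict

fragmented = [
    ("Sales", "Jan", 100.0),
    ("Sales", "Jan", 50.0),   # same intersection, separate row
    ("COGS",  "Jan", 40.0),
    ("Sales", "Jan", 25.0),   # same intersection again
]

def defragment(rows):
    """Collapse duplicate intersections into one row each."""
    merged = defaultdict(float)
    for account, period, value in rows:
        merged[(account, period)] += value
    return [(a, p, v) for (a, p), v in sorted(merged.items())]

consolidated = defragment(fragmented)
# Four fragmented rows collapse to two unique intersections.
```

Without such consolidation, every query against the intersection must scan and combine all of its fragments, which is one source of the performance degradation described above.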
The schema implementation of BI/BPM/OLAP-based databases differs from the classic relational model applied in RDBMS systems or data warehousing scenarios. The classic RDBMS model tends to apply a highly normalized state to its data schema. This highly normalized state is optimal for the storage, organization, input, and management of large amounts of data. While normalization is effective for the functional characteristics of data storage, it tends to present challenges from a business usability perspective. OLAP, on the other hand, implements its data schema in a highly de-normalized fashion (the reverse of the RDBMS). De-normalization of data is designed not for functional efficiency but for business usability, data accessibility, and data access performance. Often this data is loaded into cubes for aggregation and hierarchical pre-calculation capabilities. A de-normalized, cube-based schema requires special consideration and unique techniques when moving data through the transformation pipeline.
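The contrast between the two schema styles can be sketched as follows. The table names, surrogate keys, and dimension members are hypothetical; the point is only that a normalized layout references dimensions by key, while a de-normalized cube load carries the full dimension context on every row.

```python
# Hypothetical contrast between a normalized RDBMS layout and the
# de-normalized, wide rows typical of an OLAP/cube load.

# Normalized: facts reference dimension tables by surrogate key.
accounts = {1: "Sales", 2: "COGS"}
periods  = {10: "Jan", 11: "Feb"}
facts    = [(1, 10, 500.0), (2, 11, 200.0)]  # (account_id, period_id, amount)

def denormalize(facts, accounts, periods):
    """De-normalized: each row carries full dimension context inline,
    ready for cube aggregation without joins."""
    return [
        {"account": accounts[a], "period": periods[p], "amount": amt}
        for a, p, amt in facts
    ]

cube_rows = denormalize(facts, accounts, periods)
```

The transformation pipeline for an OLAP destination effectively performs this flattening step, which is one reason cube-based schemas require the special handling noted above.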
The closest prior art of which the applicant is aware includes U.S. Pat. No. 6,151,608, issued to Abrams, and U.S. Pat. No. 6,208,990, issued to Suresh. The templates, patterns, and pipelines employed by U.S. Pat. Nos. 6,151,608 and 6,208,990 are designed to load to relational destination systems, while the present invention loads to OLAP- or dimensional-based destination systems. The techniques employed are fundamentally different from relational destination techniques, as are the underlying database platforms. U.S. Pat. No. 6,151,608 does not address required OLAP-based functions and is built solely around the Oracle database platform; its migration rule templates, patterns, and principles are subject to Oracle database mechanics only, which severely limits its capability on SQL Server or Analysis Services-based platforms. U.S. Pat. No. 6,208,990 is optimized for data warehousing environments but again does not address the particulars of a high-performance, reusable, single-point process designed for BPM systems. Newer, cutting-edge technologies have evolved away from primary keys toward keyless data management, which presents new challenges for ETL that are not addressed in prior-art load mechanisms.
Classic relational load models such as that of U.S. Pat. No. 6,151,608 rely heavily on the use of primary keys to determine row uniqueness and selectivity. Neither U.S. Pat. No. 6,151,608 nor U.S. Pat. No. 6,208,990 provides injectable integration templates.
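The keyless alternative can be sketched briefly. In this hypothetical example, no primary key column exists; row identity is derived from the tuple of dimension members, i.e., the data intersection itself.

```python
# Sketch of keyless row identity: uniqueness comes from the full tuple
# of dimension members rather than a primary key column. The rows and
# dimension names are hypothetical examples.

rows = [
    {"account": "Sales", "period": "Jan", "entity": "East", "amount": 100.0},
    {"account": "Sales", "period": "Jan", "entity": "West", "amount": 80.0},
]

def intersection_key(row, dimensions=("account", "period", "entity")):
    """Row identity in a keyless model: the tuple of dimension members."""
    return tuple(row[d] for d in dimensions)

keys = {intersection_key(r) for r in rows}
# Both rows are distinct because they differ in the 'entity' dimension.
```

A primary-key load model would instead compare a single key column, which is why techniques built around primary keys do not transfer directly to dimensional destinations.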