This invention relates to the field of enterprise data management (EDM) applications, and more particularly to the transformation of operational system databases for use in EDM applications.
Enterprise data management (EDM) is the process of sharing data across an organization and making information available to decision makers, regardless of the source of the data. The field has grown rapidly as businesses have discovered the return on investment possible from making decisions based on enterprise-wide data resources. Increasingly, organizations are analyzing current and historical data to identify useful patterns, and to support business strategies. EDM applications include: E-commerce internet portals, customer relationship management (CRM), enterprise resource planning (ERP), business intelligence (BI), and data warehousing.
Data warehousing is representative of the characteristics of each of these EDM applications, which extract operational data of an enterprise from the enterprise operational data systems and collects it in a multi-dimensional database. The multi-dimensional database supports on-line analytical processing (OLAP) to analyze groups of records that share a common field value, in contrast to on-line transaction processing (OLTP) which accesses individual records in the operational data system relational database. In effect, OLAP extracts business intelligence while OLTP extracts data.
To permit multi-dimensional analysis, the EDM application data must be separated from the operational systems, and the required operational system data must be transferred into the EDM application. All of the transferred operational source data must be logically transformed from the operational system model to the logical and physical structure of the data warehouse, which aligns with the business structure. As an example, a data warehouse may combine data from different source applications such as sales, marketing, finance, and production, thereby giving it the ability to cross-reference data from these applications. This transfer and transformation of the data between system models is referred to as xe2x80x9cdata migrationxe2x80x9d.
Assuming that the data model (i.e., the logic and structure) of the particular EDM application is known, the major step in data migration is the transformation of the source data, which is necessary to make it independent of the relational data model of the operational systems as well as have it fit into the EDM data model. Attributes of the source data that are essential to the operational system may be unnecessary for the particular EDM application, and are not loaded into the warehouse. This step typically involves xe2x80x9cdata cleansingxe2x80x9d or xe2x80x9cdata stagingxe2x80x9d processes which are labor intensive and tedious in a data warehousing project. This transformation involves the major steps of reverse engineering the source data, its form, and its database structures to determine its metadata, adapting the source data to fit the data structure of the EDM application, and loading the transformed source data to the EDM target database.
In varying degrees, this requires the EDM application developer to: (i) analyze the source databases to identify the type and form of the source data and the structure and operational system of the source databases, to determine their relevance to the EDM target database; (ii) load the source data from the legacy databases to an intermediate database using extract, transform, and load (ETL) software tools or hand-coded programs; (iii) massage, cleanse, and manipulate (xe2x80x9ctransformxe2x80x9d) the source data in the intermediate database into the form needed for the EDM application; and (iv) load the transformed source data into the EDM application, and format it in those schemas necessary for OLAP or other EDM applications.
In the prior art, these steps are performed manually, by a team of developers, including those of ordinary skill in the art of software programming. While there are now ETL software tools that automate the process of extracting and transforming data from the source database to the EDM application, these ETLTools recognize and rely on the source metadata (the source data structure) available from the source database management system (DBMS) to generate the EDM application code. However, the program developer can only determine the source metadata through the source data DBMS documentation (if available), interviews with the customer (if they have the knowledge), and a lengthy and laborious trial and error process.
The above cross-referenced patent application discloses and claims method and apparatus for automating the analysis of a data source. It does this by establishing a data repository having an entity structure which defines the metadata characteristics of a generic model data source. The data repository may be configured to reflect the construct of any given data source. Once configured, the source database may then be automatically analyzed in an elemental sequence defined by the repository""s entity structure to provide a comprehensive metadata model of the source data. This set of metadata may then be used directly to generate program code for a plurality of different EDM applications; each with different database models and operating systems.
One object of the present invention is to provide method and apparatus for acquiring the metadata of a source database in a manner which allows the acquired metadata to be used to generate diverse type EDM applications. A further object of the present invention is to create an architecture for a data repository which defines the metadata characteristics of a generic model data source in a manner which permits program code for EDM applications to be generated directly from the acquired metadata. A still further object of the present invention is to provide method and apparatus for generating such EDM program code for different database models and for different EDM operating systems.
According to the present invention, a method of obtaining the metadata of a data source comprises the steps of creating a data repository having an entity structure which defines the metadata characteristics of a generic model data source, accessing the data source to determine its construct, configuring the data repository entities to reflect the construct of the data source, and analyzing the data source in response to the configured data repository entities to obtain the source metadata. According to another aspect of the present invention, the step of analyzing the data source to determine its metadata includes the steps of obtaining those elements of source data which correspond to the metadata attributes of the configured data repository entities, and recording the obtained elements of source data in the data repository, each in association with their corresponding metadata attribute. In further accord with this aspect of the invention, the step of analyzing further includes the steps of inferring selected aspects of the data source structure on the basis of the recorded elements of source data, and recording those inferred data source aspects in the data repository for review by an operator, who may either accept or modify one or all of the inferences.
The present invention automates the process of building an EDM system application, and, in its best mode, the method is embodied as a software application. It employs an exhaustive reverse engineering process to analyze the source data and acquire a comprehensive set of source metadata. The invention then uses this information to build a model of the data. As part of the analysis process, the invention provides recommendations for the target EDM application database, such as primary keys, foreign keys, table splits, normalization, dimensions, measures, and views of the data. The result is an optimal target database for the EDM application.
The present invention allows users to migrate multiple disparate systems by providing a complete understanding of the metadata and generating the ETL programs to merge the data. Since the code is automatically generated, the speed of implementation is dramatically increased. Because the metadata is based on the actual source data, it is of the highest degree of accuracy.
These and other objects, features, and advantages of the present invention will become more apparent in light of the following detailed description of a best mode embodiment thereof, as illustrated in the accompanying Drawing.