The process of migrating data from a source (e.g., a database) to a target (e.g., another database, a data store, a data mart or a data warehouse) is sometimes referred to as Extract, Transform and Load, or the acronym ETL.
Existing ETL techniques focus on the creation of ETL jobs that are designed for a specific database configuration that is represented by the ETL repository as a single data store configuration. ETL jobs are special sets of instructions, or workflows, which define a unit of work that can be independently scheduled for execution. Defining workflows is an efficient way to maintain and reuse data extraction, transformation, and loading scripts, but this reuse has historically been limited to the context of a single data store configuration for each data source.
In some existing ETL systems, it is possible to provide simple modifications to profile information regarding a database logon ID/password, using aliases or other methods, but a systematic approach of providing detailed metadata regarding multiple data store profiles is not available. This limited approach to data store profile metadata means that when an ETL job is created, there is very limited flexibility in updating the job and its dependent data flows and workflows if there is a change in the data store configuration. For example, an upgrade from Oracle® 8 to Oracle® 9i requires that a computer programmer review all of the existing code associated with a job to make sure that it conforms to any modifications in SQL syntax and to confirm that it is updated to use new version information in all connection strings. This process is time consuming and prone to error if the developer fails to identify all of the potential dependencies between the job and various workflows and data flows.
Often within ETL systems it is desirable to create a job that can work in multiple environments. Consider the following cases in which ETL job portability is crucial: when working with multiple instances of a data source, when migrating between development/test/production environments, when an OEM partner designs jobs for one database type and deploys the jobs to other database types, and when working within multi-user development environments. In each of these cases there is a need to update an ETL job by changing the associated data store configuration. In the example of migration between development, test, and production environments, it is typical that each of the environments has its own versions of the data source (or set of data sources) and the target data source (or set of target data sources). Likewise, in a multi-user environment where developers and testers are working with different local or network accessible versions of source and target data sources, managing the creation and updating of an ETL job such that the job can easily be ported between these data sources is desirable. In these cases, job portability is desirable to accommodate modifications that are required to move an ETL job between different environments and versions of a data source.
It would be desirable to improve upon existing ETL techniques that configure a data store. In particular, it would be desirable to provide a mechanism for creating multiple configuration profiles for a data store and associating ETL jobs with the configuration profiles.