Embodiments disclosed herein provide techniques for detecting reference data tables in Extract, Transform, and Load (ETL) processes.
ETL processes are implemented in programs organized in project folders for a data integration solution. Thus, the installation of data integration software might have one or multiple processes with one or multiple jobs grouped into each of the processes. ETL processes typically integrate data from multiple, heterogeneous data sources into a central repository, such as a data warehouse (DW) or a master data management (MDM) system. Reference data generally defines a set of values that describes other data. Some examples of reference data are: gender, country codes, courtesy titles (Mr., Mrs., Miss, Dr., etc.), units of measure, and so on. Reference data can be found in applications by code tables, lookup tables, properties files, or it may be hard-coded. Consistent reference data is the cornerstone of many information centric applications such as data warehousing, master data management (MDM), as well as in operational business applications such as customer relationship management (CRM) and enterprise resource planning (ERP). Without consistent reference data, many business problems can occur. For example, in DW environments, revenue reports by country and customer type, created using reference data describing these entities, may produce incorrect results due to the inconsistent reference data. In MDM environments, product categorization may produce unexpected results, and customer information cannot be established, without consistent reference data for each type of entity.
Reference Data Management (RDM) systems have emerged to ensure consistency of reference data across applications and between enterprises. RDM systems vary from implementation to implementation, but generally an RDM solution provides a single place for business owners to create, update, review and distribute reference data across an enterprise.
Reference data management solutions are particularly useful in data integration projects. Typically, at any given point in time in medium to large enterprises, there are one or more data integration projects being implemented to, for example, add additional sources to a data warehouse and standardize data from multiple legacy systems prior to integration into SAP applications.
In many ETL processes, reference data is used to transcode source reference data values to target reference data values, such that reference data is harmonized in the target system when a process is complete. Transcoding is needed where one or more code values in the source system has a different meaning in the target system, or where the code values for the same meaning are different in the source and target system. Both issues are addressed by implementing transcoding tables harmonizing the reference data while data is exchanged between one or more source and target systems. Reference data is also used in every ETL process in order to validate data in order to ensure its “loadability” into the target against reference data tables from the target.