Data migration, software integration, and data integration projects typically involve moving and/or transforming data from one system into another. This data is typically processed through some type of Extract, Transform, and Load (ETL) cycle according to rules that are collected and then documented. These rules can be numerous and complex, and in some cases a single data record must proceed through dozens of processes and rules before its ETL migration is complete.
Within the migration process, data records can be extracted, filtered out, combined, duplicated, proliferated, augmented, cleansed, enriched, truncated, replaced, substituted, transformed, etc., at the record or field level. At the end of this migration process, data is loaded, pushed, or pulled into the target system. The volume of records passing through the various processes can number into the millions for just a single data object. Further, a large data migration and data integration project can comprise upwards of, for example, 100 individual objects, each with its own set of ETL rules.
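The kind of rule-by-rule processing described above can be sketched as a minimal pipeline in which each record flows through an ordered chain of rules before being loaded. All rule names, field names, and sample records below are hypothetical illustrations, not part of the source material:

```python
# Illustrative sketch of an ETL rule chain. A rule returning None means
# the record was filtered out mid-pipeline. All names are hypothetical.

def extract(records):
    # Extraction: pull raw records out of the source system (copied here).
    return [dict(r) for r in records]

def filter_inactive(record):
    # Filter rule: drop records flagged inactive.
    return None if record.get("status") == "inactive" else record

def cleanse_name(record):
    # Cleansing rule: normalize whitespace and casing at the field level.
    record["name"] = " ".join(record["name"].split()).title()
    return record

def truncate_code(record):
    # Truncation rule: assume the target system allows only 4-char codes.
    record["code"] = record["code"][:4]
    return record

RULES = [filter_inactive, cleanse_name, truncate_code]

def migrate(source_records):
    loaded = []
    for record in extract(source_records):
        for rule in RULES:
            record = rule(record)
            if record is None:          # filtered out; skip the load step
                break
        else:
            loaded.append(record)       # load: push into the target system
    return loaded

source = [
    {"name": "  ada  lovelace ", "code": "MATH-01", "status": "active"},
    {"name": "charles babbage", "code": "ENG-99", "status": "inactive"},
]
print(migrate(source))
```

In a real project this chain would be one of dozens per object, which is what makes the cumulative effect on each record hard to follow by inspection alone.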
With millions of records passing through so much processing and manipulation within the ETL stage, the migrated data (i.e., the end state) may be significantly different from the original, pre-processed source data (i.e., the original state). Technical team members, data owners, and business users may have difficulty understanding how and why a particular record, field, or data set was extracted, filtered out, combined, duplicated, proliferated, augmented, cleansed, enriched, truncated, transformed, etc.
Without knowledge of the particular rules to which records were subjected in the course of the migration process, validating and evaluating the appropriateness, completeness, and correctness of the ETL process is difficult.
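One way to make that validation question answerable is to attach a per-record audit trail naming each rule that actually changed the record. The sketch below is an illustrative assumption about how such lineage might be captured, with hypothetical rule and field names; it is not a method described in the text above:

```python
# Illustrative sketch: apply rules while recording, per record, which
# rules modified it. Rule and field names are hypothetical.

def apply_with_lineage(record, rules):
    lineage = []
    for rule in rules:
        before = dict(record)               # snapshot of the pre-rule state
        record = rule(record)
        if record is None:                  # filtered out; stop processing
            lineage.append(rule.__name__)
            break
        if record != before:
            lineage.append(rule.__name__)   # log only rules that changed it
    return record, lineage

def uppercase_region(record):
    # Transformation rule: standardize the region code's casing.
    record["region"] = record["region"].upper()
    return record

def enrich_country(record):
    # Enrichment rule: derive a field from a (hypothetical) lookup table.
    lookup = {"EMEA": "Europe/Middle East/Africa"}
    record.setdefault("country", lookup.get(record["region"], "unknown"))
    return record

rec = {"id": 7, "region": "emea"}
migrated, trail = apply_with_lineage(rec, [uppercase_region, enrich_country])
print(migrated)
print(trail)
```

With a trail like this, a reviewer comparing end state against original state can see which rules produced each difference instead of reverse-engineering the pipeline.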