1. Technical Field
Embodiments of the invention relate to enterprise data integration, and in particular to providing data quality in data migration (e.g., ETL) processes.
2. Discussion of the Related Art
When initially loading data into a target system via an extract, transform and load (ETL) process, such as loading data into a data warehouse or loading data for some other data processing project, there can be multiple different data sources from which data is extracted. The data from different sources is typically extracted into a migration database, where operations such as data harmonization and data cleansing can be performed. The data is then loaded from the migration database into the target system.
Data quality issues often arise when data is loaded into a target system during ETL processing, for a number of different reasons (e.g., a mandatory field is empty, data values are not within a permissible range, lookup values are incorrect, certain constraints are violated, etc.). When a particular field in a record does not comply with target system requirements, this represents a gap in the data quality, and the record is rejected during load. Today, these data quality checks are performed manually or by manually implemented rules (assuming a data quality check is done at all). Even when a data quality check is performed, it is not done systematically because of the manual approach, and because the target system configuration changes during implementation, those changes are not reflected in the checks right away. As a result, data migration projects often exceed initial processing time and budget estimates, because these data quality issues are only detected during load tests, which are usually conducted shortly before the go-live date.
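By way of illustration only, the kinds of checks listed above (empty mandatory field, value outside a permissible range, incorrect lookup value) can be sketched as hand-written rules applied to records before load. This is a minimal sketch of the manual, rule-based approach described in this section, not of any particular product or embodiment. The field names (customer_id, credit_limit, country), the range limits, and the lookup set are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Rule:
    """One manually implemented data quality rule for a single field."""
    field: str
    description: str
    is_valid: Callable[[Any], bool]


def find_gaps(record: Record, rules: List[Rule]) -> List[str]:
    """Return a description of every rule the record violates."""
    gaps = []
    for rule in rules:
        if not rule.is_valid(record.get(rule.field)):
            gaps.append(f"{rule.field}: {rule.description}")
    return gaps


# Hypothetical target system requirements, hard-coded by hand.
# If the target configuration changes, these rules silently go stale.
VALID_COUNTRIES = {"DE", "US", "FR"}
RULES = [
    Rule("customer_id", "mandatory field is empty",
         lambda v: v not in (None, "")),
    Rule("credit_limit", "value outside permissible range 0..100000",
         lambda v: isinstance(v, (int, float)) and 0 <= v <= 100000),
    Rule("country", "lookup value not known to target system",
         lambda v: v in VALID_COUNTRIES),
]

# Two hypothetical records staged in the migration database.
records = [
    {"customer_id": "C001", "credit_limit": 5000, "country": "DE"},
    {"customer_id": "", "credit_limit": 250000, "country": "XX"},
]

# Map each record index to its list of data quality gaps.
report = {i: find_gaps(r, RULES) for i, r in enumerate(records)}

for index, gaps in report.items():
    status = "would load" if not gaps else "would be rejected"
    print(f"record {index}: {status}")
    for gap in gaps:
        print(f"  - {gap}")
```

Because the rules in such a sketch are written by hand, they must be edited each time the target system configuration changes, which is the maintenance burden noted above.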