The present disclosure relates to methods and systems for improving data integration, and more particularly to methods and systems for improving data integration and cleansing by estimating the effort required to integrate data into, or cleanse data in, a target database.
Data integration and data cleansing remain among the most labor-intensive tasks in data management. Both require a clear understanding of the semantics of schema and data, which is notoriously difficult for machines to achieve. Despite much research and development of supporting tools and algorithms, state-of-the-art integration projects still incur significant human resource costs. In fact, it is reported that 10% of all IT costs go into enterprise software for data integration and data quality, and it is well recognized that most of those costs are for human labor.
Project estimation for data integration projects can be especially difficult, given the number of stakeholders involved across the organization as well as the unknowns of data complexity and quality. Any integration project can have many steps and tasks, including requirements analysis, selection of data sources, determination of the appropriate target database, specification of data transformations, testing, deployment, and maintenance.
The embodiments described herein can provide improved methods and systems for estimating data integration and cleansing effort.