Databases abound in the age of the Internet. Many databases are built from data that originated elsewhere, for example in one or more other databases. The original data may have existed in different formats, served different purposes, and been subject to different standards or requirements for accuracy. Data integration is the process of taking multiple, disparate, heterogeneous, “dirty” data sources (i.e., sources containing less-accurate or even erroneous data, data whose accuracy is unknown, and/or data that is not optimally formatted) and combining them into a single, unified database that is known, to a desired degree, to be generally error-free and properly formatted. Data integration is sometimes accomplished using multiple processing steps that iteratively clean and normalize the data. High-quality data integration often includes processing steps such as collecting data, training machine-learning algorithms, cleaning “dirty” data, and verifying data. The cost, in terms of time and money, of the data integration process becomes a key consideration when the amount of data is large (e.g., millions or billions of input records). Through applied effort, ingenuity, and innovation, solutions to optimize the data integration process have been realized and are described in connection with embodiments of the present invention.
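By way of illustration only, the cleaning and normalization steps described above might be sketched as follows. This is a minimal, hypothetical example, not the claimed invention: the field names (`name`, `email`), the cleaning rules, and the deduplication key are assumptions chosen for clarity.

```python
# Hypothetical illustration of one cleaning/normalization pass over "dirty"
# records from multiple heterogeneous sources, merged into a unified store.
# Field names and rules are assumptions for illustration only.

def clean_record(record):
    """Normalize a single raw record; return None if it is unusable."""
    email = record.get("email", "").strip().lower()
    if "@" not in email:
        return None  # erroneous record that cannot be verified
    # Collapse internal whitespace and apply a consistent name format.
    name = " ".join(record.get("name", "").split()).title()
    return {"name": name, "email": email}

def integrate(*sources):
    """Merge multiple sources into one unified list, deduplicating by email."""
    unified = {}
    for source in sources:
        for record in source:
            cleaned = clean_record(record)
            if cleaned is not None:
                unified.setdefault(cleaned["email"], cleaned)
    return list(unified.values())

source_a = [{"name": "ada  lovelace", "email": " Ada@Example.com "}]
source_b = [{"name": "Ada Lovelace", "email": "ada@example.com"},
            {"name": "no-email", "email": "invalid"}]
print(integrate(source_a, source_b))
```

In practice, such a pass would be one step of an iterative pipeline, and at the scale noted above (millions or billions of records) each pass's cost in time and money drives the optimizations the embodiments address.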