1. Field of the Invention
Embodiments of the invention described herein pertain to the field of computer systems. More particularly, but not by way of limitation, one or more embodiments of the invention enable a data quality enrichment system for records in a database.
2. Description of the Related Art
Data quality enrichment provides the capability for a company to correct, cleanse, and/or expand data in a database that contains incorrect or missing information. Missing or incorrect information frequently occurs when data entry is not controlled, or when data is imported from a spreadsheet or external source that does not apply company-specific validation rules during entry. There are currently no known systems that provide turnkey integration with data quality enrichment entities that would allow a company to filter for particular data quality problems and transmit the filtered data to particular data quality enrichment entities. In addition, there are no known solutions that allow for turnkey evaluation of the performance of multiple data quality enrichment entities on a given data set. There are also no known systems that geographically display data to show where valid and invalid records occur.
To solve the problem of incorrect or missing information upon import, companies generally outsource data correction and enrichment to external entities that analyze the data and suggest modifications. External entities may include, for example, corporations that specialize in data quality enrichment. These entities generally charge based on the number of records sent to them for cleansing. There are no known solutions that allow junk records to be prefiltered before they are sent to an entity, which would save the company from paying for unsolvable work, since no data quality enrichment entity can correct a junk record. Likewise, there are no known solutions that allow customers to define filters that find particular data quality issues, which may, for example, be handled at a discount by a particular data quality enrichment entity. For example, there are no known solutions that allow German address problems to be filtered out so that only those records are sent to a particular vendor in Germany that is offering a discount.
To communicate suspect records between a company and a data quality enrichment entity, the company database must be integrated with the data quality enrichment entity. Integration in and of itself generally requires custom programming to meet the interface requirements of the data quality enrichment entity. This effort may be implemented in a myriad of ways, all of which require significant initial programming and ongoing maintenance to keep the connection operative as schemas and interfaces change over time. Because of the time and cost required, this integration effort is generally not repeated for additional data quality enrichment entities. Hence, once a company chooses a data quality enrichment entity and integrates a desired database with it, the company generally has no further opportunity to evaluate other data quality enrichment entities for cost, quality, speed, or any other metric of performance. Because many data quality enrichment entities claim to be the best, it is difficult for a customer with limited resources to verify the quality claims of more than one of them.
Integration with data quality enrichment entities is a monumental task for many reasons. One reason is that each data quality enrichment entity may use a completely different methodology for exchanging data records. For example, one data quality enrichment entity may use a web service while another uses a proprietary text-based format. Even when two data quality enrichment entities use the same type of communication interface, they may not use the same schema or XML tags to transfer data. Furthermore, competing standards or schemas are used in many cases to transmit data, and as those standards or schemas are augmented, integration interfaces may break. The cost and time of integrating with multiple data quality enrichment entities is so large that few, if any, companies attempt to integrate with more than one. Because virtually no companies integrate with multiple data quality enrichment entities, there are currently no known systems that allow the work product of different data quality enrichment entities to be compared.
There are many reasons why data in a database is not perfect. If all data were entered into a database perfectly, data enrichment would not be needed at all. One reason is data entry that is not performed under the control of validation rules. Another reason is that humans are involved in the data entry process, and humans make mistakes. For example, when entering data into a data input screen, a data entry employee may leave out a portion of an address or transpose two letters or numbers in it. The cost of correcting data increases as more business processes rely on the incorrect data. For example, a shipment to an address that omits a critical office number may be returned, requiring considerable time to track down and fix. Importing imperfect data into a database is therefore a costly endeavor that requires a great deal of time to enrich the data to meet a given company's quality requirements.
Data quality enrichment entities are not created equal. An address that one data quality enrichment entity considers unintelligible may be correctly corrected and enhanced by another. Because data quality enrichment entities may improve over time, and new entities may appear in the marketplace, a company may save a great deal of money by periodically evaluating both current and new data quality enrichment entities. Evaluation depends on integration, so if integration carries no significant cost, evaluation can flourish. There are no known solutions that provide turnkey integration with data quality enrichment entities, and thus there are no known solutions for rapidly evaluating multiple data quality enrichment entities. Furthermore, enriched data is generally evaluated in a haphazard manner that does not allow periodic apples-to-apples comparisons to determine, for example, whether an entity has improved over time or whether one data quality enrichment entity outperforms another.
For at least the limitations described above, there is a need for a data quality enrichment integration and evaluation system.