Dirty data is a widespread problem. It can cause reliability problems, because code is typically not written to expect it, and it will almost certainly hurt user satisfaction. Detecting dirty data early catches bugs and other problems before they affect more customers and become harder to fix. This is especially important when data is migrated from an old application to a new one, since migration tools may generate states in the system that would not have arisen during normal run-time operation.
Data consistency is often a requirement in complex, distributed applications. For example, if a user paid for a service, then the service must have been delivered, or vice versa. Applications normally try to achieve consistency by using transactions and constraints, and most relational database management system (RDBMS) software provides these facilities. Run-time checks have a similar history in general-purpose programming, where “asserts” have long been used in the C programming world to detect problems at run time. However, support from the core database engine is often not enough, owing to a number of factors, including, but not limited to, spatial separation, expressibility, performance, temporal separation, software, and validation.
Spatial separation occurs when data is split across multiple databases. For example, the “User Paid” data may be maintained in an entirely different database from the “Services Delivered” data. A distributed transaction architecture, a conventional transaction processing mechanism, solves this problem for databases that are connected to each other.
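As an illustration of spatial separation, the following minimal sketch checks the paid/delivered rule across two independent databases. The table names (`user_paid`, `services_delivered`), the `order_id` key, and the function `find_mismatches` are hypothetical and chosen only for this example; two in-memory SQLite databases stand in for the separate systems.

```python
import sqlite3

def find_mismatches(payments_db, deliveries_db):
    """Compare two separate databases and report inconsistent order IDs.

    Returns (paid_not_delivered, delivered_not_paid) as sorted lists.
    Table and column names are hypothetical, for illustration only.
    """
    paid = {row[0] for row in
            payments_db.execute("SELECT order_id FROM user_paid")}
    delivered = {row[0] for row in
                 deliveries_db.execute("SELECT order_id FROM services_delivered")}
    return sorted(paid - delivered), sorted(delivered - paid)

# Two independent in-memory databases stand in for the separate systems.
payments_db = sqlite3.connect(":memory:")
deliveries_db = sqlite3.connect(":memory:")
payments_db.execute("CREATE TABLE user_paid (order_id INTEGER PRIMARY KEY)")
deliveries_db.execute(
    "CREATE TABLE services_delivered (order_id INTEGER PRIMARY KEY)")

# Order 2 was paid but never delivered; order 4 was delivered but never paid.
payments_db.executemany("INSERT INTO user_paid VALUES (?)", [(1,), (2,), (3,)])
deliveries_db.executemany(
    "INSERT INTO services_delivered VALUES (?)", [(1,), (3,), (4,)])

paid_not_delivered, delivered_not_paid = find_mismatches(
    payments_db, deliveries_db)
print(paid_not_delivered)   # [2]
print(delivered_not_paid)   # [4]
```

Because neither database can express a constraint over the other's tables, the comparison in this sketch runs outside both engines and only reads from them.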
Further, with a conventional constraint approach, some constraints cannot be expressed in the database constraint language, which results in expressibility problems. Constraints that perform poorly at the database level can degrade performance and may even cause a lock. Moreover, constraints, transactions, and distributed transactions all try to solve the same problem through avoidance, that is, by preventing inconsistencies rather than detecting them.
Some data changes become consistent only “eventually” rather than being consistent at all times. For example, the “Services Delivered” data may be sitting in a queue and will eventually be delivered. Thus the “Services Delivered” database cannot be queried at the same time as the “User Paid” data, which causes temporal separation. Additionally, some software components involved in the consistency check may not provide transactional support. Finally, while tools exist to implement the validation solution, using a tool to verify its own correctness does not help.
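Temporal separation means that a naive comparison can flag records that are merely still in flight. One possible way to account for this, sketched below, is to ignore payments younger than a grace period. The grace period, the function `overdue_payments`, and the data layout are assumptions made for this example and are not described in the text above.

```python
from datetime import datetime, timedelta

def overdue_payments(paid_at, delivered_ids, now, grace):
    """Return paid order IDs with no delivery after the grace period.

    paid_at:       dict mapping order ID -> payment timestamp
    delivered_ids: set of order IDs that have a delivery record
    grace:         how long a delivery may stay queued (an assumed parameter)
    """
    return sorted(
        order_id
        for order_id, paid_time in paid_at.items()
        if order_id not in delivered_ids and now - paid_time > grace
    )

now = datetime(2024, 1, 1, 12, 0, 0)
paid_at = {
    1: now - timedelta(hours=2),    # delivered, consistent
    2: now - timedelta(hours=2),    # undelivered and past the grace period
    3: now - timedelta(minutes=5),  # undelivered, but may still be queued
}
delivered_ids = {1}

flagged = overdue_payments(paid_at, delivered_ids, now, timedelta(minutes=30))
print(flagged)  # [2]
```

Order 3 is not reported because its delivery may still be waiting in a queue; it would be reported only if it remained undelivered once the grace period had passed.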
What is needed is a technology that verifies whether the tools are working correctly by detecting inconsistencies. Furthermore, a verification mechanism is needed that determines whether the tools are working correctly without affecting the performance of the live system.