There are many applications that can move data from one system to another. With buzzwords such as "Big Data" and the "Cloud" in wide circulation, data is being moved across systems in virtually every industry at a rapid pace. Unfortunately, a significant percentage of this data has low integrity: it is missing, incorrect, or stale. This poses an interesting question: with all of the data movement between systems across computer networks such as the Internet, how can assurance be obtained that the moved data is actually correct?
Integrations are complex in nature. Although the technology has evolved to make integrations easier, many problems remain. Some issues are technical, while others arise from tradeoffs involved in the movement and storage of large amounts of data. For instance, data may be modified or added by users of a target system, and these changes often include data entry errors, which can lead to further problems not only in the target system but also in the source system. In addition, unknown processes can introduce missing or incorrect data. Furthermore, when multiple integrations write to the same target system, one integration may conflict with the data of another.
As an integration system or application runs successfully over time, it is often eventually ignored because it is assumed to be working correctly. Typically, when integration systems are first implemented, each is heavily monitored and fine-tuned, usually through log files or through the integration system's application interface. This works up front, but not over time as the integrated target system is used. Monitoring over the life of a target system can be expensive. For instance, monitoring application logs effectively becomes costly as more and more integrations are added to the target. Also, personnel resources are often moved to newer projects, leaving little time to maintain older integrations. Sometimes, issues regarding correct operation are not known until a major problem is reported, and the risk of significant errors in a target system is often even greater when multiple integration applications are being used. Such errors can even become viral, in that they corrupt other data in the target system. To compound the issue, error reporting is often not all-encompassing; for example, an inconsistency spanning multiple records may be missed when an error is found in only one record. Over time, individual incorrect records can accumulate and multiply. In many situations, confidence in the data of a system can be completely lost.
The aforesaid issues are of much concern to the field of data integrity. Data integrity refers to maintaining and assuring the accuracy and consistency of data over its entire life cycle, and is a critical aspect of the design, implementation, and usage of any computerized system that stores, processes, or retrieves data. Integration systems commonly have data integrity features built in, but a way to measure data integrity outside of the integrations themselves is often missing. This is a problem that needs to be solved efficiently and effectively. Moreover, most integration systems are stateless and control data only within their integration cycle. Such systems can have measures to ensure that the data being integrated is correct, but the integrity of the data can be compromised by outside resources or over the lifetime of the system. Described herein are systems and methods that provide improvements to data integrity. The methods and systems described herein are specifically focused on improving data integrity in systems that have multiple integrations over time. One example solution used by the methods and systems described herein involves the use of stateful integrations, as opposed to the more common stateless integrations. This and other example solutions to enhance data integrity are described in detail herein.
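The distinction between stateless and stateful integrations can be illustrated with a minimal sketch. The code below is a hypothetical example, not the implementation described herein: the `StatefulIntegration` class, its `push` and `audit` methods, and the use of per-record SHA-256 fingerprints are all assumptions made for illustration. The idea shown is that a stateful integration retains a fingerprint of each record it delivered, so that a later audit pass can detect records that were modified or deleted in the target by outside resources after the integration cycle completed, which a stateless integration cannot do.

```python
import hashlib
import json


def record_fingerprint(record: dict) -> str:
    """Hash a record's canonical JSON form so any field change is detectable."""
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class StatefulIntegration:
    """Illustrative only: retains per-record fingerprints between runs (the
    'state'), so data changed in the target after a push can be detected."""

    def __init__(self):
        # record id -> fingerprint at the last successful push
        self.state = {}

    def push(self, records: list) -> None:
        # A stateless integration would stop after delivering the records;
        # a stateful one also remembers what was delivered.
        for rec in records:
            self.state[rec["id"]] = record_fingerprint(rec)

    def audit(self, target_records: list) -> dict:
        """Compare the target's current contents against the stored state."""
        issues = {"modified": [], "missing": []}
        seen = set()
        for rec in target_records:
            seen.add(rec["id"])
            expected = self.state.get(rec["id"])
            if expected is not None and expected != record_fingerprint(rec):
                issues["modified"].append(rec["id"])
        issues["missing"] = [rid for rid in self.state if rid not in seen]
        return issues
```

For example, if two records are pushed and a user later edits one and deletes the other directly in the target system, a subsequent `audit` call reports one modified and one missing record, even though the integration cycle itself completed without error.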