The growing complexity and interdependence of discrete computer systems requires reliance on data over whose quality a particular user's system has little or no control. While the phrase "garbage in, garbage out" has been used for over half a century to describe data quality, until recently computer systems functioned independently, in isolated environments or over isolated networks. Garbage data typically impacted only the immediate system, which was essentially under the control of the user.
The advent of network-based computer business applications, electronic data interchange, and an increasing reliance on digital databases requires control over data quality and integrity. Essentially, it is the flow of data between the various computer systems that drives their very functionality. The interconnection of discrete computer systems through standardized networks, such as the Internet, has enabled data to be immediately accessible by other systems, where there is typically no control over the data's quality and/or integrity. There is a need for knowledge about the quality and integrity of the data flowing between systems and within a system.
Current systems are concerned with the integrity and quality of data between two points, assuming that the data is error-free at the first point and may become corrupted by the time it reaches the second point. Various schemes of error detection and correction are well known. More recently, digital signatures have been used to protect data integrity and to identify the data's origination (source).
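The point-to-point schemes described above can be sketched as follows. This is a minimal illustration, not any particular system's implementation: it uses a CRC-32 checksum for error detection and, since the Python standard library lacks asymmetric signatures, an HMAC tag as a stand-in for a digital signature that identifies the data's source. The function names and the shared key are assumptions made for the example.

```python
import hashlib
import hmac
import zlib

# Stand-in for real key management between sender and receiver (assumption).
SHARED_KEY = b"example-shared-key"

def package(payload: bytes) -> dict:
    """Sender side: attach a CRC-32 (error detection) and an HMAC tag
    (integrity and origin, standing in for a digital signature)."""
    return {
        "payload": payload,
        "crc": zlib.crc32(payload),
        "tag": hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest(),
    }

def verify(message: dict) -> bool:
    """Receiver side: detect transmission corruption, then confirm origin."""
    payload = message["payload"]
    if zlib.crc32(payload) != message["crc"]:
        return False  # data corrupted between the two points
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["tag"])

msg = package(b"order=42;qty=7")
intact = verify(msg)                                   # True: unmodified
tampered = verify(dict(msg, payload=b"order=42;qty=9"))  # False: corrupted
```

Note that both checks say nothing about whether the payload was correct at the first point; they only detect changes after it was packaged, which is precisely the limitation the present discussion is directed to.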
Most schemes for controlling data quality at its origination are generally directed towards program errors and bugs, by testing the program using test case suites. These schemes are heavily dependent upon the user interface, the programming language, the database system, the operating system, and the system environment. They are testing schemes that attempt to approximate a real user environment, but mostly occur in a laboratory environment. These schemes are discontinued once the software is released, although they may resume when a new release is being prepared.
There is a need for controlled inspection, monitoring, and analysis of data that provides knowledge about its quality and integrity as the data moves between various computer systems and computer networks.