The volume of information that is processed and stored by computer systems continues to expand at a remarkable pace with "desktop" personal computers and other small computer systems forming the most visible component of this growth. Most large corporations, however, still rely on mainframe systems for most of their basic data processing needs, even though the smaller systems have become faster and include computer storage media which can accommodate more data than in the past. This is because mainframe systems still hold a substantial advantage over small computer systems in terms of speed, volume of storage, and above all, capacity for large volume throughput. Accordingly, mainframe systems continue to meet data processing requirements that the smaller computer systems cannot match.
The proliferation of personal computers in the mass market has forced publishers of personal computer software to improve their products, making data on these small machines easier to access. But the benefits realized in the mass market in terms of improved personal computer software have not been seen in the area of mainframe computer software, despite the fact that mainframes and their associated software systems have been around for far longer. Hence, data in mainframe systems is often far more difficult to access than data on personal computers, making it harder to see the results of a computer process. One of the main reasons mainframe data is more difficult to access is the nature of the processing done on these differently sized hardware platforms. More specifically, the batch data typically processed by mainframe systems is far harder to access than the online data typically processed by personal computers, as will be explained below.
Data processing can be divided into two classes: online and batch. Online processing is geared towards the immediate resolution of individual transactions, whereas batch processing handles large quantities of transactions as a group. Human interaction with computers is invariably through online processing, while large scale processing is most often handled in the batch mode.
Since batch data processing involves large quantities of data, detecting errors in the data requires examining large amounts of that data. In online data processing, however, each item of information or data results, at least in part, from an interaction with a person, and thus errors in the data are easier to detect and more likely to be detected. This personal interaction or "manual oversight" provides a degree of quality control. It should be noted, however, that large scale manual data entry may be regarded as a "batch" process in this context. Although the data is processed through human interaction, the processing is nonetheless mechanical in nature since data entry clerks generally do not read what they are typing.
In any case, when a batch process encounters an undetected error in the data, the process may or may not respond to the error. Where the process is affected by the error, it will either notify the user of a problem in a controlled fashion (if the possibility of that type of error was foreseen) or be forced to a halt (if the error is of an unforeseen nature). Alternatively, the error may have no visible effect on the process, allowing it to continue to completion, so that the incorrect data will not be immediately obvious.
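The three outcomes described above can be illustrated with a minimal sketch. The record layout, the `process_batch` function, and the rule that negative amounts are the only foreseen error are all assumptions made for illustration; they are not taken from any particular batch system.

```python
def process_batch(records):
    """Sum the amounts in a batch of (account, amount_text) records.

    Hypothetical example: a negative amount is the only error the
    programmer foresaw, so it is the only one handled in a controlled way.
    """
    total = 0.0
    rejected = []
    for account, amount_text in records:
        # Unforeseen error: malformed text makes float() raise ValueError,
        # which is not caught here, so the whole batch run halts.
        amount = float(amount_text)
        if amount < 0:
            # Foreseen error: report it in a controlled fashion and go on.
            rejected.append((account, amount_text))
            continue
        # A well-formed but wrong amount passes every check and is
        # silently included in the total.
        total += amount
    return total, rejected
```

A record such as `("B", "-5.00")` is reported in the rejected list, a record such as `("C", "12,50")` stops the run with an exception, and a record keyed as `"1000.00"` when `"100.00"` was intended is processed to completion with nothing to indicate that the total is wrong.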
There are many ways in which errors can be introduced into computer data. For example, errors can originate from "bugs" in the computer program, from external sources, from the operating system's environment, and from the computer itself, just to name a few.
With regard to data errors which originate from bugs in computer programs, virtually all nontrivial computer programs contain some bugs. Careful design and exhaustive testing will typically identify most of the bugs, but some bugs will undoubtedly remain latent in any system, ready to affect the process when some new combination of circumstances arises in the data. Systems made up of suites of programs that work together are prone to bugs in exactly the same way, since such software systems are in effect just large programs.
With regard to data errors which originate from external sources, computer systems which obtain information from outside sources are subject to errors from unexpected changes in the data from those external sources. Although program bugs are often blamed for such errors, many times these errors result from a failure of communication between the personnel responsible for the system which produces the data and the personnel responsible for the system which receives it.
As stated earlier, data errors can also be caused by the system environment. IBM's Multiple Virtual Storage (MVS) operating system may be responsible for more large scale batch data processing than any other system software. Unlike personal computer software, which "crashes" frequently, MVS installations, which typically support hundreds or even thousands of simultaneous batch and online processes, "crash" very rarely. When an MVS operating system does crash, the crash is usually confined to individual processes or subsystems. However, MVS does have some serious limitations which relate to job control language (JCL), the programming language that links programs to the data that the programs access. JCL is difficult to test since it has limited parameter substitution and inadequate features for process modularization. MVS also has an inflexible storage allocation scheme, which requires that storage requirements be determined in considerable detail in advance. In addition, MVS tends to require a great deal of manual (operator) intervention.
With regard to "computer errors," all such errors result from either hardware failures or manual mistakes. When computer errors slip through undetected, they are generally manual in origin.
Present computer data error detection methods are generally geared towards ensuring that data moved from one place to another arrives intact. This is generally accomplished by creating some kind of redundant representation of the data, and using the extra information to compare the copied version against the original data. However, such methods cannot detect errors in the original data. More specifically, errors created by software bugs are not detectable by present methods because such errors originate in the program itself and not in the failure of the hardware to correctly execute the program instructions.
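The limitation described above can be shown with a short sketch that uses a CRC-32 checksum as the redundant representation. The choice of CRC-32, the helper names `checksum_of` and `arrives_intact`, and the sample records are assumptions made for illustration; the text does not name a specific redundancy scheme.

```python
import zlib


def checksum_of(data: bytes) -> int:
    """Redundant representation of the data (here, a CRC-32 value)."""
    return zlib.crc32(data)


def arrives_intact(received: bytes, sent_checksum: int) -> bool:
    """Compare the copied data against the checksum made from the original."""
    return zlib.crc32(received) == sent_checksum


# Case 1: the data is damaged in transit, so the check fails.
correct_record = b"ACCOUNT=1001;AMOUNT=250.00"
sent_checksum = checksum_of(correct_record)
damaged_copy = correct_record[:-1] + b"1"
transit_error_detected = not arrives_intact(damaged_copy, sent_checksum)

# Case 2: a bug wrote the wrong amount before the checksum was computed.
# The copy matches the (wrong) original exactly, so the check passes.
buggy_record = b"ACCOUNT=1001;AMOUNT=2500.00"
buggy_checksum = checksum_of(buggy_record)
source_error_detected = not arrives_intact(buggy_record, buggy_checksum)
```

In this sketch `transit_error_detected` is true while `source_error_detected` is false: the checksum faithfully protects whatever it is given, including data that was already wrong when the checksum was computed.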
It is, therefore, an object of the present invention to provide a data verification method for detecting errors which have been introduced throughout the entire computer system.