The development and testing of new applications require data that the new applications can process in trial runs. For the results of the trial runs to be reliably informative, it is essential that the data processed in the trial runs be technically equivalent (for example, as concerns the data format) to the data that the new applications are to process after the development and test phase. For this reason, trial runs frequently use application data that were generated by the currently productive (predecessor) versions of the applications to be developed or tested. These data, hereinafter referred to as productive application data or simply as productive data, are normally stored in databases in the form of data records.
In practice, the use of productive application data for development and test purposes is not without problems. It has emerged that the data spaces that developers can access under their respective authorizations in the productive environment are frequently not large enough to yield reliable results. Because these data space authorizations differ from person to person, the results of trial runs also vary from developer to developer. The data space authorization of individual persons can indeed be temporarily expanded for the trial runs; this measure is, however, expensive and, in the case of sensitive or confidential data in particular, is not possible without further checks or restrictions.
Another approach in regard to the use of sensitive or confidential productive application data within the framework of trial runs is to perform the trial runs on a compartmentalized and access-protected central test system. However, the technical cost associated with setting up such a central test system is high. In addition, such a procedure does not permit any delivery of data to (decentralized) development and test systems for error analysis.
The disadvantages explained above, among others, have led to the insight that the use of productive data for development and test purposes is ruled out in many cases. An alternative to the use of productive data was therefore sought. On the one hand, this alternative should be a realistic copy of the productive data in regard to the data format, the data content, etc. On the other hand, the additional technical precautions, in particular those protecting against unauthorized access (authorization mechanisms, firewalls, etc.), should be kept to a minimum.
It has emerged that the above-cited requirements are fulfilled by test data that are generated by a partial anonymization (or masking) of productive data records. Anonymizing sensitive elements of the productive data reduces the potential damage that could be anticipated in the event of unauthorized access. This makes it possible to relax the security mechanisms. In particular, the test data for trial runs and for error analysis can be loaded onto decentralized systems. At the same time, since a suitable anonymization mechanism does not have to alter the technical aspects (data format, etc.) of the productive application data, or has to alter them only slightly, the anonymized test data form a realistic copy of the productive data.
A data record can be anonymized by erasing the data elements to be anonymized or by overwriting such data elements with a predefined standard text that is identical for all the data records, while the data elements not to be anonymized are retained unaltered. Such a procedure leads to anonymized data records without (substantial) changes arising in the data format. It has, however, become apparent that trial runs using such anonymized data records do not reveal all the weak points in the application to be developed or tested, and errors still frequently occur during initial use of the application in the productive environment.
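The overwrite-based anonymization described above can be illustrated with a minimal Python sketch. The field names, the sample records, the placeholder `STANDARD_TEXT`, and the function `anonymize_record` are all hypothetical and chosen only for illustration; they are not taken from any particular system.

```python
# Hypothetical placeholder text; an empty string would model erasure instead.
STANDARD_TEXT = "XXXX"


def anonymize_record(record, sensitive_fields, standard_text=STANDARD_TEXT):
    """Return a copy of the record with sensitive elements overwritten.

    Elements not listed in sensitive_fields are retained unaltered, so the
    set of fields (the record layout) stays the same.
    """
    return {
        key: (standard_text if key in sensitive_fields else value)
        for key, value in record.items()
    }


# Invented sample data standing in for productive data records.
records = [
    {"customer_id": 1001, "name": "Anna Müller", "balance": "2500.00"},
    {"customer_id": 1002, "name": "Jean-Luc O'Neil", "balance": "17.35"},
]

masked = [anonymize_record(r, {"name"}) for r in records]
distinct_names = {r["name"] for r in masked}
```

After masking, every record carries the same value in the `name` field, so variations present in the original data (different lengths, hyphens, apostrophes, umlauts) no longer reach the application under test. This may be one reason why trial runs on such data do not reveal all weak points.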
The occurrence of errors in the productive environment, which are, as a rule, attributable to defective programming of the application, is proof that the anonymized data used in the trial runs in the development and testing environment do not (yet) correspond to a sufficient degree to the productive data. At the same time, programming errors occur more frequently in the development and testing environment than in the productive environment. This calls for effective error analysis mechanisms.
The object underlying the invention is to provide an efficient approach to the provision of anonymized test data. For the abovementioned reasons, the test data are intended to be as faithful a copy as possible of the productive data and, in addition, to permit a reliable error analysis. Overall, the anonymized test data are intended to improve the information content of trial runs and to reduce the failure probability of newly developed or further developed applications in the productive environment.