The development and testing of new applications requires data that the new applications can process in trial runs. For the results of the trial runs to be meaningful, it is essential that the data processed in the trial runs be technically equivalent (for example, with regard to the data format) to the data that the new applications are to process after the development and test phase. For this reason, trial runs frequently use application data that were generated by the currently productive (predecessor) versions of the applications to be developed or tested. These data, hereinafter referred to as productive application data or simply as productive data, are normally stored in databases in the form of data records.
The use of productive application data for development and test purposes is, in practice, not without problems. It has emerged that the data spaces that developers can access on the basis of their respective authorizations in the productive environment are frequently not large enough to yield reliable results. The results of trial runs also vary from developer to developer because each developer's data space authorization is different. The data space authorization of individual persons can indeed be temporarily expanded for the trial runs; this measure is, however, expensive and, in the case of sensitive or confidential data in particular, is not possible without further checks or restrictions.
Another approach in regard to the use of sensitive or confidential productive application data within the framework of trial runs is to perform the trial runs on a compartmentalized and access-protected central test system. However, the technical cost associated with setting up such a central test system is high. In addition, such a procedure does not permit any delivery of data to (decentralized) development and test systems for error analysis.
The disadvantages explained above, among others, have led to the insight that the use of productive data for development and test purposes is ruled out in many cases. An alternative to the use of productive data was therefore sought. On the one hand, this alternative should be a realistic copy of the productive data with regard to the data format, the data content, etc. On the other hand, the additional technical precautions, in particular those protecting against unauthorized access (authorization mechanisms, firewalls, etc.), should be kept to a minimum.
It has emerged that the above-cited requirements are fulfilled by test data that are generated by a partial anonymization (or masking) of productive data records. Anonymizing sensitive elements of the productive data reduces the potential damage to be expected in the event of unauthorized access. This makes it possible to relax the security mechanisms. In particular, the test data for trial runs and for error analysis can be loaded onto decentralized systems. At the same time, because a suitable anonymization mechanism leaves the technical aspects (data format, etc.) of the productive application data unaltered or alters them only slightly, the anonymized test data form a realistic copy of the productive data.
A data record can be anonymized by erasing the data elements to be anonymized or by overwriting such data elements with a predefined standard text that is identical for all the data records, while the data elements not to be anonymized are retained unaltered. Such a procedure yields anonymized data records without (substantial) changes to the data format. It has, however, become apparent that trial runs using such anonymized data records do not reveal all the weak points in the application to be developed or tested, and errors frequently occur during initial use of the application in the productive environment.
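The simple anonymization described above, in which data elements are erased or overwritten with a predefined standard text, can be illustrated by the following minimal sketch. It is an illustration only, not an implementation taken from this description: the field names, the placeholder text "XXXX", and the function name anonymize_record are hypothetical, and a data record is assumed to be representable as a mapping from element names to string values.

```python
from typing import Dict, Iterable

# Hypothetical names chosen for illustration; the description does not
# specify which data elements are sensitive or what the standard text is.
SENSITIVE_FIELDS = ("name", "account_number")
STANDARD_TEXT = "XXXX"


def anonymize_record(
    record: Dict[str, str],
    sensitive_fields: Iterable[str] = SENSITIVE_FIELDS,
    mode: str = "overwrite",
    standard_text: str = STANDARD_TEXT,
) -> Dict[str, str]:
    """Return a copy of the record with its sensitive elements masked.

    mode="erase" replaces each sensitive element with an empty string;
    mode="overwrite" replaces it with the same standard text in every record.
    Elements that are not sensitive are retained unaltered.
    """
    if mode not in ("erase", "overwrite"):
        raise ValueError("mode must be 'erase' or 'overwrite'")

    masked = dict(record)  # copy, so the productive record is not modified
    for field in sensitive_fields:
        if field in masked:
            masked[field] = "" if mode == "erase" else standard_text
    return masked


productive = {"name": "Alice Example", "account_number": "12345678", "amount": "100.00"}
print(anonymize_record(productive))
print(anonymize_record(productive, mode="erase"))
```

In this sketch the set of data elements in each record is unchanged, so the data format is preserved, but every masked record carries the same value in the sensitive elements.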
The object underlying the invention is to disclose an efficient approach to providing anonymized test data. In this connection, the test data are intended, in particular, to be as faithful a copy as possible of the productive data in order to optimize the information content of the trial runs. At the same time, the probability of failure of the application to be newly developed in the productive environment is intended to be minimized and the maintenance expenditure associated therewith is intended to be reduced.