Database information desensitization is used to desensitize sensitive database data, such as personal data in employee or customer records, before the data is released to parties that should not have access to such data. Database information desensitization may also be used to desensitize sensitive information contained in part of a database query that is being released to a party. There are a variety of reasons why database information may need to be released to a party who does not require access to some elements of the information. One typical reason is that some data in a database and/or queries to a database might need to be released to an outside party for the purpose of conducting database testing that closely or exactly replicates the type of queries or operations that a company may perform upon the database data.
Several techniques for database information desensitization are known, such as data replacement, data swapping, data anonymization, data randomization, and data encryption. For example, one way that desensitization is currently performed is by replacing sensitive database information with trivial, unusable data such as null values or placeholder constants. An example of this would be replacing every social security number with a series of nine identical digits such as 444-44-4444, or replacing every salary listed in a database with a single salary figure, such as $10. Similarly, randomly generated numbers can be used to arbitrarily replace each salary, social security number, or other sensitive piece of data in a database and/or in a query. In another example, data entries are replaced by their class intervals or swapped within a single field in a record set. These methods of desensitization are effective to varying degrees in desensitizing or hiding the sensitive information of a database, but they leave neither data fields that can be used to validate queries nor queries that are applicable to the desensitized data. That is to say, the database information is desensitized, but its relevant characteristics are destroyed, rendering the desensitized database and/or query virtually or completely useless for any database testing purpose such as benchmarking.
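The loss of utility described above can be illustrated with a minimal Python sketch. The salary figures, the function names, and the range query are hypothetical and are chosen only for illustration; they are not taken from any particular database or product.

```python
import random

# Hypothetical salary column from an employee table.
salaries = [42000, 58000, 61000, 75000, 120000]

def replace_with_constant(values, constant=10):
    """Replace every value with the same trivial constant."""
    return [constant for _ in values]

def replace_with_random(values, low, high, seed=0):
    """Replace every value with an unrelated random number."""
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in values]

def count_in_range(values, low, high):
    """Stand-in for a range query such as
    SELECT COUNT(*) ... WHERE salary BETWEEN low AND high."""
    return sum(1 for v in values if low <= v <= high)

# The query returns a meaningful result on the original data...
original_hits = count_in_range(salaries, 50000, 80000)

# ...but matches nothing once every salary has been replaced by $10.
constant_hits = count_in_range(replace_with_constant(salaries), 50000, 80000)

# Random replacement returns a count unrelated to the original result.
random_hits = count_in_range(
    replace_with_random(salaries, 20000, 150000), 50000, 80000
)
```

In this sketch the query that selected three rows from the original data selects none after constant replacement, and an arbitrary number after random replacement, so neither desensitized column can be used to validate or benchmark the original query.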
In another instance, a mathematical function (such as an nth degree polynomial) may be used to convert or encrypt sensitive information into obfuscated information. While this method may preserve some characteristics of the desensitized information, such that it may be usable in some manner for database benchmarking, it is not very secure. For instance, by analyzing a small subset of the converted information, the function used to convert or encrypt the information can be discovered. The converting or encrypting process can then be reversed, thus revealing sensitive information, such as employee or customer information, which was thought to be desensitized.
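The weakness of polynomial obfuscation can be sketched in Python as follows. The sketch assumes, for illustration only, that the secret function is a second degree polynomial with hypothetical coefficients and that an analyst knows three original values together with their obfuscated counterparts. Under those assumptions, Lagrange interpolation reproduces the function exactly, and a binary search inverts it over a range in which it is increasing.

```python
from fractions import Fraction

def obfuscate(x):
    """Hypothetical secret function: a degree-2 polynomial that is
    increasing for x >= 0, so it preserves the ordering of salaries."""
    return 3 * x * x + 7 * x + 11

def lagrange_eval(points, x):
    """Evaluate the unique interpolating polynomial through `points` at x,
    using exact rational arithmetic."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

def invert(points, y, low, high):
    """Binary search for an integer x in [low, high] whose interpolated
    image equals y, assuming the function is increasing on that range."""
    while low <= high:
        mid = (low + high) // 2
        value = lagrange_eval(points, mid)
        if value == y:
            return mid
        if value < y:
            low = mid + 1
        else:
            high = mid - 1
    return None

# Assumption: three (original, obfuscated) pairs are known to the analyst.
known_salaries = [30000, 50000, 90000]
known_pairs = [(s, obfuscate(s)) for s in known_salaries]

# An obfuscated value whose original is not among the known pairs.
leaked = obfuscate(64250)
recovered = invert(known_pairs, leaked, 0, 1_000_000)
```

Because a polynomial of degree n is fully determined by n + 1 points, the three known pairs are enough here to recover the original salary behind any other obfuscated value in the searched range.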
As can be seen, current methods for desensitizing database information suffer from shortcomings that either endanger the security of sensitive information released to an outside party or destroy characteristics of the information that the outside party may need in order to perform database testing. Thus, a technology which addresses these shortcomings would be advantageous.
The drawings referred to in this description should not be understood as being drawn to scale unless specifically noted.