A database may be associated with a production system (e.g., online transaction processing (OLTP), online analytical processing (OLAP), analytics, front-office, back-office). The database may be cloned (e.g., copied as a whole) so that work, testing, analysis, and so on can be performed on the cloned data while the original data continues to support production (e.g., order processing, transaction processing). The data to be cloned may be arranged as columns in tables that support the production system. The cloned data may be referred to as a “clone”.
Cloning may produce security issues with the clone. For example, the clone may not be as secure as the original, may not be run on equally secure hardware, may be exposed to people or processes that would not be allowed to access the original, and so on. Conventional approaches to address these security issues include manually removing sensitive data from a clone, manually running programs to remove sensitive data from a clone, encrypting sensitive data in a clone, and so on. However, these conventional approaches may consume unacceptable amounts of time and may still fail to produce the desired security, leaving, for example, at least momentary periods during which sensitive data in the clone is exposed. Furthermore, conventional approaches may make it difficult, if possible at all, to monitor the cloning process and/or the desensitizing process, which may in turn make it equally difficult to perform a partial recovery from a failed cloning and/or desensitizing.
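The conventional "clone first, then manually run a program to remove sensitive data" approach described above can be sketched as follows. This is a minimal illustration only, assuming an in-memory SQLite database; the table `customers`, the column `card_number`, and the helper names `clone_database` and `scrub_clone` are hypothetical and are not drawn from any particular system. The sketch shows the momentary exposure: between the copy and the scrub, the clone holds the sensitive values unprotected.

```python
import sqlite3


def clone_database(source: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the whole source database into a new in-memory database."""
    clone = sqlite3.connect(":memory:")
    source.backup(clone)  # whole-database copy, sensitive data included
    return clone


def scrub_clone(clone: sqlite3.Connection, sensitive_columns) -> None:
    """Overwrite hand-listed (table, column) pairs with NULL.

    The list is maintained manually (an assumption of this sketch);
    any sensitive column missing from it is left untouched.
    """
    for table, column in sensitive_columns:
        clone.execute(f'UPDATE "{table}" SET "{column}" = NULL')
    clone.commit()


# Hypothetical production data.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, card_number TEXT)"
)
source.execute(
    "INSERT INTO customers (name, card_number) VALUES ('Ada', '4111111111111111')"
)
source.commit()

clone = clone_database(source)

# Window of exposure: the clone exists but has not yet been scrubbed.
exposed = clone.execute("SELECT card_number FROM customers").fetchall()

scrub_clone(clone, [("customers", "card_number")])
scrubbed = clone.execute("SELECT card_number FROM customers").fetchall()
```

In this sketch nothing records which step has completed, so a failure partway through `scrub_clone` leaves no indication of which columns were already desensitized.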
Cloning data involves selecting the data to clone and then cloning it. Which data will be cloned may depend on why the clone is being produced. For example, a first clone may be produced to support testing a new feature while a second clone may be produced to support testing a bug fix. Similarly, a first clone may include data associated with a first set of applications while a second clone may include data associated with a second set of applications. In each example, a different set of data may be needed, and the different sets may include different pieces of sensitive data. Conventionally, it has been difficult, if possible at all, to have data that is self-describing with respect to desensitization. Thus, conventional cloning has involved intensive, inflexible manual configuration that takes significant amounts of time and yet still produces insecure clones.
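The kind of per-clone manual configuration described above might look like the following sketch. It is illustrative only: the purposes, table names, column names, and the `CloneSpec` and `columns_to_desensitize` names are all hypothetical. Each purpose carries its own hand-written list of tables and sensitive columns, because the data itself carries no description of what must be desensitized.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CloneSpec:
    """Hand-maintained description of one kind of clone (hypothetical)."""
    tables: Tuple[str, ...]
    sensitive_columns: Tuple[Tuple[str, str], ...]  # (table, column) pairs


# One entry per reason for cloning; every entry is written by hand.
CLONE_SPECS: Dict[str, CloneSpec] = {
    "feature_test": CloneSpec(
        tables=("orders", "customers"),
        sensitive_columns=(("customers", "card_number"),),
    ),
    "bug_fix_test": CloneSpec(
        tables=("customers", "employees"),
        sensitive_columns=(
            ("customers", "card_number"),
            ("employees", "salary"),
        ),
    ),
}


def columns_to_desensitize(purpose: str) -> List[Tuple[str, str]]:
    """Return the sensitive columns that fall inside the clone for a purpose."""
    spec = CLONE_SPECS[purpose]
    return [(t, c) for (t, c) in spec.sensitive_columns if t in spec.tables]
```

Under this arrangement, a sensitive column newly added to a production table stays unprotected in every clone until someone updates each affected entry by hand.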