There are often many departments, groups and applications in a company that would like access to the data created in the Production environment: Development, Test, Analytics, Compliance and Marketing, among others. Production teams do not allow external groups to access the Production data directly, fearing that they will affect the performance and/or the integrity of the data. Because of this, the only way to get access is to get a copy.
There are three main methods of getting a copy of the Production data, and each has its downsides. The single largest obstacle is finding a time at which the copy can be created: copying the data causes a performance drop, so the operation must be scheduled and managed. Assuming the right window can be found, a simple copy of the data can be made, but if the data is continually changing, the simple-copy method cannot be used. The second approach is to create a snapshot on the storage array. This is attractive because the snapshot is created quickly and can easily be destroyed when no longer needed. The downside is that access to the snapshot shares the same storage array resources with the Production data: although this solves the problem of data integrity, because it is a separate copy, it does not solve the problem of performance impact. The final method is to get the copy from a backup. Companies protect production data with a daily backup, which creates an independent copy of it. Restoring the data provides a completely independent copy that affects neither the integrity nor the performance of the Production data. The downside of this approach is the time it takes to restore the backup: for a large and complex data set, getting a copy can take hours, days or even weeks.
A new solution in the market is Copy Data Virtualization. It captures one full copy and then captures incremental change data according to a schedule. Using storage virtualization techniques, it can provide independent copies in seconds to minutes, regardless of complexity and size. This approach meets the requirements of not affecting integrity or performance and also solves the problem of the time it takes to restore from a backup.
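The mechanism can be sketched, in rough terms, as a copy-on-write layer over a single maintained base copy. The class and field names below are illustrative assumptions, not any vendor's API: one "golden" copy is kept current with incremental changes, and each virtual copy only stores its own local writes while reading everything else from the shared base.

```python
# Hypothetical sketch of copy data virtualization: one full base copy,
# incremental change capture, and instant virtual copies via copy-on-write.

class GoldenCopy:
    """Full base copy captured once, then updated with incremental changes."""
    def __init__(self, blocks):
        self.blocks = dict(blocks)          # block_id -> data

    def ingest_increment(self, changed_blocks):
        # Each scheduled capture moves only the changed blocks, not a full copy.
        self.blocks.update(changed_blocks)

class VirtualCopy:
    """Presents an independent, writable view without duplicating the data set."""
    def __init__(self, golden):
        self.golden = golden
        self.overrides = {}                 # copy-on-write: local changes only

    def read(self, block_id):
        return self.overrides.get(block_id, self.golden.blocks[block_id])

    def write(self, block_id, data):
        self.overrides[block_id] = data     # never touches the golden copy

golden = GoldenCopy({0: "alpha", 1: "beta"})
golden.ingest_increment({1: "beta-v2"})     # e.g. a nightly incremental capture

dev = VirtualCopy(golden)                   # "provisioned" in constant time
dev.write(0, "dev-change")
print(dev.read(0), dev.read(1))             # dev-change beta-v2
print(golden.blocks[0])                     # alpha (base copy unchanged)
```

Because provisioning a `VirtualCopy` creates only a small amount of metadata, the time to hand out a copy is independent of the size of the underlying data set, which is the property the restore-from-backup approach lacks.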
Once a copy of the data is available, it may need further processing to protect sensitive data contained within. For example, a database might contain credit card numbers. The data is protected while in the Production environment, but if a copy of the database is provided to a Development and Test environment, it loses many of the protections that exist in Production. Depending on who will be using the copy, what it is needed for and what environment it will live in, a number of transformations may need to take place, including subsetting, masking and data quality checks.
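As a minimal illustration of masking, the sketch below (the card format and function name are assumptions for the example) replaces all but the last four digits of credit card numbers before a copy leaves Production:

```python
import re

# Matches 16-digit card numbers, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12}(\d{4})\b")

def mask_cards(text):
    # Keep the last four digits so the value stays recognizable for testing.
    return CARD_RE.sub(lambda m: "****-****-****-" + m.group(1), text)

row = "customer=jdoe card=4111 1111 1111 1234 status=active"
print(mask_cards(row))
# customer=jdoe card=****-****-****-1234 status=active
```

Real masking tools also preserve referential integrity across tables (the same input always masks to the same output), which a simple substitution like this does not attempt.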
Now that there is a copy and it has been transformed, it needs to be made accessible to users outside of the Production environment. This process needs to be repeatable, scalable and manageable. The concept of a copy data token is used to create a self-describing entity that can be kept in a library of data sources and accessed in a controlled manner from within and outside of the Production environment.
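One way to picture a copy data token is as a self-describing metadata record. The field names below are assumptions for illustration, not a defined standard: enough information to identify the source, the point in time, the transformations already applied, and where the copy may be used.

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class CopyDataToken:
    """Self-describing record for one entry in a library of data copies."""
    source: str                  # originating Production data set
    captured_at: str             # point in time the copy represents
    transformations: list = field(default_factory=list)  # e.g. masking applied
    allowed_envs: list = field(default_factory=list)     # where it may be used

    def to_json(self):
        # Serialized, the token can travel between environments on its own.
        return json.dumps(asdict(self), indent=2)

token = CopyDataToken(
    source="orders_db",
    captured_at="2024-01-15T02:00:00Z",
    transformations=["mask_credit_cards", "subset_10pct"],
    allowed_envs=["dev", "test"],
)
print(token.to_json())
```

Keeping tokens rather than raw connection details in the library is what makes access controllable: a request for data is a request for a token, which can be checked against the requester's environment before the copy is exposed.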