Within an organization, which handles databases, there is often a development team, either internal or consultancy, that needs to write applications to process or edit the data of the databases or to extract data from the databases for some reports prepared by analysts using Business Intelligence (BI) tools.
During the development and testing phase of an application, the development team is not allowed to use the production database for several reasons. For example, the performance of the current production system may be affected by the development teams if complex queries are run on the system for development purposes and testing. The volume of data is very crucial too, because it can slow down the development time if a slow query is run frequently. The production database may contain sensible information that the developers should not be allowed to access. For all these reasons, the Database Administrator (DBA) has to generate a copy of the production database using an ETL (Extract-Transform-Load) tool and manually performs repetitive steps such as: creating schema of the source production database on the target database that will be given to consultants/developers and copying a small portion of the data of all the tables from the source to the target. For example, only the sales of the last year instead of the full database history may be extracted for testing purposes. This task is difficult and time consuming, because the DBA has to be sure that the data of the different tables is still consistent, which means joint tables would still return values. Anonymizing certain sensitive data like social security numbers, credit card numbers, etc. is also a must. All these steps take a lot of time and are performed manually.