Test data for testing an application may be obtained either by creating a copy or clone of production data or by generating synthetic data. Testing the application with production data is typically considered reliable, as the production data corresponds to actual operational data. Such testing using the production data is known as data-driven testing. Further, it is easier to create a copy of the production data than to generate synthetic data, which is new data altogether. However, copying the entire production data and keeping it in different test environments may lead to increased space requirements.
Generally, functional testing of an application is performed for certain selected test cases, and thus such testing requires only the production data corresponding to those test cases. Therefore, using the entire production data where only a portion of it is required may consume more time and resources than necessary in testing the application. Further, keeping the entire production data in the test environment may also lead to increased space requirements. Accordingly, in such cases, to reduce computational resources and time, functional testing of the application is performed using a portion of the production data. The process of extracting this portion, or subset, of the production data from the production database is known as database sampling.
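As an illustration only, database sampling for a single test case might be sketched as follows. The sketch assumes a test case that exercises only customers from one region. The tables customers and orders, the region filter, and the function names make_demo_production and sample_subset are hypothetical and do not come from the source. Python's built-in sqlite3 module stands in for the production and test databases.

```python
import sqlite3


def make_demo_production() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in for a production database (hypothetical schema)."""
    prod = sqlite3.connect(":memory:")
    prod.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            amount REAL
        );
        INSERT INTO customers VALUES (1, 'EU'), (2, 'US'), (3, 'EU');
        INSERT INTO orders VALUES
            (10, 1, 25.0), (11, 2, 40.0), (12, 3, 15.5), (13, 2, 9.0);
        """
    )
    return prod


def sample_subset(prod: sqlite3.Connection, test: sqlite3.Connection, region: str) -> int:
    """Copy only the rows needed for one test case into the test database.

    Parent rows (customers) are selected first, then only the child rows
    (orders) that reference them, so the subset has no dangling references.
    Returns the total number of rows copied.
    """
    test.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            amount REAL
        );
        """
    )
    # Select the parent rows relevant to the test case.
    customers = prod.execute(
        "SELECT id, region FROM customers WHERE region = ? ORDER BY id",
        (region,),
    ).fetchall()
    test.executemany("INSERT INTO customers VALUES (?, ?)", customers)

    # Select only the child rows that belong to the sampled parents.
    orders = prod.execute(
        "SELECT o.id, o.customer_id, o.amount "
        "FROM orders AS o JOIN customers AS c ON c.id = o.customer_id "
        "WHERE c.region = ? ORDER BY o.id",
        (region,),
    ).fetchall()
    test.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

    test.commit()
    return len(customers) + len(orders)
```

Calling sample_subset(make_demo_production(), sqlite3.connect(":memory:"), "EU") copies only the rows for that region, so the test database holds a fraction of the production rows while every copied order still refers to a copied customer.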