Related Art
The disclosed embodiments relate to techniques for testing insecure computing environments. More specifically, the disclosed embodiments relate to techniques for testing insecure computing environments using random data sets generated from characterizations of real data sets.
Computing environments such as cloud computing systems and/or distributed data stores are often tested before use in production settings. For example, a development team may test the execution of a software system within a new execution environment before choosing to use the new execution environment as a development, staging, and/or production environment for the software system.
However, new computing environments may lack the security controls needed to test them safely with real data. For example, a software system may store and manipulate sensitive information such as financial data, medical records, and/or personal data. The performance of the software system may also be tested in a third-party execution environment such as a cloud computing system. However, the third-party execution environment may not provide adequate security measures for preventing unauthorized access to the data. Instead, developers of the software system may generate test data for use in testing the software system in the third-party execution environment.
Moreover, conventional techniques for generating “fake” test data for a software system may be associated with a number of drawbacks. First, randomly generated test data may bear no resemblance to real data used in the software system and thus lack the characteristics, variations, and/or errors of the real data that can be used to simulate the real-world processing performed by the software system. Second, manual entry of individual data records as test data for the software system may be tedious, may include biases of the users generating the test data, and may lack the volume of the real data. Third, test data generated from predefined characterizations of real data may have the volume and characteristics of the real data, but such test data may be limited to the characterized data types unless additional manual characterization is performed to add new types of data to the test data.
Consequently, testing of software systems in insecure computing environments may be facilitated by mechanisms for streamlining the generation of random test data that conforms to the characteristics of real data used by the software systems.