Embodiments of the present invention relate to policies, and more specifically to techniques for enforcement of policies that utilize information from information sources external to an organization.
The development of many applications, such as enterprise resource planning (ERP) applications, requires data for testing purposes. This data (often referred to as test data or seed data) may be used to validate an application's functionality and, more generally, to determine whether the application functions properly. Seed data may also be used for other purposes, such as demonstrating applications to potential customers. Seed data may be created in many different ways. One way, for instance, is simply to copy existing real data used by one application. Applications, however, often utilize their own logical models (schemas or sets of schemas) for organizing data, so data used in connection with one application may be organized differently from data used by another. Converting original data from one application into seed data for another application may therefore involve a costly transformation process. In addition, original data may be proprietary or may contain confidential information. Thus, using original data as seed data may require a costly and time-consuming process of transforming the data to address any concerns regarding the use of the original data.
Accordingly, seed data is typically created manually and/or by computers executing simple algorithms. An employee, for example, may manually input fictional data, or an automated program may generate fictional values. Data created in this manner, however, has several disadvantages. Seed data created manually and/or repetitively according to conventional methods may be unrealistic. For instance, data used by applications often has statistical distributions that do not match those of data generated by conventional methods: actual data may follow a Gaussian distribution, whereas seed data created according to conventional methods may follow a uniform distribution. In addition, real data often contains mistakes, variations, correlations, and other characteristics that are difficult to recreate accurately using conventional methods.
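To make the distribution mismatch concrete, the following minimal sketch contrasts a conventional generator that draws values uniformly from a range with one that draws them from a Gaussian distribution. The function names, the range of 0 to 100, and the mean of 50 with a standard deviation of 15 are hypothetical choices for illustration only; they are not taken from any particular application. The helper measures what fraction of values fall within one standard deviation of the center: roughly 68% for Gaussian data but only about 30% for uniformly generated data over the same span.

```python
import random
import statistics

def uniform_seed_values(n, low, high, rng):
    # Conventional approach (illustrative): every value in [low, high] is equally likely.
    return [rng.uniform(low, high) for _ in range(n)]

def gaussian_seed_values(n, mean, stdev, rng):
    # Values cluster around the mean, as many real-world measurements do.
    return [rng.gauss(mean, stdev) for _ in range(n)]

def share_within(values, center, spread):
    # Fraction of values lying in the closed interval [center - spread, center + spread].
    return sum(1 for v in values if abs(v - center) <= spread) / len(values)

rng = random.Random(42)  # fixed seed so the sketch is reproducible
n = 10_000
uniform = uniform_seed_values(n, 0.0, 100.0, rng)
gaussian = gaussian_seed_values(n, 50.0, 15.0, rng)

# Both sets have a mean near 50, but their spreads differ markedly.
print("uniform : mean=%.1f stdev=%.1f within 50±15: %.2f"
      % (statistics.mean(uniform), statistics.pstdev(uniform), share_within(uniform, 50, 15)))
print("gaussian: mean=%.1f stdev=%.1f within 50±15: %.2f"
      % (statistics.mean(gaussian), statistics.pstdev(gaussian), share_within(gaussian, 50, 15)))
```

Although both sets have similar means, a tester or demonstration audience would see very different patterns: uniform data spreads evenly across the whole range (standard deviation near 100/√12 ≈ 28.9), whereas Gaussian data concentrates near typical values.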
Because of these differences between real data and conventionally generated seed data, the use of conventionally generated seed data is not ideal. Demonstrations of applications using such seed data, for example, may appear unrealistic. In addition, lower-quality conventionally generated seed data may not exercise an application's capabilities in the same way that actual data would. For instance, if seed data contains no anomalies, testers may not see how an application reacts to such anomalies.