Computerized devices and systems control almost every aspect of our lives, both as individuals and as a society. Many computerized systems gather or use significant amounts of data about products, processes, individuals, and other entities. A database is the most common tool for arranging and accessing large amounts of structured data in digital form. The data is typically organized to model relevant aspects of reality in a manner that supports processes requiring this information. The term database may refer to the way users view the data collection, or to the logical and physical materialization of the data in files, computer memory, or computerized storage.
A database is usually accessed through one or more applications that issue queries, rather than directly. For example, the balance of a bank account is usually read or updated through a dedicated application provided to a bank agent, or through a web service offered to the customer after proper identification, and not by directly reading or updating a specific field within a data structure or a table.
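The application-mediated access described above can be illustrated with a minimal sketch. It assumes a hypothetical `accounts` table and hypothetical functions `open_bank`, `get_balance`, and `deposit`, none of which come from the source; Python's standard `sqlite3` module stands in for whatever database engine is actually used. Callers go through the functions, which apply identification and validation, and never touch the balance field themselves.

```python
import sqlite3

def open_bank(path=":memory:"):
    """Open the database and create the hypothetical accounts table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts ("
        "id INTEGER PRIMARY KEY, owner TEXT NOT NULL, balance INTEGER NOT NULL)"
    )
    return conn

def get_balance(conn, account_id, authenticated_owner):
    """Return the balance only if the identified caller owns the account."""
    row = conn.execute(
        "SELECT balance FROM accounts WHERE id = ? AND owner = ?",
        (account_id, authenticated_owner),
    ).fetchone()
    if row is None:
        raise PermissionError("unknown account or wrong owner")
    return row[0]

def deposit(conn, account_id, amount):
    """Increase the balance through a validated, transactional query."""
    if amount <= 0:
        raise ValueError("deposit amount must be positive")
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, account_id),
        )
        if cur.rowcount != 1:
            raise LookupError("no such account")
```

Testing such an application meaningfully requires that the `accounts` table already contain data, which is the circular dependency discussed in the following paragraph.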
In some situations a circular dependency, or deadlock, may arise: developing, and particularly testing and proofing, an application requires a database with sufficient data, since otherwise certain functionalities cannot be tested, while generating such data and populating a database with it requires such an application to exist. Furthermore, the database schema or structure may not be final and may evolve throughout the development of the application.
Although data for testing an application may be fabricated manually, doing so may require significant manual labor. Furthermore, fabricated data may be unrealistic, inconsistent, or meaningless, or may at least have distributions that differ from those of real-life data based on real scenarios and populations.
In other cases, data may exist but may be inaccessible to an application developer due to laws, privacy protection regulations, or other limitations, such as organizational policy. For example, sensitive health or financial data, even if it exists, may be restricted such that it cannot be shared with application developers or QA staff members, whether such personnel belong to the organization maintaining the data or are external to it.
If data exists but is not accessible due to privacy limitations, masking sensitive details may not always suffice. For example, data may be exposed when transferred to another location, or some sensitive data may leak due to bugs or malicious actions. In other cases, if little data is available, masking some identifying details may not be enough to conceal the identity of subjects or other entities.
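The point about small data sets can be made concrete with a short sketch. The record fields (`name`, `zip`, `birth_year`) and the functions `mask` and `unique_combinations` are hypothetical and chosen only for illustration. The sketch masks the directly identifying field and then lists the combinations of the remaining attributes that occur exactly once, since a subject with a unique combination may still be singled out.

```python
from collections import Counter

def mask(records, sensitive_fields=("name",)):
    """Replace the values of directly identifying fields with a placeholder."""
    return [
        {key: ("***" if key in sensitive_fields else value)
         for key, value in record.items()}
        for record in records
    ]

def unique_combinations(records, quasi_identifiers):
    """Return attribute combinations that appear in exactly one record."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return [combo for combo, n in counts.items() if n == 1]

# Hypothetical, fabricated records used only to exercise the functions.
records = [
    {"name": "A", "zip": "10001", "birth_year": 1980},
    {"name": "B", "zip": "10001", "birth_year": 1980},
    {"name": "C", "zip": "20002", "birth_year": 1955},
]
masked = mask(records)
exposed = unique_combinations(masked, ("zip", "birth_year"))
```

Here `exposed` contains one combination, the one belonging to the third record, even though every name has been masked.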
Testing an application that issues database queries differs from testing the queries in isolation, since executing an application may depend on a specific execution order, on other relationships between different queries, or on other limitations. However, the application source code may not always be available, or it may be inaccessible in practice, for example because it is implemented in an unknown programming language or methodology.
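The dependence on execution order can be sketched as follows. The `customers` and `orders` tables, the two statements, and the function `run_in_order` are hypothetical; Python's standard `sqlite3` module is used with foreign key enforcement switched on. Each statement is valid on its own, yet the pair succeeds only when issued in the order an application would issue it.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total INTEGER NOT NULL
);
"""

INSERT_CUSTOMER = (
    "INSERT INTO customers (id, name) VALUES (?, ?)", (1, "Test Customer"))
INSERT_ORDER = (
    "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)", (10, 1, 250))

def run_in_order(statements):
    """Run the statements against a fresh database; report whether all succeed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce them by default
    conn.executescript(SCHEMA)
    try:
        for sql, params in statements:
            conn.execute(sql, params)
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # The order row referenced a customer that did not exist yet.
        return False
    finally:
        conn.close()

application_order = run_in_order([INSERT_CUSTOMER, INSERT_ORDER])
reversed_order = run_in_order([INSERT_ORDER, INSERT_CUSTOMER])
```

With the application's order the run succeeds, while the reversed order fails on the foreign key constraint, so a test that exercises each query separately would not reveal the dependency.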