Enterprises frequently store large volumes of data across multiple tables in production databases, as well as in data warehouses. To perform processes such as data testing, data analysis, and data reporting effectively, it is sometimes necessary to extract datasets from multiple tables at once.
For example, FIG. 1 shows a customer database that includes two tables, 101 and 102. The first table 101 lists customer names and zip codes, and the second table 102 lists IDs, customer names, and states. The data may be populated in the second table 102 in such a way that the value of the state field depends on the value of the zip code field in the first table 101. In this case, if a user wishes to extract a subset of data for certain IDs, customer names, and corresponding states from table 102, then it will also be necessary to extract the corresponding zip code data from table 101 to ensure referential integrity. So, for example, if a user wanted to extract the data subset for the customer with ID=1, the data subset would include the values Robert and New York from table 102, as well as the value 11357 from table 101.
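The FIG. 1 example can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the tables are modeled as in-memory lists, the field names (`id`, `customer_name`, `zip_code`, `state`) are invented for illustration, the two tables are assumed to be linked by customer name, and the second row in each table is hypothetical data that does not appear in the description above. The function name `extract_subset` is likewise hypothetical.

```python
# Table 101: customer names and zip codes.
# Only the "Robert" row comes from the FIG. 1 discussion; "Alice" is hypothetical.
table_101 = [
    {"customer_name": "Robert", "zip_code": "11357"},
    {"customer_name": "Alice", "zip_code": "07030"},  # hypothetical row
]

# Table 102: IDs, customer names, and states.
table_102 = [
    {"id": 1, "customer_name": "Robert", "state": "New York"},
    {"id": 2, "customer_name": "Alice", "state": "New Jersey"},  # hypothetical row
]


def extract_subset(ids):
    """Extract rows of table 102 for the given IDs, plus the table 101 rows they depend on.

    Assumption: the tables are related through the customer name field.
    """
    rows_102 = [row for row in table_102 if row["id"] in ids]
    # Collect the customer names needed to keep the subset referentially intact.
    names = {row["customer_name"] for row in rows_102}
    rows_101 = [row for row in table_101 if row["customer_name"] in names]
    return {"table_102": rows_102, "table_101": rows_101}


subset = extract_subset({1})
```

Requesting ID=1 yields the Robert and New York values from table 102 together with the 11357 zip code from table 101, which mirrors the subset described for FIG. 1.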
FIG. 1 illustrates a simple example, but when there are many data dependencies between multiple tables, determining the appropriate subset of data to compile from all of the relevant tables can be a complex and resource-intensive task. In many situations, the relationships between the tables result in one or more cycles when computing a data subset. As a result of these cycles, data subsetting processes frequently have to use a one-size-fits-all recursive solution to compute data subsets for large sets of data across multiple tables, whether or not the computation of a particular data subset involves cyclic relationships.
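To make the notion of a cyclic relationship concrete, the following minimal sketch checks whether a set of table relationships contains a cycle. It assumes, purely for illustration, that the relationships are represented as a directed graph mapping each table name to the tables it depends on. The table names and the function name `has_cycle` are hypothetical, and the depth-first search shown here is a generic textbook technique, not a method taken from the description above.

```python
def has_cycle(dependencies):
    """Return True if the directed graph of table dependencies contains a cycle.

    `dependencies` maps a table name to an iterable of table names it depends on.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    tables = set(dependencies) | {t for deps in dependencies.values() for t in deps}
    state = {table: UNVISITED for table in tables}

    def visit(table):
        state[table] = IN_PROGRESS
        for dep in dependencies.get(table, ()):
            # Reaching a table that is still being explored closes a cycle.
            if state[dep] == IN_PROGRESS:
                return True
            if state[dep] == UNVISITED and visit(dep):
                return True
        state[table] = DONE
        return False

    return any(state[table] == UNVISITED and visit(table) for table in tables)


# Hypothetical relationships: the first graph has no cycle, the second does.
acyclic = {"orders": ["customers"], "customers": []}
cyclic = {"employees": ["departments"], "departments": ["employees"]}

acyclic_result = has_cycle(acyclic)
cyclic_result = has_cycle(cyclic)
```

In this sketch, `acyclic_result` is False and `cyclic_result` is True, which illustrates that whether a particular set of relationships is cyclic can differ from one subset computation to another.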