Various embodiments of this invention relate to database tools and, more particularly, to a data loading tool.
There is often a requirement to replicate a subset of data in a production database for development, testing or offline point-in-time reporting requirements. This subset of data will generally be a logical subset, for example, all data relating to a specific geographical region or a branch office and perhaps all data in the database for a specific monthly range. Typically, a database designer will attempt to eliminate duplication and store data properties in separate associated tables. Therefore, to extract a logical subset, related data in each associated table must be identified through successive join operations using foreign key relationships between the tables.
Given the challenging workload requirements on production systems, it is often impractical to run select queries with join operators against the production database directly to extract the datasets.
Furthermore, a target system will often be much smaller than the full-scale production system and would typically not have enough disk storage available to facilitate a full database restore of the production database, before using a complex structured query language (SQL) statement to extract the required subset dataset.
Patent publication US20060242205A1, “Self-contained partial database backups,” outlines an approach for creating a subset of a database, which might be useful in the above situation. The approach involves a database restore operation from the database subset whereby the subset consists of file groups and associated metadata.
U.S. Pat. No. 6,047,294, “Logical restore from a physical backup in a computer storage system,” outlines an approach for backing up and restoring data in computer system. A segment of data, such as a virtual disk partition, is backed up at a physical level from a primary storage device, and a logical element within the segment may be restored. The solution assumes the logical element is always a file.
The main drawback of these previous solutions is that the smallest logical subset a user can extract for the target system is a file. Considering the size of the production systems, the files and file groups of the backup structure are too large for the target test system and do not provide a specific small logical subset of data desired by the user (generally a subset of rows from a set of related tables in the database).