Database management systems (DBMS) provide centralized structured information services to client programs. The data within a DBMS is defined, stored, and retrieved according to certain basic information storage patterns known as data models. For example, a DBMS using a hierarchal data model would relate data entities within it through a tree-like structure with each entity having a single “parent” entity (or no parent, for the “root” entity) and any number of “child” entities. This hierarchal data model is used in the popular XML data format and for XML databases.
The most common DBMS data model for sophisticated computerized information is the relational data model, which is used by relational database management system (RDBMS). Relational data can be understood as a set of two-dimensional data tables with rows representing data entities and columns representing entity properties. Rows typically have a key column that identifies each row or entity uniquely. These keys are most often a positive integer incremented with each subsequently inserted row. Because table rows may be identified solely through their keys, a row in table A can “relate” to a row in table B by including a column with a key from table B.
By using these row identifiers for relations between tables, all data for a row, other than the row identifier itself, does not need to be duplicated and may be stored only in its canonical table. For example, a table of customers may contain street address information. When referring to this customer, such as in a list of sales orders, only the customer row identifier is required, as described before; the customer address does not need to be duplicated within the customer portion of the sales order. One of the major advantages of these normalized intra-table relations is that the customer address may be modified by only changing its information in the customer table as there are no duplicate inclusions of the customer address (i.e. in the sales orders) to maintain.
Row identifiers are typically automatically generated by the RDBMS as a simple incrementing positive number such as 1, 2, 3, and so on. As these identifier sequence generators are distinct from the tables themselves, if two databases share the same types of tables and columns, the row identifiers may be different between the two databases despite having identical content. These differences can arise from the order, number, and deletion history of the table rows. For example, if three customers were added in opposite order in two databases, likely zero, but no more than one row would have identical row identifiers across the databases.
While an RDBMS provides benefits of reduced storage size, improved data consistency, and, for some operations, higher query performance, its data model is extremely dependent upon correct and consistent use of the row identifier keys. Since data duplication is reduced or even eliminated, if the correct row is not related through the row identifier, there is no recourse to recover the correct row by examination of included duplicate data. For example, a sales order would contain only the customer row identifier and not additional customer columns such as their name or street address which would allow cross-referencing. Furthermore, mistaken row identifiers have a high likelihood of not merely referring to missing rows in another table, but may relate extant, but incorrect rows.
A DBMS typically runs on a single server in a single location, although more esoteric varieties are sometimes transparently spread among locations. As a RDBMS contains two principle parts, (1) the data tables describing and defining the data structure, and (2) the data rows which conform to the model so described, a server may likewise be seen as both the defining container and the content so contained. While interaction between completely distinct databases (i.e. both the tables and the data are distinct) is possible, there are a number of situations where multiple RDBMS servers are operational with the same table definitions, though the data they hold may be the same or different. In these same-definition scenarios, there may be a need to transport data between the database servers.
One such scenario is representing a multitude of software installations through a multitude of database servers. For example, stores in multiple locations may all have the same sales-tracking software installed, but each have their own database server to track their local sales data. In this case, one may want to move partial data, such as a customer and their related records such as order history, from one installation to the server of another installation. Another scenario involves duplicating the same data in several different database servers, perhaps located in different regions to allow continued availability and security. A third scenario is the use of data within one database server as the template for new data insertion in another server. A variation of this scenario is the use of such a template within the original database server.
From the aforementioned particulars about relational databases, there emerge four major difficulties in accomplishing the above transportation of tabularly identical data between and within an RDBMS:    (1) The transported data row identifier cannot be safely used in the new database due to collisions with existing identifiers and missing relations.    (2) The transported row may be the required relation or require relation with additional rows that should be transported as well. For example, transporting a product row may require supplier (relation to the product) or support history (relation from the product) rows.    (3) As the previous row identifier cannot be safely used, new row identifiers must be allocated as part of transportation.    (4) Given these new, different row identifiers, all necessary related data transported with the primary row must be re-related using the new row identifiers.
Because of these difficulties, transportation between relational databases is typically restricted to three scenarios, all limited in practice. These three limited scenarios are:    (1) A database may be completely duplicated or “cloned” into a second database. This is a destructive process in which all existing data in the second database must be completely erased. This is the typical technique by which a database is backed up and restored.    (2) A custom program may be written to collect the specific rows necessary and then insert them into the second database. This technique allows for partial data transport, but is entirely specific to the section of data, database, and intended application as it requires a unique program for each section of data transported and must be synchronously maintained to match any changes to the table definition or intention.    (3) In certain cases the data may be exported out to an intermediate format, such as CSV (Comma Separated Value) format, and then reimported. As the intermediate format typically lacks the full expression of relations and dependencies, this scenario is of limited use except for very simple data transportation such as for rows that are neither related to nor from which are related.
As these scenarios all demonstrate considerable restrictions of specificity, application, or scope, there presently exists no general solution for transportation of relational data, in particular for partial interrelated subsets of a RDBMS.