The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Today, much information is digitized and stored in databases that are managed by database systems. Databases can be substantial in size, and it is not uncommon to find databases that can hold more than a few million gigabytes.
Under a variety of circumstances, it may be necessary or useful to move data between databases. There are various ways to move data between databases. For example, one can move all data from an existing database into another simply by making a copy of the existing database. Copying an entire database is reasonably fast, since standard operating system utilities can be used to make an exact, binary copy of all the files in the database.
However, making an exact copy of a database is not so useful for many database-to-database movement needs. For example, when building data warehouses, the source and the target databases are typically not identical. For this reason, database owners prefer to incorporate new information into their existing databases, letting that newly transferred information become a subset of the existing database, and not a separate database.
Moving subsets of data between databases is a slow and complicated process. One cannot simply copy a subset of files from a source database into a target database and expect all the data to be integrated into the target database automatically. The intrinsically complicated internal structure of databases makes it necessary to perform additional integration steps.
Pluggable Tablespaces
One way to quickly move data between databases is to use pluggable tablespaces. In general, a tablespace is a logical portion of a database used to allocate storage for table and index data. Each tablespace corresponds to one or more physical data files. Pluggable tablespaces allow the transport of a set of tablespaces from one database to another.
A “pluggable tablespace set” is a set of tablespaces from a source database that have been selected to be transported/plugged into a target database. In order to transport or plug a tablespace set from a source database to a target database, export and import operations are used.
To move data from one database to another using pluggable tablespaces, an export operation copies the tablespace set from a source database and creates a pluggable tablespace set. When the pluggable tablespace set is created in plug-in format, all the metadata information in the data dictionary of the source database, about all objects in the pluggable tablespace set, is exported in DDL format into an export file. This information includes data about tables, indexes, referential integrity constraints, and space allocation.
The import of pluggable tablespaces involves storing, as part of the target database, the files that correspond to the tablespaces in the pluggable tablespace set. In addition, the metadata for the pluggable tablespace set is reconstructed and inserted into the target database's data dictionary.
Using pluggable tablespaces avoids the need to patch absolute disk pointers by using tablespace-relative disk pointers. In addition, using pluggable tablespaces integrates metadata by exporting tables in their entirety into a high-level, data description language (DDL) format that does not employ pointers or separate metadata at all.
For an object in the pluggable set, such as a table, part of the exported information includes a tablespace-relative pointer to the location of the object.
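The benefit of a tablespace-relative pointer can be illustrated with a minimal sketch. The `RelativePointer` class, the `resolve` function, and the file paths below are hypothetical names chosen for illustration; they are not taken from any actual database implementation. The sketch assumes a pointer made of a file number that is local to the tablespace plus a block number, so the same pointer value remains valid after the tablespace's files are placed under a different database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelativePointer:
    """Hypothetical tablespace-relative pointer (illustrative only)."""
    relative_file_no: int  # index of the file within the tablespace
    block_no: int          # block offset inside that file


def resolve(pointer, tablespace_files):
    """Resolve a relative pointer against the files of the tablespace
    as they exist in whichever database the tablespace is plugged into."""
    return tablespace_files[pointer.relative_file_no], pointer.block_no


# The pointer stored with the object never mentions an absolute location.
table_location = RelativePointer(relative_file_no=0, block_no=128)

# Hypothetical file lists for the same tablespace in two databases.
source_files = ["/source_db/data/sales_01.dbf"]
target_files = ["/target_db/data/sales_01.dbf"]

source_location = resolve(table_location, source_files)
target_location = resolve(table_location, target_files)
```

Because only the per-database file list differs, the stored pointer itself does not need to be patched when the tablespace is moved.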
An example of how pluggable tablespaces might be implemented is described in U.S. Pat. No. 5,890,167, entitled “PLUGGABLE TABLESPACES FOR DATABASE SYSTEMS”.
Binary XML
Binary XML is one format in which XML data can be stored in a database. Binary XML is a compact binary representation of XML that was designed to reduce the size of XML documents. One of the ways binary XML compresses data is by representing strings with fixed values.
In one implementation of binary XML, a mapping is established between character strings and replacement values, where the character strings are tag names, and the replacement values are numbers. Such mappings are referred to herein as “translation information”.
For example, consider an XML document PO1 that contains the following content:
<Purchase Order>
  <body>
    Important Data
  </body>
</Purchase Order>
PO1 includes the character strings “Purchase Order” and “body”. To store PO1 in binary XML format, the token “Purchase Order” may be mapped to 1, and the token “body” may be mapped to 2. Typically, the replacement values consume much less space than the corresponding tokens. For example, the token “Purchase Order”, which contains fourteen characters, may be assigned a binary replacement value that takes less space to store than a single text character.
Once translation information has been created, XML documents may be stored in binary XML based on the translation information. For example, PO1 may be stored as <1><2>Important Data</2></1>. In typical implementations of binary XML, even the symbols (e.g. “<”, “>”, and “/”) may be represented by binary replacement values.
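The token replacement described above can be sketched in a few lines. This is a simplified illustration, not the encoding used by any actual binary XML implementation: the function names `build_translation_info` and `encode` are hypothetical, tag names are matched with a regular expression rather than an XML parser, whitespace between tags is omitted, and the markup symbols are left as text instead of being replaced by binary values.

```python
import re

# Matches an opening or closing tag and captures the optional "/" and the name.
TAG_PATTERN = re.compile(r"<(/?)([^<>/]+)>")


def build_translation_info(xml_text):
    """Assign each distinct tag name a number, in order of first appearance."""
    translation = {}
    for _slash, name in TAG_PATTERN.findall(xml_text):
        if name not in translation:
            translation[name] = len(translation) + 1
    return translation


def encode(xml_text, translation):
    """Replace every tag name with its numeric replacement value."""
    return TAG_PATTERN.sub(
        lambda m: f"<{m.group(1)}{translation[m.group(2)]}>", xml_text
    )


# PO1 with the whitespace between tags removed for brevity.
po1 = "<Purchase Order><body>Important Data</body></Purchase Order>"

translation_info = build_translation_info(po1)
encoded = encode(po1, translation_info)
```

In this sketch, `translation_info` maps “Purchase Order” to 1 and “body” to 2, and `encoded` holds `<1><2>Important Data</2></1>`.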
Translating Between Binary XML and Text
When stored in binary XML, an XML document consumes much less space than is required by other formats of XML storage. However, these space savings are achieved at the cost of additional overhead required to convert textual XML to binary XML, and to convert binary XML to textual XML. For example, to be meaningful to an application that requests PO1, <1><2>Important Data</2></1> would have to be translated back into:
<Purchase Order>
  <body>
    Important Data
  </body>
</Purchase Order>
In order to reconstruct the text of an XML document that has been stored in binary format, the translation information that was used to encode the XML document must be available. The translation information that is used to store XML data within a database is typically stored separately from the binary XML data itself. In fact, the translation information used to encode binary XML data is often located in a different tablespace than the tablespace in which the binary XML data is stored.
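The dependence of decoding on the translation information can be sketched as follows. The `decode` function is a hypothetical illustration that assumes the simplified encoding used in the PO1 example, in which only tag names are replaced by numbers; it does not represent an actual binary XML decoder.

```python
import re

# Matches an encoded opening or closing tag such as <1> or </2>.
ENCODED_TAG = re.compile(r"<(/?)(\d+)>")


def decode(encoded_text, translation):
    """Restore tag names from their numeric replacement values."""
    names = {number: name for name, number in translation.items()}
    return ENCODED_TAG.sub(
        lambda m: f"<{m.group(1)}{names[int(m.group(2))]}>", encoded_text
    )


# The mapping from the PO1 example in the text.
translation_info = {"Purchase Order": 1, "body": 2}
stored = "<1><2>Important Data</2></1>"

decoded = decode(stored, translation_info)

# Without the translation information, the original text cannot be derived.
try:
    decode(stored, {})
    missing_info_detected = False
except KeyError:
    missing_info_detected = True
```

With the mapping available, `decoded` is the original tag text. With an empty mapping, the lookup fails, which mirrors the situation of a database server that holds the binary XML data but not the translation information used to encode it.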
Moving Binary XML Between Databases
Unfortunately, tablespaces that contain binary XML cannot be moved between databases using the pluggable tablespace techniques referred to above. Specifically, once a tablespace is plugged into another database, the database server that manages the new database would not know how to derive the original XML text from the binary XML contained in the plugged-in tablespace. Consequently, binary XML has to be moved from one database to another by converting the XML data to a text format and putting the XML text into a dump file. The text in the dump file is then parsed by the target database and inserted into the appropriate tables. This process of parsing and inserting is very memory and CPU intensive. The time taken by the entire process is linearly proportional to the number of rows being imported. Consequently, this process can be very slow. For a large dataset, the current export/import process becomes impractical. In addition, the XML data in the dump file occupies a large amount of additional disk space.