The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, it should not be assumed that a problem has been recognized by the prior art merely because the problem is discussed in this section.
The ability to store and retrieve large amounts of data is one of the most important functions of computers in today's society. To carry out these functions, database systems are typically used to retrieve and store data in databases. Database systems have performed these functions very successfully, creating for society the ability to retrieve data at speeds and quantities previously unimagined, and bestowing onto society an unprecedented level of access to information. The success of database systems has unleashed an insatiable demand for even faster and more efficient database systems that process even greater quantities of data.
One mechanism that provides efficient access to large amounts of data is a data warehouse. A data warehouse represents a transformation of raw data. The raw data used by the warehouse typically comes from a “source” database system, such as an Online Transaction Processing (“OLTP”) database. The OLTP database system is oriented towards the “real time” operation of a business, while the data warehouse is oriented toward answering longer-range, management-oriented questions about the business. To stay current, the data warehouse is periodically updated with information from the OLTP database system. These updates entail transfers of large quantities of data. Additionally, it may be desirable to move a database to a lower-cost platform. Some other situations in which data may need to be transferred are database replication, instantiation, and migration.
Typically, to achieve data transfer, databases offer utilities that allow data to be extracted from the source database objects into intermediate files that can be reprocessed to insert the data into the database objects at the target database. In contrast, a novel technique that is more efficient than many other conventional techniques for transferring data is referred to as transportable tablespaces. A tablespace is a logical collection of database objects that reside in physical storage containers (e.g., data files). In this specification the word “physical” refers to storage at a disk level. For example, a file is a physical entity since it represents a byte stream that actually resides on a physical medium such as a disk or tape subsystem. The word “logical” refers to a grouping of related objects that may be physical, logical, or a combination of both. For example, an application view of the database is “logical”, because the application interacts with the database as a collection of relational objects, such as related tables and indexes. The application view does not interact with the database as physical files. Database objects are objects managed by a database system. Transportable tablespaces is a technique that allows tablespaces to be copied and integrated into another database system, or in other words, attached (e.g., “plugged in”) to the other database system. With respect to a tablespace, database, and database server, the term “attach” refers to configuring a database and/or database server so that the database objects in the tablespace are incorporated within the database and the tablespace is used to store data for the database.
In contrast to repopulating the data at the row level, configuring a database to attach a tablespace involves modifying database metadata so that the tablespace and the database objects associated with the tablespace are defined as part of the database, and incorporating storage containers into the database for holding the database objects. The database metadata may be altered using a variety of techniques involving automated steps and/or manual steps performed by a Database Administrator (DBA). The DBA can run utilities available on the source database system to export the metadata into a “metadata dump file”, and run utilities on the target database system to construct metadata from the metadata dump file. Alternatively, metadata can be included with the data being transported in the tablespace, and the target database would reconstruct the metadata from the metadata included in the tablespace. The DBA can also manually reconstruct the metadata on the target database system. The process of attaching tablespaces allows data to be copied using operating system utilities for copying files, which copy the physical bits and run much faster than the process of extracting and loading data by executing queries and insert statements.
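The performance contrast just described can be illustrated with a short sketch. The function names and file layout below are purely hypothetical, and the metadata-registration step is omitted; the point is only that attaching a tablespace moves data with a byte-for-byte file copy, while the conventional path must touch every row.

```python
import os
import shutil

def attach_by_file_copy(source_datafile, target_dir):
    """Hypothetical file-level transport: copy the tablespace's data
    file byte-for-byte with an operating-system-level copy. A real
    database system would then update its metadata to register the
    copied file; that step is omitted in this sketch."""
    dest = os.path.join(target_dir, os.path.basename(source_datafile))
    shutil.copy(source_datafile, dest)  # physical copy; no per-row work
    return dest

def repopulate_row_by_row(rows, insert):
    """The slow path, for contrast: extract each record and re-insert
    it individually at the target, as conventional extract-and-load
    utilities do."""
    for row in rows:
        insert(row)  # one operation per record
```

The file copy is a single bulk operation regardless of how many rows the data file contains, whereas the row-by-row path scales linearly with the number of records.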
Unfortunately, prior to this specification there was no known method of transporting a tablespace from a “source database system” to a “target database system” having different formats. Differences in the physical layout of the on-disk data structures containing the data complicate the transport. For example, a data structure is an abstract type that can contain members such as integers, floats, and character arrays. If a data structure member representation takes n bytes on system 1 but m bytes on system 2 (where n is not equal to m), then the physical layouts of the data structures are not identical. Additionally, if a data structure member is not aligned at the same offset from the start of the data structure on system 1 and system 2, then the on-disk structure formats of the two systems are not identical. The physical layout of data structures could differ based on, for example, the computer system's architecture and its endian (byte ordering) representation. Typically, transferring tablespaces could be performed easily only if the source database system and target database system ran on the same platform, that is, the database systems ran the same version of the same database software product on the same operating system product and version. The term “format” includes two parts: structure and presentation. The term “structure” describes sizes and shapes, and in some implementations the structures are platform-independent. “Presentation” describes how a particular type of data is presented. The term “structure” should not be confused with the term “data structure”: data stored in a particular “data structure” may have a platform-specific presentation.
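The byte-ordering and alignment differences described above can be made concrete with Python's standard `struct` module, which packs one and the same logical record under different physical conventions (the record layout chosen here is illustrative):

```python
import struct

# One logical record (id: 32-bit integer, price: 64-bit float) packed
# under three different physical conventions.
big_endian    = struct.pack('>id', 7, 2.5)  # big-endian, standard sizes
little_endian = struct.pack('<id', 7, 2.5)  # little-endian, standard sizes
native        = struct.pack('@id', 7, 2.5)  # native order AND alignment

# The logical content is identical, but the byte streams differ:
assert big_endian != little_endian

# With native alignment, the float member may be padded to an 8-byte
# offset, so the same structure can occupy n bytes on one platform and
# m bytes on another (e.g., 12 versus 16 bytes on x86-64).
print(len(big_endian), len(native))
```

Either difference alone, byte order or member alignment, is enough to make a data file written by one platform unreadable as-is on another.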
The term platform is used herein to refer to a computer system running a particular database product version and operating system product. A computer running Oracle's 10g database server product for UNIX is a different platform than a computer running Oracle's 10g database server product for Windows NT. An Oracle 10g tablespace created on UNIX would have a different on-disk format than the same tablespace created on Windows NT. Transferring tablespaces could only be performed automatically if the source database system and target database system ran on the same platform.
For moving data across platforms, utilities such as Export/Import or Datapump are used. A conventional technique for implementing the Export/Import operations is the command generation technique. Under the command generation technique, an “exporting” database system generates a file of insert commands. The insert commands conform to a database language, such as the structured query language (“SQL”). For each record being exported, an insert command specifies the creation of a record with the values needed to generate a copy of the record being exported. To import the data, an “importing” database system, which is capable of executing commands written in the database language, scans the file, executing each insert command.
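A minimal sketch of the command generation technique follows. The table and column names are illustrative, and a production exporter would also have to handle dates, LOBs, and dialect-specific quoting; the sketch only shows the core idea of emitting one insert command per exported record.

```python
def export_as_insert_commands(table, columns, rows):
    """Emit one SQL INSERT command per exported record, so that the
    importing database system can recreate each row by executing the
    command. Sketch only; quoting here covers strings, numbers, NULL."""
    def literal(value):
        if value is None:
            return 'NULL'
        if isinstance(value, str):
            return "'" + value.replace("'", "''") + "'"  # escape quotes
        return str(value)

    for row in rows:
        yield 'INSERT INTO {} ({}) VALUES ({});'.format(
            table,
            ', '.join(columns),
            ', '.join(literal(value) for value in row))

# Example: exporting two rows of a hypothetical "emp" table.
for command in export_as_insert_commands(
        'emp', ['id', 'name'], [(1, "O'Brien"), (2, None)]):
    print(command)
```

Because the output is plain database-language text rather than a physical byte image, it is portable across on-disk formats, which is exactly why the technique survives platform differences but, as discussed next, pays for that portability in speed.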
However, Export/Import and Datapump operations are very slow because they process the data one row at a time. Executing an insert command for each record being exported is typically a slow process, which may span days for larger databases. While data is being exported, access to the data is restricted, and users may need to tolerate a lengthy downtime. Consequently, database users who require access to the data may be significantly impacted while data is being exported. Similarly, data dump operations also rely on processing the data logically one row at a time. In some cases, Export/Import and Datapump operations are too slow to meet user needs, and the associated traditional export and/or dump files require a large amount of storage space. Thus, conventional techniques for exporting data may be significantly burdensome.
In view of the above, there is a need for a more efficient method of transferring tablespaces between databases.