The present invention relates to database systems and, more particularly, to import and export of data to and from a database system, respectively.
Many database systems provide import and export operations to transfer data to or from another database system or another version of the same database system. Thus, import/export functions can be used, for example, to move, archive, and backup data. The export operation writes data from a database into a transportable operating system file, called a xe2x80x9cdump file.xe2x80x9d In exporting data, such as a table, database systems typically generate a series of high-level statements in a data description language (DDL) and store them in the dump file in a recognized file format. The DDL statements describe how to recreate the entire table and the metadata associated with that table. Metadata is information that describes how data is organized in a database, such as data dictionary information about what tables are in the database, what columns a table has, what indexes have been built for a table, and so forth.
When data exported into a dump file is imported, the target database system reconstitutes the tables, metadata, and other objects therein by executing the DDL statements through a high-level interface, such as a SQL (structured query language) layer. If the imported database tables do not currently exist on the target system, then the import function will execute DDL operations to set up the proper tables prior to importing the data. Some database systems provide a feature in which the proper DDL statements to set up the exported tables are included in the export file.
Import and export are useful in transferring data from a source database to a target database. To transfer data, a database administrator exports the desired data from the source database into a transportable dump file. The dump file would be transported to the target database system, for example, by physically moving a computer readable medium storing the dump file, e.g. a disk or magnetic tape. As another example, the dump file can be transmitted over a network to the target system, e.g. via the file-transfer protocol (FTP). Import and export can also support archival and restoration of data to recover from a system crash. In this case, data in a database system is periodically exported into a dump file, transferred onto a long-term storage medium, such as a magnetic tape, and stored in a safe place. To restore the backed up data, the tape or other storage medium is inserted into a tape drive accessible to the database system and the exported data stored thereon is imported.
A problem with conventional import/export utilities is that they typically operate at the granularity of the table level. In other words, tables are the smallest unit that can be imported/exported. Thus, when an import or export of a large table is being performed, it may take hours or even days. Users are normally blocked from performing any data manipulation (DML) commands on the table during this time, since such activities could create inconsistencies in the imported/exported data. However, this downtime can be unacceptably long for very large or mission-critical tables.
For example, some database systems implement on-line transaction processing (OLTP) applications, e.g. for processing airline reservations. In these applications, new transactions are recorded in a database table as they are made. Accordingly, these applications typically need to be up and running as much as possible, e.g. to allow customers to book reservations on a flight. In some cases, even an hour of downtime may not be acceptable. In addition, these databases typically contain entries for transactions covering years, and are accordingly very large. Consequently, such database use tables that are terabytes (trillions of bytes) in size. Thus, the huge size of these tables exacerbates the downtime of the database system that is importing or exporting the tables. It is difficult to improve the time to import/export a table, because of bottlenecks due to limited bandwidth of the disk controller and maximum load on the processor importing or exporting the table.
Furthermore, it is inefficient to require the import/export of an entire table if only a portion of the table needs to be imported/exported. In the example of the OLTP application, transactions tend to be stored in one very large table. If the table is periodically archived by the export operation, only a portion of the table at the time of the next backup actually contains data that was not previously backed up. However, since conventional export functions operate at the granularity of the table level, the entire table is exported, and then imported. Consequently, archiving an OLTP database is a lengthy process, requiring substantial periods of downtime, even if there is only a few megabytes of new data.
One approach around the table-level granularity import and export is to select the desired rows and write them to an operating system file. After creating this file, the user would manually edit the file to convert the contents therein into SQL INSERT statements for as many as the number of rows selected. Finally, to bring the data into a table, the user would apply the SQL INSERT statements in the file to the table. This approach suffers from performance difficulties because each INSERT statement is a separate SQL operation, rather than an array of rows inserted through a single import operation. Furthermore, the user must be granted very high levels of permission to perform this sequence of operations, compromising database system security. Security is important because this approach is cumbersome and error prone, making it desirable to protect the database system from user error.
There is a need for a system and method for importing and exporting data from a database at a level of granularity smaller than the table level, preferably at an enhanced performance and security relative to conventional workarounds.
There is a need to be able to export only new or changed data from an OLTP database table for archival and backup purposes.
There is a need to shorten the downtime due to importing or exporting, for example, caused by bottlenecks that result from the bandwidth of the disk controller and load on the processor importing or exporting the table.
These and other needs are met by the present invention in which a body of data, such as a table, is partitioned into subsets of data, which can be individually imported and exported. Thus, data can be imported and exported at a level of granularity smaller than the table level. These subsets can be based on when new data is added to the body of data, so that only recent data can be selectively exported. Furthermore, these subsets of data may be stored on different disk systems, spreading the workload around to different disk controller and processors.
Accordingly, one aspect of the invention is a computer-implemented method and computer-readable medium bearing instructions for exporting into a dump file at least some of a body of data, such as a relational database table or other object. The body of data is subdivided into subsets, for example, based on time-related, numerical, or alphabetical information contained in the body of data. Metadata descriptive of the body of data, such as how the body of data is subdivided into subsets, is stored in the dump file. One or more selected subsets are stored in the dump file by storing a marker descriptive of a selected subset and the data contained in the subset.
Another aspect of the invention is a computer-implemented method and computer-readable medium bearing instructions for importing data into a body of data by accessing a dump file containing one or more subset markers descriptive of a respective subset of the data, each subset marker is associated with data belonging to the respective subset. If one of the subset markers in the dump file is descriptive of a selected subset, then the data associated with the subset marker is imported into the body of data.
Yet another aspect of the invention is a computer-implemented method of repartitioning a body of data, subdivided into subsets, by exporting at least one selected subset into a dump file, reconfiguring the body of data according to new partitioning criteria, and importing the exported data into the body of data according to the new partitioning criteria.
Additional objects, advantages, and novel features of the present invention will be set forth in part in the description that follows, and in part, will become apparent upon examination or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the appended claims.