1. Field of the Invention
The present invention generally relates to database management systems, and, more particularly, to mechanisms within computer-based database management systems for unloading, transferring and loading persistent data residing in one data source into a remote data source, without being limited by physical constraints of the file transfer system at either the source or the target.
2. Description of Related Art
The increasing popularity of electronic commerce has prompted many companies to turn to application servers to deploy and manage their applications effectively. Quite commonly, these application servers are configured to interface with a database management system (DBMS) for storage and retrieval of data. This often means that new applications must work with distributed data environments. As a result, application developers frequently find that they have little or no control over which DBMS product is to be used to support their applications or how the database is to be designed. In many cases, developers find out that data critical to their application is spread across multiple DBMSs developed by different software vendors.
Often, it is desirable to unload/extract/export data from one data source, transport it to the target site, and load/import the data into the target site data repository, without being limited by physical constraints of the file transfer system at either the source or the target site. It is also desirable that the loading of target records be accomplished concurrently with the unloading of source records. This can be accomplished if data are transported record by record, so that the source site sends one record while the target site receives and loads another.
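The record-by-record transport described above may be illustrated by the following minimal sketch. It is offered only as an illustration under stated assumptions, not as the method of any particular DBMS product: in-memory SQLite databases stand in for the source and target sites, a bounded in-process queue stands in for the transport channel, and the table name `orders`, the marker `END_OF_STREAM`, and the function names are hypothetical.

```python
import queue
import sqlite3
import threading

END_OF_STREAM = None  # hypothetical marker: no more records follow


def unload(source, channel):
    """Source site: send each record as soon as it is read; no intermediate file."""
    for record in source.execute("SELECT id, item FROM orders ORDER BY id"):
        channel.put(record)  # blocks while the bounded channel is full
    channel.put(END_OF_STREAM)


def load(channel, target):
    """Target site: insert each record as soon as it arrives."""
    while True:
        record = channel.get()
        if record is END_OF_STREAM:
            break
        target.execute("INSERT INTO orders VALUES (?, ?)", record)
    target.commit()


def transfer_records(rows):
    """Populate a source, stream it to a target, and return the target's rows."""
    # Assumption: in-memory SQLite databases stand in for the two sites.
    # check_same_thread=False lets the sender thread use a connection created here.
    source = sqlite3.connect(":memory:", check_same_thread=False)
    source.execute("CREATE TABLE orders (id INTEGER, item TEXT)")
    source.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    source.commit()

    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE orders (id INTEGER, item TEXT)")

    # Assumption: a small bounded queue stands in for the transport channel,
    # so at most two records are in flight at any time.
    channel = queue.Queue(maxsize=2)
    sender = threading.Thread(target=unload, args=(source, channel))
    sender.start()
    load(channel, target)  # loading proceeds while the sender is still unloading
    sender.join()

    result = target.execute("SELECT id, item FROM orders ORDER BY id").fetchall()
    source.close()
    target.close()
    return result
```

Because the channel holds only a few records at a time, the quantity of data in transit in this sketch does not depend on the total size of the table, and no file is created at either site.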
Presently, however, there is no such capability. Data are unloaded into a file, the whole file is transported, and then the data are loaded into a target. Transporting the data within files is conventionally performed using the File Transfer Protocol (FTP). Unfortunately, companies using FTP for DBMS applications may encounter some stumbling blocks. Often, there is a concern that source site and target site file formats may not be compatible. Moreover, the quantity of data to be transferred may exceed the maximum file size of the target site operating system. Furthermore, a design requirement to transfer record data from database tables whose attributes span multiple DBMSs is not currently supported, because record data presently cannot be joined together from multiple data sources while being transferred. Moreover, it is desirable to use different database products supported by a variety of leading information technology vendors, because they offer many potential business benefits, such as increased portability and high degrees of code reuse. Unfortunately, current DBMS vendors do not support data transfer in record-by-record mode, or the joining of data from multiple data sources while the records of data are being transferred.
Thus, the developer is forced to turn to more complex (and potentially cumbersome) alternatives to gain access to needed data records. Often, the alternatives are more costly and time-consuming to implement, require a more sophisticated set of programming skills to implement DBMS technology, may consume additional machine resources to execute, may increase labor requirements for development and testing, and potentially inhibit portability of the data itself.
One presently available solution to this problem, when an application developer needs to build an application that accesses and transfers critical data present in multiple data sources, involves manually simulating transparent access. In that case, a programmer takes on the burden of writing the software to individually connect to each of the necessary data sources, read in any necessary data, correlate (or join) the results read in from multiple data sources, perform any necessary data translations, etc. This is a substantial amount of work and is well beyond the skill level of many programmers. Furthermore, it incurs a great deal of cost. In addition, it requires considerable knowledge of the application programming interfaces (APIs) of each data source involved, and affords less opportunity for query optimization, which may inhibit performance.
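The burden of manually simulating transparent access may be illustrated by the following minimal sketch, which assumes two hypothetical data sources modeled as separate in-memory SQLite databases. The table names, column names, and the cents-to-currency translation are illustrative assumptions only. The programmer connects to each source individually, reads the necessary data, correlates the results in application code, and performs a data translation.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create one stand-in data source (assumption: an in-memory SQLite database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


def manual_join(customers_db, orders_db):
    """Join orders to customer names in application code, across two sources."""
    # Read the first source in full and index it by key.
    names = dict(customers_db.execute("SELECT id, name FROM customers"))

    joined = []
    query = "SELECT id, customer_id, amount_cents FROM orders ORDER BY id"
    # Read the second source and correlate each record by hand.
    for order_id, customer_id, cents in orders_db.execute(query):
        if customer_id in names:  # inner-join semantics: drop unmatched orders
            # Hypothetical data translation: integer cents to currency units.
            joined.append((order_id, names[customer_id], cents / 100))
    return joined


def demo():
    customers_db = make_source(
        "CREATE TABLE customers (id INTEGER, name TEXT)",
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Lin")],
    )
    orders_db = make_source(
        "CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount_cents INTEGER)",
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 1250), (11, 2, 400), (12, 3, 500)],
    )
    return manual_join(customers_db, orders_db)
```

Even in this small case, the join strategy, the handling of unmatched records, and the translation logic all fall to the programmer, and neither data source has any opportunity to optimize the combined query.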
Another presently available solution to the problem calls for a physical consolidation of the data, where the data from different data sources have to be copied into a single data source, from which a programmer will then transfer the data. However, this raises issues involving data latency and added cost. Due to the data latency, copies of data will be slightly to significantly "older" than data contained in the original data sources. Working with out-of-date (and potentially inaccurate) data can be unacceptable to many applications. Increased costs include software costs, since additional software must be purchased, installed, configured, and maintained to copy data from one source to another on a scheduled or periodic basis, as well as the associated labor costs. The software must support a data migration effort or implement a data replication process that supports very low data latency.
Therefore, there is a need to provide a method and a system which can transfer persistent data, often residing in multiple data sources and possibly stored in different formats, to a target site repository, record by record. This would simplify the design, development, and maintenance of applications and provide more reliable applications with a function that would otherwise be inaccessible.