1. Technical Field
The present invention relates generally to data migration, and more specifically to a system and method for migrating data using a heterogeneous mix of migration technologies.
2. Related Art
With the advent of Enterprise Application Solutions (EAS) applications such as SAP®, companies have had to deal with ever-growing databases that support these applications. In the mid-1990s, these databases were typically under 100 GB. However, over time, as more of a company's business has been committed to the application, large customers now have databases measured in terabytes containing tens of thousands of tables.
With such large databases come certain challenges. For instance, when a company upgrades its hardware, the re-hosting (i.e., migration) of these large databases to a new machine architecture may be required. However, most of these systems operate 24 hours each day, seven days per week (24×7). Therefore, in order to successfully re-host large-scale databases, a method of minimizing the outage during the migration is required. Prior to this invention, the overall speeds for migrating a database were generally around or below 20 GB/hr, which results in substantial downtime and cost. The costs associated with such an extended outage are a major factor faced by companies when deciding whether to perform such a re-host.
There are several existing “methodologies” that have historically been used to re-host or migrate a database. The two primary techniques for migration are unload/load and export/import, which are described below with reference to FIGS. 1 and 2. Unfortunately, because most Very Large Databases (VLDs) are asymmetric in nature (i.e., have many different data set characteristics), prior solutions suffer from the fact that one migration technology cannot efficiently transfer the entire database. The result is an unacceptably slow migration rate (GB/hr). The only significant enhancement to the current approaches over the past decade has been to multi-program this single-minded approach by breaking up the migration into pieces and running them concurrently.
An example of an unload/load system is shown in FIG. 1. The primary advantage of this approach is that it creates a database-independent intermediate set of flat files. This is useful when changing database vendors, but it is not necessarily optimal with respect to performance when re-hosting without changing the database. In general, all the tables and associated data are unloaded to a database-independent flat file format. When this is completed, the user performs an FTP (file transfer protocol) transfer of these flat files to the target system, and then the load process is initiated. This means that while the source system is unloading, the target is idle (and vice versa). As previously indicated, the only enhancement made to this process to date has been to perform multiple concurrent unloads to multiple flat files, then FTP them, and finally run multiple loads concurrently.
The functionality for performing this multiprocessing approach is provided, for example, by generating control files and using the control files to execute multiple concurrent unloads and loads. However, it is still up to the user to perform such tasks as: executing the FTP at the conclusion of the unload; monitoring the success of the various unload and load processes; and recovering manually from any failures (and there are almost always some).
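The bookkeeping that this approach leaves to the user can be pictured with a minimal Python sketch. It is illustrative only and is not taken from any existing migration tool: the function name `run_concurrently`, the task names, and the stand-in unload callables are all hypothetical. It runs several tasks concurrently and records which succeeded and which failed, which is the monitoring step described above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_concurrently(tasks, max_workers=4):
    """Run named zero-argument tasks concurrently.

    `tasks` maps a task name (hypothetical, e.g. one per flat file)
    to a callable. Returns (sorted list of successes, dict of failures).
    """
    succeeded = []
    failed = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): name for name, fn in tasks.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()          # re-raises any task exception
                succeeded.append(name)
            except Exception as exc:     # record the failure for later recovery
                failed[name] = exc
    return sorted(succeeded), failed


def unload_ok():
    # Stand-in for an unload that completes normally.
    return None


def unload_broken():
    # Stand-in for an unload that fails partway through.
    raise RuntimeError("unload failed")


if __name__ == "__main__":
    done, errors = run_concurrently(
        {"table_a": unload_ok, "table_b": unload_broken, "table_c": unload_ok}
    )
    print("succeeded:", done)
    print("failed:", sorted(errors))
```

In the prior approach this tracking, and any restart of the failed pieces, is performed by hand.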
This unload/load approach generally performs at under 20 GB/hr. It is also faced with the problem that Very Large Databases (VLDs) tend to have some very asymmetric attributes and properties at the table level. For example, a 1 TB database may have a single table over 100 GB and 15,000 tables under a couple of megabytes. Unload/load is not an optimal solution for a single very large table as each unload/load thread is generally capable of only 1-3 GB/hr.
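The consequence of these rates can be checked with simple arithmetic. The Python sketch below uses only the figures quoted above; the function name `hours_to_migrate` is hypothetical, and 1 TB is taken as 1000 GB for illustration.

```python
def hours_to_migrate(size_gb, rate_gb_per_hr):
    """Elapsed hours to move size_gb at a sustained rate."""
    return size_gb / rate_gb_per_hr


# A single 100 GB table handled by one unload/load thread at 1-3 GB/hr.
slowest = hours_to_migrate(100, 1)   # 100.0 hours
fastest = hours_to_migrate(100, 3)   # about 33.3 hours

# The whole 1 TB database (assumed 1000 GB) at an overall 20 GB/hr.
whole_db = hours_to_migrate(1000, 20)  # 50.0 hours

if __name__ == "__main__":
    print(f"single 100 GB table: {fastest:.1f} to {slowest:.1f} hours")
    print(f"1 TB database at 20 GB/hr: {whole_db:.1f} hours")
```

Under these assumptions, the single large table alone takes longer on one thread than the rest of the database would at the overall rate, so the large table dominates the outage.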
An example of Export/Import for migrations, utilizing named pipes, is shown in FIG. 2. This approach takes advantage of UNIX® named pipes (a first-in, first-out “FIFO” buffering system that appears to the application as if the pipe were just a file). Export/Import also uses the innate capabilities of the database to support network connections. Both the export and the import run on the target system. The export requests data from the database over the network and “writes” it to the named pipe as if it were a flat file. The import reads from the named pipe as if it were a flat file and loads the database with the resulting data.
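The named-pipe mechanism can be illustrated with a minimal Python sketch for a UNIX-like system. It is not the database vendor's export or import utility: `export_rows` and `import_rows` are hypothetical stand-ins that simply write and read text lines, and `migrate_via_pipe` is an assumed helper name. The sketch shows only that the writer and the reader treat the FIFO as an ordinary file while running at the same time.

```python
import os
import tempfile
import threading


def export_rows(pipe_path, rows):
    # Stand-in for the export: writes rows to the pipe as if it were a flat file.
    with open(pipe_path, "w") as pipe:
        for row in rows:
            pipe.write(row + "\n")


def import_rows(pipe_path):
    # Stand-in for the import: reads the pipe as if it were a flat file.
    with open(pipe_path) as pipe:
        return [line.rstrip("\n") for line in pipe]


def migrate_via_pipe(rows):
    workdir = tempfile.mkdtemp()
    pipe_path = os.path.join(workdir, "table.pipe")
    os.mkfifo(pipe_path)  # create the named pipe (UNIX-like systems only)
    try:
        # The export runs in its own thread so that it overlaps with the import.
        exporter = threading.Thread(target=export_rows, args=(pipe_path, rows))
        exporter.start()
        loaded = import_rows(pipe_path)  # returns once the writer closes the pipe
        exporter.join()
        return loaded
    finally:
        os.remove(pipe_path)
        os.rmdir(workdir)


if __name__ == "__main__":
    print(migrate_via_pipe(["1,alpha", "2,beta", "3,gamma"]))
```

No intermediate flat file is written to disk; the data passes through the pipe's buffer as it is produced.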
Export/Import has an advantage over the Unload/Load method in that “overlap” occurs in the reading and writing operations. The exports do not need to be completed prior to starting the import process because both run at the same time. This overlapping can achieve rates of 6-8 GB/hr per thread, and could possibly achieve overall migration rates of around 30 GB/hr.
Unfortunately, because this approach is table specific, it becomes complicated to set up, schedule, and manage the processing of large numbers of tables (e.g., greater than 20,000). Furthermore, it is not a supported methodology for SAP systems, which tend to have the largest databases that need re-hosting.
Accordingly, a need exists for a system that can provide an automated data migration environment and achieve a very high data transfer rate.