Traditional Relational Database Management Systems (RDBMS), such as DB2® available from International Business Machines Corporation and Oracle® available from Oracle Corporation, and emerging Big Data Systems (BDS), such as Hadoop® and Spark™ available from Apache™, are typically two siloed (i.e., isolated) data management systems, each having its own data format, query/programming language, and computational model. Computations done within one system typically do not involve data in the other system, and vice versa. An RDBMS typically handles structured data, while a BDS typically handles semi-structured and unstructured data.
As business analytics and data mining become deeper and more sophisticated, data from both systems often must be processed together. This is currently done through so-called “data connectors,” which transfer data from one system to the other as needed. However, because the data involved in the computations can be very large (particularly the data in a BDS), frequent data transfer between the two systems can result in a significant performance loss.
Accordingly, improved techniques for integrating RDBMS and BDS data processing would be desirable.