Conventionally, enterprises extract, transform and load data from disparate operational systems into data warehouses in order to integrate the data for reporting and business analytics. However, this method is time-consuming and leads to delays in operational activities as well as in strategic decision making.
Hence, the demand for a method of real-time integration of data from heterogeneous sources is increasing at a rapid pace. However, integrating data obtained from disparate/heterogeneous data sources in real time is a computationally challenging task, as it involves fast query evaluation. In order to achieve real-time, scalable data integration, the use of parallel query processing techniques is required.
Conventionally available methods and solutions for parallel query processing make use of knowledge of the underlying database partitions for fast query evaluation. Hence, most of the available methods for real-time integration of data obtained from heterogeneous data sources are limited by the number of partitions built on the underlying database. Such methods are not suitable when no partitioning exists in the underlying databases. Further, using the currently available partitioning-based solutions to integrate data obtained from disparate databases with overlapping partitions may significantly reduce query processing efficiency.
Consequently, there is a need for a system and method of real-time, scalable integration of data obtained from heterogeneous databases which does not require knowledge of the underlying database partitions. There is a need for a parallel query processing solution which is fast and efficient and which makes no assumptions regarding partitions built on the underlying databases.