1. Field of the Invention
The present invention relates generally to data processing systems, and more particularly to a computer implemented method, apparatus, and computer usable program code for integrating data flow in heterogeneous data environments.
2. Description of the Related Art
In enterprise application systems, consolidation of various data integration tools is inevitable because mergers and acquisitions are normal business practices. Typical data integration applications are those in which data from multiple sources, residing on varied data systems and repositories, must be processed, combined, or otherwise transformed and then loaded into multiple targets, which again reside on different data systems and repositories. The best application performance may be achieved by breaking the total data integration processing down into smaller processing modules and ensuring that the appropriate runtime engine or runtime system is selected for each task.
For example, a database engine is the most appropriate engine for filtering rows of a relational table in a structured query language (SQL) database. Using the database engine is more efficient than pulling the data out of the database into a text file, which then requires further filtering before the final data can be inserted into another table in the same database. SQL engines are specially optimized for such queries and tasks. In other cases, legacy data in text files or spreadsheets is best processed by a specialized extract, transform, load (ETL) engine. A legacy data source is any file, database, or software asset (such as a web service or business application) that supplies or produces data and that has already been deployed.
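The contrast drawn above, between filtering inside the database engine and exporting to a text file for filtering elsewhere, can be illustrated with a minimal sketch using Python's built-in `sqlite3` and `csv` modules. The table names `orders` and `big_orders`, the sample data, and the function names are hypothetical and chosen only for illustration; the sketch shows the two data paths, not a performance measurement.

```python
import csv
import io
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical source and target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    conn.execute("CREATE TABLE big_orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    # 100 sample rows with amounts 10.0, 20.0, ..., 1000.0
    rows = [(i, "EU" if i % 2 else "US", float(i * 10)) for i in range(1, 101)]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def filter_in_engine(conn, threshold):
    """Let the SQL engine filter and insert; no rows leave the database."""
    conn.execute("DELETE FROM big_orders")
    conn.execute(
        "INSERT INTO big_orders SELECT id, region, amount FROM orders WHERE amount > ?",
        (threshold,),
    )
    return conn.execute("SELECT COUNT(*) FROM big_orders").fetchone()[0]


def filter_via_text_file(conn, threshold):
    """Export every row to a text buffer, filter outside the engine, then load back."""
    buf = io.StringIO()
    csv.writer(buf).writerows(conn.execute("SELECT id, region, amount FROM orders"))
    buf.seek(0)
    # Filtering happens in application code after all rows were pulled out.
    kept = [
        (int(r[0]), r[1], float(r[2]))
        for r in csv.reader(buf)
        if float(r[2]) > threshold
    ]
    conn.execute("DELETE FROM big_orders")
    conn.executemany("INSERT INTO big_orders VALUES (?, ?, ?)", kept)
    return conn.execute("SELECT COUNT(*) FROM big_orders").fetchone()[0]


if __name__ == "__main__":
    db = build_demo_db()
    print(filter_in_engine(db, 500.0))
    print(filter_via_text_file(db, 500.0))
```

Both paths produce the same 50 target rows, but the first expresses the whole task as one statement that the SQL engine can optimize, while the second moves all 100 rows out of the database and back in.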
However, the average user is not conversant with all types of processing engines and cannot easily choose the appropriate engine or other processing component. As a result, an enterprise needs to employ experts for each type and variant of engine.
The problem is that different runtime systems work very differently, use different protocols, and are generally incompatible with one another. Such heterogeneous systems also have different development paradigms and lack a common development language or even a uniform integrated development environment (IDE). In addition, new runtime systems are continually added to the enterprise. Currently, such complex data integration applications rely on users developing separate application modules for each runtime system and writing specialized code for each pair of runtime systems to bridge the gap between them. As a result, current data integration applications do not allow data processing engines to operate in a truly integrated fashion.