1. Field of the Invention
The present invention relates generally to data processing systems, and more particularly to a computer implemented method, apparatus, and computer usable program code for dilating a sub-flow operator in a data flow.
2. Description of the Related Art
In enterprise application systems, consolidation of various data integration tools is inevitable because mergers and acquisitions are frequent in normal business practice. Typical data integration applications are those in which data from multiple sources on varied data systems and repositories needs to be processed, combined, or otherwise transformed into data that is then loaded into multiple targets, again residing on different data systems and repositories. The best application performance may be achieved by breaking the total data integration processing down into smaller processing modules and by ensuring that the appropriate runtime engine or runtime system is selected for each task.
For example, a database engine is the most appropriate engine for filtering rows of a relational table in a structured query language (SQL) database. Using the database engine is more efficient than pulling the data out of the database and into a text file, filtering the text file further, and then inserting the final data into another table in the same database. SQL engines are specially optimized for such queries and tasks. In some cases, legacy data in text files or spreadsheets is best processed by a specialized engine, such as WebSphere Data Stage™. A legacy data source is any file, database, or software asset (such as a web service or business application) that supplies or produces data and that has already been deployed.
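As a hedged illustration of the filtering example above, the following minimal Python sketch uses the standard-library sqlite3 module as a stand-in for a SQL database engine. The table names (orders, large_orders), the column names, and the threshold are hypothetical and chosen only for illustration. The sketch filters rows and loads them into a second table of the same database with a single statement executed inside the engine, so no intermediate text file is needed.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical source and target tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.execute("CREATE TABLE large_orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (id, amount) VALUES (?, ?)",
        [(1, 50.0), (2, 150.0), (3, 300.0)],
    )
    conn.commit()
    return conn


def filter_in_engine(conn, threshold):
    """Filter and load rows entirely inside the SQL engine (no export step)."""
    conn.execute(
        "INSERT INTO large_orders (id, amount) "
        "SELECT id, amount FROM orders WHERE amount > ?",
        (threshold,),
    )
    conn.commit()


conn = build_demo_db()
filter_in_engine(conn, 100.0)
rows = conn.execute("SELECT id, amount FROM large_orders ORDER BY id").fetchall()
print(rows)  # [(2, 150.0), (3, 300.0)]
```

The alternative described above would export every row of orders to a text file, filter that file outside the database, and insert the survivors back, which moves all of the data across the engine boundary twice.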
However, the average developer may not be conversant with all types of processing engines and may be unable to choose an engine or other processing component effectively. As a result, an enterprise needs to employ experts for each variety and variation of engine.
The problem is that there are many runtime systems that work very differently, use different protocols, and are generally incompatible with one another. Such heterogeneous systems also have different development paradigms and lack a common developer language or even a uniform integrated development environment (IDE). Additionally, new runtime systems are regularly added to the enterprise. Currently, such complex data integration applications rely on users developing separate applications for each runtime system and writing specialized code for each pair of runtime systems to bridge the gap between them. As a result, current data integration applications do not allow data processing engines to operate in a truly integrated fashion.
For example, if a developer needs some processing to occur in a SQL engine, followed by some processing in a conventional, specialized engine, the developer needs to hand code the way data is passed from one system to the other. Some specialized engines already provide limited support for such data exchange, especially with SQL engines.
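The kind of hand-coded bridging described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a depiction of any particular product: sqlite3 stands in for the SQL engine, a plain Python function stands in for the specialized engine, and the names sql_stage, bridge_to_text, specialized_stage, and the sales table are all hypothetical.

```python
import csv
import io
import sqlite3


def sql_stage(conn):
    """First stage: filtering performed inside the SQL engine."""
    return conn.execute(
        "SELECT region, amount FROM sales WHERE amount > 0"
    ).fetchall()


def bridge_to_text(rows):
    """Hand-written glue: serialize SQL results into CSV text for the next engine."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["region", "amount"])
    writer.writerows(rows)
    return buf.getvalue()


def specialized_stage(text):
    """Second stage: a stand-in for a specialized engine that consumes text input."""
    totals = {}
    for record in csv.DictReader(io.StringIO(text)):
        region = record["region"]
        totals[region] = totals.get(region, 0.0) + float(record["amount"])
    return totals


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("west", 5.0), ("east", 2.5), ("west", -1.0)],
)
totals = specialized_stage(bridge_to_text(sql_stage(conn)))
print(totals)  # {'east': 12.5, 'west': 5.0}
```

Every pair of engines needs its own version of bridge_to_text, with its own format, type conversions, and error handling, which is the per-pair glue code that the preceding paragraphs identify as a burden.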
In addition, current data integration applications do not make optimizations across runtime engine boundaries easy, because each engine is frequently independent of the others and provided by a different vendor. Furthermore, hand-written code is not easily rewritten when specialized new engines become available. Additionally, with current data integration applications, transaction processing becomes difficult to account for when crossing engine domains.