Enterprise systems employ a variety of approaches to process data. For example, an enterprise may employ batch extract, transform, and load (ETL) processes to retrieve data from external sources (e.g., databases), process the data as needed by the enterprise, and store the processed data in a destination (e.g., a data warehouse, an operational data store, etc.). A developer may design ETL data flows to process the enterprise data sets. Typically, once developed, an ETL data flow may be executed on similar input data sets.
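By way of illustration only, the three ETL stages described above may be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the table names (orders, orders_clean), the column names, and the cents-to-currency transformation are hypothetical, and in-memory SQLite databases stand in for the external source and the destination.

```python
import sqlite3


def extract(conn):
    """Retrieve raw rows from the (hypothetical) source table."""
    return conn.execute("SELECT id, amount_cents FROM orders").fetchall()


def transform(rows):
    """Convert cents to a currency amount and drop rows with no amount."""
    return [(row_id, cents / 100.0) for row_id, cents in rows if cents is not None]


def load(conn, rows):
    """Store the transformed rows in the (hypothetical) destination table."""
    conn.executemany("INSERT INTO orders_clean (id, amount) VALUES (?, ?)", rows)
    conn.commit()


def run_etl(source, destination):
    """Run one batch: extract, transform, load. Returns the number of rows loaded."""
    rows = transform(extract(source))
    load(destination, rows)
    return len(rows)


def demo():
    """Build a tiny source and destination, run the flow, and return the result."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER)")
    source.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(1, 1250), (2, None), (3, 399)]
    )
    destination = sqlite3.connect(":memory:")
    destination.execute("CREATE TABLE orders_clean (id INTEGER, amount REAL)")
    loaded = run_etl(source, destination)
    stored = destination.execute(
        "SELECT id, amount FROM orders_clean ORDER BY id"
    ).fetchall()
    return loaded, stored
```

Because run_etl takes the source and destination connections as arguments, the same data flow may be executed again on a similar input data set without modification.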
One common issue in running batch processes on data sets is using system resources effectively so that the processes complete relatively quickly. This issue presents a challenge to developers who, in trying to increase efficiency, may tune different parameters of the data flow design, e.g., by adjusting a memory block size, increasing a degree of parallelism, etc. Tuning these parameters often involves a significant amount of tedious guesswork and system monitoring to identify and resolve performance issues.
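As a hedged illustration of the manual tuning described above, the following sketch times a toy data flow under several combinations of block size and degree of parallelism. The function names (transform_block, run_flow, sweep), the doubling transformation, and the use of a thread pool are assumptions made for illustration; a real data flow would expose different parameters, and the measured timings depend on the workload and the system.

```python
import itertools
import time
from concurrent.futures import ThreadPoolExecutor


def transform_block(block):
    """Hypothetical per-block transformation: double every value."""
    return [value * 2 for value in block]


def run_flow(data, block_size, parallelism):
    """Split the data into blocks and transform them with a pool of workers."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # pool.map returns results in the same order as the input blocks.
        results = list(pool.map(transform_block, blocks))
    return [value for block in results for value in block]


def sweep(data, block_sizes, parallelism_levels):
    """Time the flow for every parameter combination (a manual grid search)."""
    timings = {}
    for block_size, parallelism in itertools.product(block_sizes, parallelism_levels):
        start = time.perf_counter()
        run_flow(data, block_size, parallelism)
        timings[(block_size, parallelism)] = time.perf_counter() - start
    return timings
```

Each additional parameter multiplies the number of combinations to be run and monitored, which is one reason this kind of tuning tends to be tedious.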