Declarative, large-scale machine learning (ML) on top of MapReduce (MR) or Spark aims at automatically generating execution plans for high-level ML programs, with the goals of full flexibility and high performance. The primary focus of large-scale ML is data-parallel computation, but many ML algorithms inherently exhibit opportunities for task parallelism as well. For example, the ParFOR (parallel for) construct allows combining data and task parallelism. For medium to large data and non-partitionable problems, the ParFOR optimizer picks local parallel plans, which run the MR jobs of independent iterations concurrently on the cluster. This allows for latency hiding and higher cluster utilization. In addition, these potentially concurrent MR jobs often share common inputs and hence read the same data. This exposes optimization potential in the case of small or highly utilized clusters, where the available degree of parallelism is too small to serve all concurrently running jobs. This problem cannot be addressed with a pure compile-time approach because, especially for ML programs with convergence-based computations, the number of required iterations and the conditional control flow are initially unknown.
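The following is a minimal sketch, not SystemML's ParFOR implementation, of what a local parallel plan looks like and why shared inputs matter. It assumes a hypothetical `run_mr_job` function that stands in for submitting a data-parallel MR job and merely records which inputs it scans. The input names `X` and `y{i}` and the parameter `degree_of_parallelism` are illustrative. Independent iterations run as concurrent tasks (task parallelism), and each task launches its own job (data parallelism).

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical bookkeeping: how many times each input is fully read.
scan_log = Counter()
_lock = threading.Lock()


def run_mr_job(job_id, inputs):
    """Hypothetical stand-in for a data-parallel MR job.

    A real system would submit a job to the cluster; here we only record
    one full scan per input so that redundant reads become visible.
    """
    with _lock:
        for path in inputs:
            scan_log[path] += 1
    return f"job{job_id}:done"


def parfor_local(iterations, degree_of_parallelism):
    """Local parallel plan: run independent iterations as concurrent tasks,
    each of which launches its own MR job."""
    with ThreadPoolExecutor(max_workers=degree_of_parallelism) as pool:
        # Every iteration reads the shared input X plus its own input y{i}.
        futures = [pool.submit(run_mr_job, i, ["X", f"y{i}"]) for i in iterations]
        # Collect results in iteration order.
        return [f.result() for f in futures]


results = parfor_local(range(4), degree_of_parallelism=4)
print(results)         # ['job0:done', 'job1:done', 'job2:done', 'job3:done']
print(scan_log["X"])   # 4: the shared input is scanned once per concurrent job
```

Because each job scans `X` independently, four concurrent iterations read the same data four times. On a cluster whose available parallelism cannot serve all four jobs at once, these redundant scans compete for the same limited resources, which is the optimization potential described above.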