Distributed query execution frameworks, such as a calculation engine executing calculation scenarios, are increasingly being adopted. Calculation scenarios allow the integration of custom operators, such as C++ operators that are compiled with the database binaries, as well as script-based (LLVM) operators, which can be easily created and integrated by customers without any changes to the database binaries. From the calculation engine perspective, these operators can be handled as black boxes; that is, only the interface is known, meaning the number and data types of the input columns and the number and data types of the output columns.
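To make the black-box notion concrete, the following is a minimal sketch, assuming hypothetical names (`OperatorInterface`, `BlackBoxOperator`, `execute`) that are not part of any actual calculation engine API: the engine sees only the declared input and output column types, while the operator's implementation remains opaque.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical sketch: a custom operator as seen by the calculation
# engine. Only the interface (number and data types of input and
# output columns) is known; the implementation is a black box.
@dataclass
class OperatorInterface:
    input_types: List[str]    # data type per input column, e.g. "INT"
    output_types: List[str]   # data type per output column

@dataclass
class BlackBoxOperator:
    interface: OperatorInterface
    # Opaque implementation: list of input columns in, output columns out.
    execute: Callable[[List[list]], List[list]]

# Example: a scripted operator that doubles a single integer column.
double_op = BlackBoxOperator(
    interface=OperatorInterface(input_types=["INT"], output_types=["INT"]),
    execute=lambda cols: [[v * 2 for v in cols[0]]],
)

result = double_op.execute([[1, 2, 3]])
```

The engine can validate that the column counts and types of an incoming intermediate result match `interface.input_types` without knowing anything about the operator's internals.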
With some conventional arrangements, native calculation engine operators are generally set-based; however, some algorithms require looping over several rows, which is possible with script-based operators. Row-wise looping, on the other hand, can be resource-expensive for large data sizes and thus can often harm overall query performance. One approach to overcome such performance impacts is to implement highly parallelized algorithms which can be applied to separate chunks of data.
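The chunk-based approach can be sketched as follows. This is an illustrative example, not an actual engine mechanism: `row_wise` stands in for any algorithm that must loop over rows, and the chunking policy and thread pool are assumptions for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder for an algorithm that must iterate row by row.
def row_wise(rows):
    return [r + 1 for r in rows]

# Hypothetical sketch: split the input into chunks and apply the
# row-wise algorithm to each chunk in parallel, then reassemble the
# results in their original order.
def parallel_apply(rows, n_chunks=4):
    size = max(1, (len(rows) + n_chunks - 1) // n_chunks)
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        results = pool.map(row_wise, chunks)
    return [r for chunk in results for r in chunk]

out = parallel_apply(list(range(8)))
```

This only works when the algorithm can be applied to each chunk independently; algorithms with cross-row dependencies spanning chunk boundaries need additional coordination.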
Distributed query execution frameworks employing calculation engines typically offer only static (i.e., pre-defined) query splits. This arrangement requires that the creator of the calculation scenario know in advance how many parallel threads should be used during execution. At execution time, this static split criterion (as defined in the calculation scenario) is applied independently of the system load and the available resources.
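The limitation of a static split can be illustrated with a minimal sketch, assuming hypothetical names (`STATIC_SPLIT_DEGREE`, `plan_execution`) invented for this example: the degree of parallelism is fixed when the scenario is created and is applied at execution time whether the system is idle or heavily loaded.

```python
# Hypothetical sketch: a static split criterion as pre-defined in a
# calculation scenario. The degree of parallelism is fixed at design
# time by the scenario's creator.
STATIC_SPLIT_DEGREE = 8

def plan_execution(current_load_percent):
    # The static split ignores the current system load entirely --
    # this is precisely the limitation described above.
    return STATIC_SPLIT_DEGREE

threads_when_idle = plan_execution(current_load_percent=5)
threads_when_busy = plan_execution(current_load_percent=95)
```

Both calls return the same degree of parallelism, which motivates a dynamic split that takes the actual load and available resources into account.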
Static query splits become especially problematic when long-running calculation scenarios are used for batch-based processing, in which the response time and duration of the query are irrelevant. For long-running batch processes, it is important not to decrease the system performance for other concurrent queries which are triggered by end users. If such a long-running query uses a static split, the query processing can often occupy a large share of the CPU capacity in the system. During this time, other queries may starve due to lack of resources or at least suffer from poor response times from the end user perspective.